SignBank loading: SignWriting: "AttributeError: 'numpy.ndarray' object has no attribute 'decode'" #70

cleong110 · 2024-05-30T16:58:10Z

This snippet from the example Colab notebook causes an AttributeError.

signbank = tfds.load(name='sign_bank')

for datum in itertools.islice(signbank["train"], 0, 10):
  print(datum['id'].numpy().decode('utf-8'), datum['sign_writing'].numpy().decode('utf-8'), [f.decode('utf-8') for f in datum['terms'].numpy()])

Splitting it into three separate print statements localizes the error to the sign_writing field.

It seems this is because sign_writing is actually an array of shape (1,) rather than a bytes object. Taking the first element and then calling decode works.

Compare rwth-phoenix-weather-2014t

cleong110 · 2024-05-30T17:12:51Z

Checking the first 5k examples in the dataset, it seems sign_writing can contain 0, 1, or 2 items.

Looking at the source code, we also see that the Feature is a Sequence
https://github.com/sign-language-processing/datasets/blob/master/sign_language_datasets/datasets/signbank/signbank.py#L198
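For anyone who wants to repeat the length check, here is a rough sketch of how the tally could be done. The helper name count_sequence_lengths and the stand-in data are my own, not part of the library, and it assumes each example is a dict whose values support len() (for instance, what tfds.as_numpy yields).

```python
import itertools
from collections import Counter


def count_sequence_lengths(examples, key, limit=5000):
    """Tally how many items the `key` feature holds in the first `limit` examples.

    Hypothetical helper: assumes every example is a dict and that
    example[key] supports len() (a list or a NumPy array).
    """
    counts = Counter()
    for example in itertools.islice(examples, limit):
        counts[len(example[key])] += 1
    return counts


# Stand-in data so the sketch runs without downloading anything.
# With the real dataset, the call would presumably look like:
#   count_sequence_lengths(tfds.as_numpy(signbank["train"]), "sign_writing")
fake_examples = [
    {"sign_writing": []},
    {"sign_writing": [b"a"]},
    {"sign_writing": [b"a", b"b"]},
    {"sign_writing": [b"c"]},
]
print(count_sequence_lengths(fake_examples, "sign_writing"))
```

A Counter keyed by length makes it easy to see at a glance whether empty or multi-item sequences occur at all.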

cleong110 · 2024-05-30T17:13:32Z

I wonder if we can make it so that the library just automatically detects internally if it's a Sequence and prints accordingly?
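Until something like that exists in the library, a user-side workaround might look like the sketch below. decode_feature is a hypothetical helper of my own, not an existing API; it assumes the value is either a bytes scalar (what .numpy() returns for a single string) or an iterable of such values (what it returns for a Sequence of strings).

```python
def decode_feature(value, encoding="utf-8"):
    """Decode a bytes scalar to str, or a sequence of bytes to a list of str.

    Hypothetical helper: assumes `value` is bytes or an iterable whose
    items are (possibly nested) bytes, e.g. a NumPy array of byte strings.
    """
    if isinstance(value, bytes):
        # Scalar string feature, such as datum['id'].numpy()
        return value.decode(encoding)
    # Sequence feature, such as datum['sign_writing'].numpy()
    return [decode_feature(item, encoding) for item in value]


# Stand-in values in place of real tensors:
print(decode_feature(b"some-id"))
print(decode_feature([b"first", b"second"]))
print(decode_feature([]))
```

With this, the same call would work for id, sign_writing, and terms alike, at the cost of returning a str in one case and a list in the other.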

cleong110 · 2024-05-30T17:18:13Z

The quick fix for this issue would be to simply edit the example notebook with a note, maybe something like:

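import itertools
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # registers the sign language datasets with tfds

signbank = tfds.load(name='sign_bank')

for datum in itertools.islice(signbank["train"], 0, 10):
  print(datum['id'].numpy().decode('utf-8'))
  # sign_writing is a Sequence of strings (it can hold 0, 1, or 2 items), so decode each item
  for signwriting_item in datum["sign_writing"]:
    print(signwriting_item.numpy().decode('utf-8'))
  print([f.decode('utf-8') for f in datum['terms'].numpy()])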

cleong110 · 2024-05-30T17:21:03Z

My notebook where I test downloading SignBank: https://colab.research.google.com/drive/1hs_UjwKv_mMxZvtittI4AD--SA6cpT5k?usp=sharing
