NFC normalize strings?

The NUG (NetCDF Users Guide) indicates that strings (dimension and variable names, at least) should be NFC normalized:

"""
... names are normalized according to Unicode NFC normalization rules during encoding as UTF-8 for storing in the file header. This is necessary to ensure that gratuitous differences in the representation of Unicode names do not cause anomalies in comparing files and querying data objects by name.
"""

(And the next CF release will specify NFC normalization for all text.)
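To make the "gratuitous differences" concrete, here is a small sketch using only the standard library. The variable name `température` and the dictionary standing in for a file's variables are made up for illustration; nothing here touches netCDF4 itself.

```python
import unicodedata

# "é" as a single precomposed code point (the NFC form)
composed = "temp\u00e9rature"
# "e" followed by a combining acute accent (the NFD form)
decomposed = "tempe\u0301rature"

# The two names render identically but are different strings
names_differ = composed != decomposed

# A lookup keyed on one form misses the other
variables = {composed: "data"}
lookup_fails = decomposed not in variables

# After NFC normalization the two names compare equal
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

This is the sort of mismatch that querying a variable by name could run into if one tool wrote the composed form and another asks for the decomposed one.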

But as far as I can tell, netCDF4 isn't doing that. It probably should.

I think it may be as easy as adding:

```python
import unicodedata
# normalize to NFC before the string is encoded
pystr = unicodedata.normalize('NFC', pystr)
```
to `_strencode()`.
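A rough sketch of what I have in mind. Note that `strencode_nfc` is a made-up stand-in, not the actual `_strencode()` from netCDF4, and the `encoding="utf-8"` default is my assumption for the example:

```python
import unicodedata

def strencode_nfc(pystr, encoding="utf-8"):
    """Hypothetical stand-in for _strencode(): NFC-normalize, then encode."""
    # Normalize first so equivalent names always produce the same bytes
    pystr = unicodedata.normalize("NFC", pystr)
    return pystr.encode(encoding)

# A decomposed name ("e" + combining acute accent) ...
encoded = strencode_nfc("tempe\u0301rature")
# ... yields the same bytes as the precomposed spelling
matches_composed = encoded == "temp\u00e9rature".encode("utf-8")
```

Plain ASCII names pass through unchanged, so this should only affect names that contain composable characters.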

Granted, this does mean that users may get something slightly different back when they round-trip a name through netCDF.

If that's a concern, then you could call `unicodedata.is_normalized` and raise an error instead.
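The stricter option could look something like the following. `check_nfc` is a hypothetical helper name, and the choice of `ValueError` is just an assumption for the sketch; `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

def check_nfc(name):
    """Hypothetical helper: reject names that are not already in NFC form."""
    # is_normalized() was added to the standard library in Python 3.8
    if not unicodedata.is_normalized("NFC", name):
        raise ValueError(f"name {name!r} is not NFC normalized")
    return name

# Already-normalized names are returned unchanged
ok_name = check_nfc("temp\u00e9rature")

# A decomposed name is rejected rather than silently rewritten
try:
    check_nfc("tempe\u0301rature")
    rejected = False
except ValueError:
    rejected = True
```

With this approach nothing is changed behind the user's back, at the cost of pushing the normalization step onto the caller.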



NFC normalize strings? #1379
