
NFC normalize strings? #1379

Open
@ChrisBarker-NOAA


The NUG (NetCDF Users Guide) indicates that strings (dimension and variable names, at least) should be NFC normalized:

"""
... names are normalized according to Unicode NFC normalization rules during encoding as UTF-8 for storing in the file header. This is necessary to ensure that gratuitous differences in the representation of Unicode names do not cause anomalies in comparing files and querying data objects by name.
"""

(and the next CF release will specify NFC normalization for all text)
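To make the "gratuitous differences" concrete, here is a small illustration (not netCDF4 code): the same visible name can be spelled with a precomposed character or with a base letter plus a combining accent. The name `température` and the dict standing in for a file's variables are just examples.

```python
import unicodedata

# Same visible name, two code-point sequences:
# precomposed U+00E9 vs. "e" followed by COMBINING ACUTE ACCENT U+0301.
composed = "temp\u00e9rature"
decomposed = "tempe\u0301rature"


def nfc(s):
    """Return the NFC-normalized form of s."""
    return unicodedata.normalize("NFC", s)


# A dict stands in for "variables in a file, looked up by name".
variables = {composed: "data"}

# Without normalization, a lookup using the other spelling misses.
print(decomposed in variables)  # False

# If names are NFC-normalized on both sides, the lookup succeeds.
variables_nfc = {nfc(k): v for k, v in variables.items()}
print(nfc(decomposed) in variables_nfc)  # True
```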

But as far as I can tell, netCDF4 isn't doing that. It probably should.

I think it may be as easy as adding:

```python
import unicodedata
pystr = unicodedata.normalize('NFC', pystr)
```

to `_strencode()`.
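A rough sketch of what that change could look like. `strencode_nfc` is a hypothetical stand-in, not the actual netCDF4 internal. It assumes `_strencode()` takes a `str` (or already-encoded `bytes`) and returns UTF-8 bytes, and the real function's signature and details may differ:

```python
import unicodedata


def strencode_nfc(pystr, encoding="utf-8"):
    """Hypothetical stand-in for netCDF4's internal _strencode().

    Assumption: str input is NFC-normalized and then encoded;
    input that is already bytes is passed through unchanged.
    """
    if isinstance(pystr, bytes):
        return pystr
    pystr = unicodedata.normalize("NFC", pystr)
    return pystr.encode(encoding)


# A decomposed name encodes to the same bytes as the precomposed one.
print(strencode_nfc("tempe\u0301rature") == strencode_nfc("temp\u00e9rature"))  # True
```

Normalizing before encoding means every name written to the header has a single byte representation, regardless of how the user typed it.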

Granted, this does mean that users may get something slightly different back when they round-trip a name through netCDF.
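A minimal illustration of that round-trip difference. The write/read steps are simulated with plain `encode`/`decode`, assuming names are stored as NFC-normalized UTF-8; no netCDF file is involved:

```python
import unicodedata

name_in = "tempe\u0301rature"  # user's decomposed spelling (12 code points)

# Simulated write: normalize to NFC, then encode as UTF-8.
stored = unicodedata.normalize("NFC", name_in).encode("utf-8")

# Simulated read: decode the stored bytes back to str.
name_out = stored.decode("utf-8")

print(name_in == name_out)          # False: the code points differ
print(len(name_in), len(name_out))  # 12 11
# Comparing after normalizing both sides recovers equality.
print(unicodedata.normalize("NFC", name_in) == name_out)  # True
```

The name looks identical when printed, but a strict `==` against the original input fails unless the caller also normalizes.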

If that's a concern, then you could call `unicodedata.is_normalized()` (available since Python 3.8) and raise an error instead.
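A sketch of that stricter alternative, assuming a `ValueError` is an acceptable way to reject the name. `check_nfc` is a hypothetical helper name, not an existing netCDF4 function:

```python
import unicodedata


def check_nfc(pystr):
    """Hypothetical strict check: reject non-NFC names rather than
    silently normalizing them. Requires Python 3.8+ for is_normalized()."""
    if not unicodedata.is_normalized("NFC", pystr):
        raise ValueError(
            f"name {pystr!r} is not NFC-normalized; "
            "apply unicodedata.normalize('NFC', name) first"
        )
    return pystr


check_nfc("temp\u00e9rature")  # precomposed: accepted and returned unchanged
try:
    check_nfc("tempe\u0301rature")  # decomposed: rejected
except ValueError as exc:
    print(exc)
```

This way, what the user writes is exactly what they read back, at the cost of forcing them to normalize names themselves.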
