Description
The NUG indicates that strings (dimension and variable names, anyway) should be NFC normalized.
"""
... names are normalized according to Unicode NFC normalization rules during encoding as UTF-8 for storing in the file header. This is necessary to ensure that gratuitous differences in the representation of Unicode names do not cause anomalies in comparing files and querying data objects by name.
"""
(and the next CF release will specify NFC normalization for all text)
But as far as I can tell, netCDF4 isn't doing that. It probably should.
I think it may be as easy as adding:
import unicodedata
pystr = unicodedata.normalize('NFC', pystr)
to _strencode()
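
To show why this matters, here is a minimal sketch of two strings that render identically but compare unequal until they are NFC normalized. Note that `normalize_name` is a hypothetical stand-in I made up for illustration, not the actual `_strencode()` code:

```python
import unicodedata


def normalize_name(pystr):
    """Hypothetical helper (not part of netCDF4): return pystr in NFC form."""
    return unicodedata.normalize('NFC', pystr)


composed = 'caf\u00e9'     # 'é' stored as a single code point
decomposed = 'cafe\u0301'  # 'e' followed by a combining acute accent

# The two spellings look the same but are different Python strings,
# so they would also encode to different UTF-8 bytes in the file header.
assert composed != decomposed
assert composed.encode('utf-8') != decomposed.encode('utf-8')

# After NFC normalization they are identical, so lookups by name agree.
assert normalize_name(decomposed) == composed
assert normalize_name(composed) == composed
```

Without normalization, a variable written under one spelling presumably could not be looked up under the other, which is the anomaly the NUG text describes.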
Granted -- this does mean that users may get something slightly different back when they round-trip a name through netcdf.
If that's a concern, then you could call unicodedata.is_normalized (Python 3.8+) and raise an error instead.
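
A rough sketch of that stricter option, assuming a hypothetical `check_nfc` helper (the name and the choice of `ValueError` are mine, not anything in netCDF4):

```python
import unicodedata


def check_nfc(name):
    """Hypothetical helper: reject names that are not already NFC normalized."""
    # unicodedata.is_normalized was added in Python 3.8
    if not unicodedata.is_normalized('NFC', name):
        raise ValueError(f"name {name!r} is not NFC normalized")
    return name


check_nfc('caf\u00e9')  # already NFC, passes through unchanged

try:
    check_nfc('cafe\u0301')  # decomposed form, rejected
except ValueError as err:
    print(err)
```

This way nothing is changed silently: the caller has to normalize the name explicitly, so what they write is exactly what they read back.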