iconv and combining characters
Bruno Haible
bruno@clisp.org
Thu Jan 22 18:09:00 GMT 2004
Chris Heath wrote:
> separate codeset name for Unicode that may be non-NFC. Something like:
> iconv -f UTF-8-UNNORMALIZED -t L1
> This has the advantage of not having any speed/memory penalty for those
> who know their data is NFC.
That's a good suggestion. While

  iconv -f UTF-8 -t UTF-8

will remain a no-op,

  iconv -f UTF-8-UNNORMALIZED -t UTF-8

will actually be useful. UCS-4-UNNORMALIZED and UTF-16-UNNORMALIZED should
be covered similarly.
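
To illustrate what a converter from possibly non-NFC input would have to
do, here is a minimal Python sketch. It does not use iconv at all:
UTF-8-UNNORMALIZED is only a proposed name at this point, and the function
name below is made up for the example. The assumption is that the
conversion composes the input to NFC before mapping it to the target
charset.

```python
import unicodedata

def to_latin1_normalizing(data: bytes) -> bytes:
    """Hypothetical stand-in for 'iconv -f UTF-8-UNNORMALIZED -t L1'."""
    text = data.decode("utf-8")
    # Compose base letters with their combining marks (NFC) first, so that
    # 'e' + U+0301 becomes the single character U+00E9.
    composed = unicodedata.normalize("NFC", text)
    return composed.encode("latin-1")

# 'e' followed by COMBINING ACUTE ACCENT, i.e. decomposed input.
decomposed = "e\u0301".encode("utf-8")
result = to_latin1_normalizing(decomposed)
```

Without the normalization step, the combining accent U+0301 has no Latin-1
equivalent and the conversion of this input would fail. Input that is
already in NFC passes through the normalization unchanged, which is why
people who know their data is NFC can skip it.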
Andreas Schwab wrote:
> MacOS X uses NFD throughout. (If you have filenames in NFC and you import
> them via NFS to MacOS X the Finder gets confused.)
Good point. This means we should also offer something like

  iconv -f UTF-8 -t UTF-8-DECOMPOSED

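
The opposite direction can be sketched the same way. Again this is plain
Python rather than iconv, UTF-8-DECOMPOSED is only a proposed name, and the
function name is invented for the example. It assumes that "decomposed"
means Unicode normalization form NFD.

```python
import unicodedata

def to_utf8_decomposed(data: bytes) -> bytes:
    """Hypothetical stand-in for 'iconv -f UTF-8 -t UTF-8-DECOMPOSED'."""
    text = data.decode("utf-8")
    # Split precomposed characters into base letter + combining marks (NFD).
    return unicodedata.normalize("NFD", text).encode("utf-8")

# U+00E9 (precomposed e-acute) becomes 'e' followed by U+0301.
decomposed = to_utf8_decomposed("\u00e9".encode("utf-8"))
```

Text that contains no precomposed characters comes out byte-for-byte
identical.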
Thanks for the good suggestions.
Bruno