iconv and combining characters

Bruno Haible bruno@clisp.org
Thu Jan 22 18:09:00 GMT 2004


Chris Heath wrote:
> separate codeset name for Unicode that may be non-NFC.  Something like:
>   iconv -f UTF-8-UNNORMALIZED -t L1
> This has the advantage of not having any speed/memory penalty for those
> who know their data is NFC.

That's a good suggestion. While
    iconv -f UTF-8 -t UTF-8
will remain a no-op,
    iconv -f UTF-8-UNNORMALIZED -t UTF-8
will actually be useful, because it would convert non-NFC input to NFC.
UCS-4-UNNORMALIZED and UTF-16-UNNORMALIZED should be covered similarly.
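
To make the intended behaviour concrete, here is a minimal Python sketch of
what a conversion from such a codeset might do. It assumes that the proposed
UTF-8-UNNORMALIZED name (which is not an existing iconv codeset) means
"normalize to NFC before converting"; the helper name
convert_unnormalized_utf8 is hypothetical and only for illustration.

```python
import unicodedata

def convert_unnormalized_utf8(data: bytes, target: str = "latin-1") -> bytes:
    """Hypothetical helper: decode UTF-8, compose to NFC, then encode."""
    text = data.decode("utf-8")
    composed = unicodedata.normalize("NFC", text)  # 'e' + U+0301 becomes U+00E9
    return composed.encode(target)

# 'e' followed by COMBINING ACUTE ACCENT, i.e. non-NFC input
decomposed = "e\u0301".encode("utf-8")
print(convert_unnormalized_utf8(decomposed))  # b'\xe9'
```

Without the normalization step the combining accent has no Latin-1
equivalent, so the direct conversion of the same input fails.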

Andreas Schwab wrote:
> MacOS X uses NFD throughout.  (If you have filenames in NFC and you import
> them via NFS to MacOS X the Finder gets confused.)

Good point. This means we should also offer something like

    iconv -f UTF-8 -t UTF-8-DECOMPOSED
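
As a rough sketch of the opposite direction, the following Python snippet
shows what a conversion to a decomposed form could produce. It assumes that
the suggested UTF-8-DECOMPOSED name (again, not an existing iconv codeset)
would correspond to plain NFD; the helper name to_decomposed_utf8 is
hypothetical.

```python
import unicodedata

def to_decomposed_utf8(data: bytes) -> bytes:
    """Hypothetical helper: decode UTF-8, decompose to NFD, re-encode as UTF-8."""
    text = data.decode("utf-8")
    return unicodedata.normalize("NFD", text).encode("utf-8")

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, i.e. NFC input
composed = "\u00e9".encode("utf-8")
print(to_decomposed_utf8(composed))  # b'e\xcc\x81'
```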

Thanks for the good suggestions.

Bruno


