This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: SAXON and UTF-8


Michael Kay writes:
 > > Windows Notepad saves UTF8 files with Byte Order Mark, and
 > > AFAIK, the XML
 > > parser in Saxon (AElfred) doesn't support this (at least it
 > > didn't last time I checked).
 > >
 > 
 > The question is, can an XML document (or entity) in UTF-8 encoding start
 > with a BOM? The fact that Unicode allows it, and the fact that Notepad can
 > create it, doesn't make it legal XML.
 > 
 > My reading of the XML spec is that it expects to find BOM only in UTF-16
 > files. I can't see any total prohibition of a BOM in a UTF-8 file, but the
 > spec certainly seems to assume that they won't occur. If anyone thinks
 > otherwise, I'd like to see evidence from the XML specification, which is the
 > only definitive source.
 > 
 > This is of course totally off-topic for XSLT.

At the risk of straying further off topic ...

It's my understanding that UTF-8 is an 8-bit encoding in which
certain "prefix" octets control the meaning of some number of
subsequent octets.
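
To make the "prefix" idea concrete, here is a rough sketch in Python. It assumes the four-octet form of UTF-8 described in RFC 3629, and the function name is made up for illustration. It only inspects the bit pattern of the first octet and does not validate the rest of the sequence.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many octets the sequence starting with `lead` occupies.

    Hypothetical helper: it looks only at the high bits of the first
    octet and does not check the continuation octets that follow.
    """
    if lead < 0x80:            # 0xxxxxxx: a single octet (plain ASCII)
        return 1
    if 0xC0 <= lead < 0xE0:    # 110xxxxx: one continuation octet follows
        return 2
    if 0xE0 <= lead < 0xF0:    # 1110xxxx: two continuation octets follow
        return 3
    if 0xF0 <= lead < 0xF8:    # 11110xxx: three continuation octets follow
        return 4
    # 10xxxxxx is a continuation octet; 0xF8 and above are not used.
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


# Example: the euro sign U+20AC is encoded as three octets.
encoded = "\u20ac".encode("utf-8")
print(len(encoded), utf8_sequence_length(encoded[0]))
```

The point is that the stream is read one octet at a time, front to back, so there is no choice of byte order for a reader to get wrong.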

Does it make any sense for an 8-bit encoding to have a byte order
mark?  It is, after all, already an ordered stream of bytes.
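
For reference, a small Python sketch of what the mark looks like at the octet level. The character U+FEFF always encodes to the same three octets in UTF-8, whereas in UTF-16 its two possible octet orders are what tell a reader the byte order. The helper name below is hypothetical; the constants come from Python's standard codecs module.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 encoded U+FEFF, if present.

    Hypothetical helper, shown only to illustrate what a reader of the
    stream could do before handing the bytes to an XML parser.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# U+FEFF in UTF-8 is one fixed octet sequence, so it carries no
# information about byte order.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# In UTF-16 the same character appears in one of two octet orders.
assert codecs.BOM_UTF16_BE == b"\xfe\xff"
assert codecs.BOM_UTF16_LE == b"\xff\xfe"
```

So in UTF-8 the three octets can at most act as a signature saying "this is UTF-8". Whether an XML parser ought to accept them is the separate question raised above.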

Since this is unrelated to XSLT, please reply to me directly at

    naha@ai.mit.edu



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

