[JDEV] Writings from the Journal of TCharron

arh14 at cornell.edu arh14 at cornell.edu
Thu Aug 5 12:04:09 CDT 1999

Well, I was of the opinion that if exploiting the capability of XML to be 
encoded in various charsets added negligible complexity, then go for it.  
If it is actually non-trivial then I don't mind not supporting it.  I 
just wanted to be sure that, regardless of the encoding of the XML 
document itself, the message content could be transported independently 
in whatever encoding clients wished.
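
To make that concrete, here is a rough sketch in Python of message content 
carried in its own encoding inside a document that is itself UTF-8.  The 
element name "message", the helper names wrap_message/unwrap_message, and 
the use of base64 to keep the raw bytes XML-safe are all assumptions made 
for illustration; nothing here is an agreed wire format.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_message(text: str, encoding: str) -> bytes:
    """Build a UTF-8 XML document whose body is carried in `encoding`."""
    # Bytes in whatever charset the client chose.
    payload = text.encode(encoding)
    # Hypothetical element name; the attribute records the payload charset.
    msg = ET.Element("message", {"encoding": encoding})
    # Assumption: base64 keeps arbitrary bytes legal inside the XML text.
    msg.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(msg, encoding="utf-8")


def unwrap_message(doc: bytes) -> str:
    """Parse the document and decode the body using its declared charset."""
    msg = ET.fromstring(doc)
    payload = base64.b64decode(msg.text)
    return payload.decode(msg.get("encoding"))


if __name__ == "__main__":
    doc = wrap_message("h\u00e9llo", "iso-8859-1")
    print(doc.decode("utf-8"))
    print(unwrap_message(doc))
```

The document encoding and the payload encoding never interact here: the 
parser only ever sees UTF-8, and the receiving client decides whether it 
can handle the charset named in the attribute.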


On Thu, 5 Aug 1999, Jon A. Cruz wrote:

> Well, having any document be composed of mixed encodings might cause some
> problems, especially where the practice differs from the theory.
> It starts to add extra complexity that increases the chance of bugs in
> processing and elsewhere.
> For a little hint of the complexity, just read this section of the XML spec:
> http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing
> and that's just for the few known encodings for encoding the encoding.
> One example is if a document contains an encoding that is not recognized by the
> parser. Since the encoding declarations are just plain-text labels, the parser
> might not recognize some encodings even if they are supported. In any case, if the
> parser hits an unrecognized encoding, it can't handle the rest of the document,
> and would need to throw an exception. This can be worked around by some form of
> content negotiation, but that has problems also.
> There are many other things, but just keep in mind the extra complexity that
> letting the XML doc be encoded in various formats will bring. Standardizing on
> just UTF-8 would be similar to TCP/IP protocols standardizing on network byte
> order. It just makes programming so much simpler and error-resistant.
>
> arh14 at cornell.edu wrote:
> > I think I've deduced that we agree entirely.  *Letting* the XML doc be
> > encoded in various formats, while it doesn't necessarily help us now,
> > doesn't hurt anything (as long as everybody reads the encoding header on
> > the doc and complies).  This is separate from the encoding of the
> > actual messages, which should always be allowed to be variable, and is
> > facilitated by a concise message 'encoding="foo"' attribute.
> >
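
For a feel of the two problems described in the quoted text (guessing the 
encoding family before the declaration can be read, and a label the parser 
does not recognize), here is a small Python sketch.  The byte patterns are 
a subset of those listed in the autodetection section of the XML spec 
linked above; the function names guess_family and resolve_label are made 
up for this example, and Python's codec registry stands in for whatever 
table of encodings a real parser would ship with.

```python
import codecs

# Leading-byte patterns, a subset of those in the XML 1.0 spec's
# (non-normative) section on autodetecting character encodings.
_SIGNATURES = [
    (b"\x00\x00\x00\x3c", "UCS-4 big-endian"),
    (b"\x3c\x00\x00\x00", "UCS-4 little-endian"),
    (b"\xfe\xff", "UTF-16 big-endian (BOM)"),
    (b"\xff\xfe", "UTF-16 little-endian (BOM)"),
    (b"\x00\x3c\x00\x3f", "UTF-16 big-endian (no BOM)"),
    (b"\x3c\x00\x3f\x00", "UTF-16 little-endian (no BOM)"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible; read the encoding declaration"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC; read the encoding declaration"),
]


def guess_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes of a document."""
    for signature, family in _SIGNATURES:
        if data.startswith(signature):
            return family
    # No pattern matched: the spec's fallback is UTF-8 without a declaration.
    return "UTF-8 (no declaration)"


def resolve_label(label: str) -> str:
    """Map a declared encoding label to a known codec name, or fail loudly."""
    try:
        return codecs.lookup(label).name
    except LookupError as exc:
        # The label is just text; an unknown one leaves the rest unreadable.
        raise ValueError(f"unrecognized encoding label: {label!r}") from exc


if __name__ == "__main__":
    print(guess_family(b"<?xml version='1.0' encoding='UTF-8'?>"))
    print(resolve_label("UTF8"))
```

If every document is UTF-8 by convention, both functions become 
unnecessary, which is the simplification being argued for in the quoted 
text.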
