[JDEV] Writings from the Journal of TCharron

Scott Robinson scott at tranzoa.com
Thu Aug 5 19:09:57 CDT 1999

Interleaved response.


* Jon A. Cruz translated into ASCII [Thu, Aug 05, 1999 at 09:49:31AM -0700][<37A9C09B.3574F6CF at geocities.com>]
> Well, having any document be composed of mixed encodings might cause some
> problems, especially when it comes to where the practice differs from the theory.
> It starts to add an extra complexity that grows the chance for bugs in processing
> and other ways.

I may seem unfeeling and slightly bug-happy, but if a client cannot
implement the standard properly, then I can't feel any pity for the fact it
crashes. :) However, your point on extra complexity is well taken. It comes
down to this: is there a better way of implementing alternate encodings? For
the XML itself, we've only said that clients should LOOK FOR a "charset="
property... and not that anyone is going to use it. It's the "<message
encoding=" where the real suggestion has been made. If there is any way to
reduce the complexity we've suggested, I'd love to hear it.

> For a little hint of the complexity, just read this section of the XML spec:
> http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing
> and that's just for the few known encodings for encoding the encoding.

It's hard, but possible. The C/S will need to notice when the XML stream
ends; any data it receives afterwards will almost certainly be whitespace
followed by "<?xml". If the C/S can't figure out what the data is, then
it'll have to assume a weird charset.
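
To give a feel for the guessing step, here is a rough Python sketch in the
spirit of the section of the XML spec linked above. It is only an
illustration, not anything Jabber specifies: the function name
guess_xml_encoding and the UTF-8 fallback are my own assumptions, it only
covers the BOM and "<?" byte patterns plus an ASCII-readable encoding
declaration, and it expects the data to begin exactly at the start of a
stream (no leading whitespace).

```python
import codecs
import re

# Byte prefixes and the codec each one suggests. Longer prefixes come first
# so that a UTF-32 BOM is not mistaken for a UTF-16 BOM.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (b"\x00<\x00?", "utf-16-be"),  # "<?" in UTF-16 without a BOM
    (b"<\x00?\x00", "utf-16-le"),
]

# Matches the encoding label of an XML declaration written in an
# ASCII-compatible encoding.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of an XML stream from its first bytes."""
    for prefix, name in _SIGNATURES:
        if data.startswith(prefix):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing recognizable: fall back to the assumed default.
    return default
```

A receiver would call this on the first bytes of each new stream and then
decode the rest with whatever name comes back, or give up if it does not
know that name.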

I realize there is a problem in not knowing how the C/S will start the
next XML stream. Obviously, if or when internationalization is included in
the Jabber spec, we'll have to specify what happens between streams.

> One example is if a document contains an encoding that is not recognized by the
> parser. Since the encoding declarations are just plain-text labels, the parser
> might not recognize some encodings even if they are supported. In any case, if the
> parser hits an unrecognized encoding, it can't handle the rest of the document,
> and would need to throw an exception. This can be worked around by some form of
> content negotiation, but that has problems also.

We have to assume a client won't be able to support all encodings. There is
nothing wrong with this. As for the throwing of the exception, it seems this
is the evil everyone is trying to avoid. Why?
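
For what it's worth, the exception path can be small. The Python sketch
below is an assumption-laden illustration, not a proposed API: decode_body
and UnsupportedEncoding are hypothetical names, and the label is taken to
be whatever the "encoding=" attribute carried. Python's codec lookup
already normalizes common aliases, which helps with the plain-text-label
problem Jon mentions.

```python
class UnsupportedEncoding(Exception):
    """Hypothetical error raised when a client cannot decode a label."""


def decode_body(raw: bytes, label: str) -> str:
    """Decode raw bytes using the encoding named by label.

    Raises UnsupportedEncoding if this client has no text codec
    registered under that label.
    """
    try:
        return raw.decode(label)
    except LookupError as exc:
        # Unknown label: report it instead of guessing at the content.
        raise UnsupportedEncoding(label) from exc
```

The caller can catch UnsupportedEncoding and drop or flag that one message
rather than the whole session.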

> There are many other things, but just keep in mind the extra complexity that
> letting the XML doc be encoded in various formats will bring. Standardizing on
> just UTF-8 would be similar to TCP/IP protocols standardizing on network byte
> order. It just makes programming so much simpler and error-resistant.

It also leaves problems for internationalization later on. That's been shown
before. Either way, we already noted that UTF-8 and UTF-16 (as stated in the
XML spec) will be our default.

> arh14 at cornell.edu wrote:
> > I think I've deduced that we agree entirely.  *Letting* the XML doc be
> > encoded in various formats, while it doesn't necessarily help us now,
> > doesn't hurt anything (as long as everybody reads the encoding header on
> > the doc and complies).  This is separate from the encoding of the
> > actual messages, which should always be allowed to be variable, and is
> > facilitated by a concise message 'encoding="foo"' attribute.
> >
> --
> "My new computer's got the clocks, it rocks
> But it was obsolete before I opened the box" - W.A.Y.