[JDEV] Writings from the Journal of TCharron

arh14 at cornell.edu
Wed Aug 4 12:23:24 CDT 1999

On Wed, 4 Aug 1999, Scott Robinson wrote:

> What you said is along the lines in my head. I'll spew my thoughts some
> more, since they have been a bit more refined.
> First off, since we love being able to debug manually with telnet, the C/S
> MUST support ASCII. Moreover, since UTF-8 has ASCII and it is the XML
> standard, therefore the C/S should support UTF-8. There is really nothing
> surprising here, but I'll just put that down.
> Second, I was waiting for the proper time to discuss UNICODE... which was to
> be my suggestion. Personally, and I'll admit I have not yet screwed around
> with expat, although I've received the vibes it is quite difficult to change
> charsets in mid-stream, I believe that since the XML standard allows for a

Sorry if I'm being thick, but what would be the reason for switching 
charsets in the middle of parsing a document?  Wouldn't the entire XML doc be 
normalized to one standard, and, given a message encoding parameter, the 
client would decide what it wants to do with the normalized characters?  My 
understanding is that the XML markup itself should never deviate from a 
pre-stated charset, but the CDATA might (which, really, the parser doesn't 
care about, right?).  If a standard is set, it will ultimately be the 
client's responsibility to make sure all outgoing messages are 
normalized, and all incoming messages are reconstituted in their favorite 
Star Trek dialect.
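
To make that concrete, here is a rough sketch in Python of what I mean by
the client doing the normalizing. It assumes UTF-8 is the agreed wire
encoding; the function names are made up for illustration and are not part
of any existing client or server.

```python
# Sketch only: assumes UTF-8 is the agreed wire encoding.
# normalize_outgoing / reconstitute_incoming are hypothetical names.

WIRE_ENCODING = "utf-8"


def normalize_outgoing(raw: bytes, local_charset: str) -> bytes:
    """Convert bytes in the client's local charset to the wire encoding."""
    return raw.decode(local_charset).encode(WIRE_ENCODING)


def reconstitute_incoming(wire: bytes, local_charset: str) -> bytes:
    """Convert wire bytes back to the client's local charset.

    Characters the local charset cannot represent become '?'.
    """
    return wire.decode(WIRE_ENCODING).encode(local_charset, errors="replace")


if __name__ == "__main__":
    latin1_text = "café".encode("latin-1")
    on_the_wire = normalize_outgoing(latin1_text, "latin-1")
    print(on_the_wire)
    print(reconstitute_incoming(on_the_wire, "latin-1"))
```

Plain ASCII input passes through unchanged, since every ASCII byte sequence
is already valid UTF-8, so telnet debugging keeps working.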

> charset different from UTF-8, that the C/S should be able to use that
> particular feature. I would note, that if the C/S cannot understand UNICODE
> (just an example) there should be a way of saying it. ala HTTP's "Accept:
> charset/ascii, charset/utf-8" and "Deny: charset/unicode".

Should you really rely on the facility of XML to use different charsets?  
Really the only thing that needs to change charsets is the CDATA of 
users' messages.  The markup itself never needs to deviate from a set 
standard encoding.  This standard encoding should be broad enough to be 
able to store every other encoding clients might want to use.  You don't 
want to change the nature of the messenger based on the characteristics 
of the message (if that makes any sense).
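
As a sketch of the messenger staying the same while the message varies,
here is a small Python example using the standard library's
xml.etree.ElementTree (which sits on top of expat). The markup is always
serialized as UTF-8 and only the text inside it changes. The element names
"message" and "body" are placeholders I picked for the example, not a
proposed protocol.

```python
# Sketch only: "message" and "body" are placeholder element names.
import xml.etree.ElementTree as ET


def build_message(body_text: str) -> bytes:
    """Serialize a message as UTF-8; the markup stays the same for any text."""
    msg = ET.Element("message")
    ET.SubElement(msg, "body").text = body_text
    return ET.tostring(msg, encoding="utf-8")


def read_body(wire: bytes) -> str:
    """Parse UTF-8 wire bytes and return the body text as a Unicode string."""
    return ET.fromstring(wire).findtext("body")


if __name__ == "__main__":
    for text in ("hello", "Grüße", "日本語"):
        wire = build_message(text)
        # The parser never switches charsets; only the character data differs.
        assert read_body(wire) == text
        print(wire)
```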

> Standardizing on UNICODE, though, might be a way to go. I'm not sure, but if
> the C/S plain receives/sends ASCII, it could just convert inside and
> everyone could be happy.
> The following comments are certified weird.
> Scott.

an interloper,
