[Standards] Binary data over XMPP

Tomasz Sterna tomek at xiaoka.com
Tue Nov 6 14:46:32 UTC 2007

On Tue, 06-11-2007 at 14:35 +0000, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]

> Can't do that, because many of those characters are going to be  
> illegal even in CDATA sections.

The first 256 _allowable_ UTF-8 characters are certainly legal in CDATA.

> But bear in mind that even then, to encode a single octet will yield  
> between 1 and 3 characters.

I would use only those UTF-8 characters that encode to at most 2 bytes,
leaving out the 3-byte and longer ones.

And a better mapping:
Bytes that are valid UTF-8 characters are mapped 1 to 1.
Only the invalid ones are mapped to 2-byte characters.

This way if the "binary" data is ASCII text, it stays human readable.
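A minimal sketch of such a mapping in Python, to make the idea concrete.
The names (SAFE, ENCODE_TABLE, encode, decode), the choice of which bytes
count as "valid", and the target range U+0100..U+01FF for the remaining
bytes are all assumptions made for illustration; nothing in this thread
fixes them. XML-level escaping of characters such as '<' and '&' is left
to the serializer and is not handled here.

```python
# Hypothetical sketch: the SAFE set and the 0x100 offset are assumptions,
# not something specified in this thread.

# Bytes kept as-is: tab, line feed, and printable ASCII.
SAFE = {0x09, 0x0A} | set(range(0x20, 0x7F))

# 256-entry table: safe bytes map to themselves, every other byte maps
# to a code point in U+0100..U+01FF, which takes 2 bytes in UTF-8.
ENCODE_TABLE = [chr(b) if b in SAFE else chr(0x100 + b) for b in range(256)]

# Reverse lookup from character back to the original byte value.
DECODE_TABLE = {ch: b for b, ch in enumerate(ENCODE_TABLE)}


def encode(data: bytes) -> str:
    """Translate arbitrary bytes into a string using the table."""
    return "".join(ENCODE_TABLE[b] for b in data)


def decode(text: str) -> bytes:
    """Invert encode(); raises KeyError on characters outside the table."""
    return bytes(DECODE_TABLE[ch] for ch in text)
```

With this table, ASCII text passes through unchanged, and any other
octet costs 2 bytes on the wire, so the worst case doubles the size.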

This is a simple 256-row translation table that could be defined

  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: smoku at xiaoka.com
