[Standards] Binary data over XMPP
tomek at xiaoka.com
Tue Nov 6 14:46:32 UTC 2007
On Tue, 06-11-2007 at 14:35 +0000, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]
> Can't do that, because many of those characters are going to be
> illegal even in CDATA sections.
The first 256 _allowable_ UTF-8 characters are certainly legal in CDATA.
> But bear in mind that even then, to encode a single octet will yield
> between 1 and 3 characters.
I would use only those UTF-8 characters that encode to at most 2 bytes,
leaving out the ones that need 3 bytes or more.
And a better mapping:
Bytes that are valid UTF-8 characters are mapped 1 to 1.
Only the invalid ones are mapped to 2-byte characters.
This way if the "binary" data is ASCII text, it stays human readable.
This is a simple 256-row translation table that could be defined
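
To make the idea concrete, here is a minimal sketch of such a table in Python. Nothing in it is a defined mapping: the names OFFSET, PASS_THROUGH, ENCODE_TABLE and DECODE_TABLE are hypothetical, and the choice of U+0100 + b for escaped bytes is only an assumption that happens to give 2-byte UTF-8 characters. Carriage return is escaped as well, on the assumption that an XML parser would normalize it to a newline. Outside a CDATA section, '<' and '&' would still need the usual XML escaping.

```python
# Hypothetical table: the actual mapping is not defined anywhere in this thread.
OFFSET = 0x100  # assumed: escaped byte b becomes code point U+0100 + b (2 bytes in UTF-8)

# Bytes passed through unchanged: tab, newline, and printable ASCII.
# Other control bytes are illegal in XML 1.0, and bytes >= 0x80 are not
# valid single-byte UTF-8, so all of those get escaped.
PASS_THROUGH = {0x09, 0x0A} | set(range(0x20, 0x7F))

# The 256-row translation table: byte value -> one character.
ENCODE_TABLE = [chr(b) if b in PASS_THROUGH else chr(OFFSET + b) for b in range(256)]
DECODE_TABLE = {ch: b for b, ch in enumerate(ENCODE_TABLE)}


def encode(data: bytes) -> str:
    """Map every byte to exactly one character using the table."""
    return "".join(ENCODE_TABLE[b] for b in data)


def decode(text: str) -> bytes:
    """Reverse the mapping; raises KeyError on a character outside the table."""
    return bytes(DECODE_TABLE[ch] for ch in text)
```

With this sketch, encode(b"hello") gives back "hello" unchanged, and any other byte costs 2 bytes on the wire once the string is serialized as UTF-8.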
/\_./o__ Tomasz Sterna
._.(_.)_ XMPP: smoku at xiaoka.com