[Standards] Binary data over XMPP
dave at cridland.net
Tue Nov 6 14:35:24 UTC 2007
On Tue Nov 6 13:00:44 2007, Tomasz Sterna wrote:
> On 05-11-2007, Mon at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to
> values of a single byte.
> That would be less than 33% BASE64 overhead.
Can't do that, because many of those characters (the C0 control
characters other than tab, LF and CR) are illegal in XML 1.0, even
inside CDATA sections.
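For reference, the XML 1.0 Char production allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. The Python sketch below counts which of the first 256 code points fall outside it. The helper name is_xml10_char is hypothetical, chosen for illustration.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Code points a naive "octet value == code point" mapping would have to emit
# but that may not appear anywhere in an XML 1.0 document.
illegal = [cp for cp in range(256) if not is_xml10_char(cp)]
print(len(illegal))
print([hex(cp) for cp in illegal])
```

It reports 29 code points: U+0000 to U+0008, U+000B, U+000C and U+000E to U+001F.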
You could, though, take all the octet values that map to illegal
characters and add 256 to the code point before encoding; that would,
I think, be sufficient.
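A minimal sketch of that remapping, assuming XML 1.0 rules. The helper names (is_xml10_char, encode_octets, decode_octets) are hypothetical. Legal code points pass through unchanged, and the 29 illegal ones are shifted to U+0100..U+011F. Nothing else lands in that range, so decoding is unambiguous.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def encode_octets(data: bytes) -> str:
    """One code point per octet; XML-illegal octets are shifted up by 256."""
    return "".join(chr(b) if is_xml10_char(b) else chr(b + 256) for b in data)

def decode_octets(text: str) -> bytes:
    """Invert encode_octets: any code point >= 256 was shifted, so subtract 256."""
    return bytes(ord(c) - 256 if ord(c) >= 256 else ord(c) for c in text)

sample = bytes(range(256))
encoded = encode_octets(sample)
assert all(is_xml10_char(ord(c)) for c in encoded)  # every character is legal
assert decode_octets(encoded) == sample             # lossless round trip
```

This only guarantees legal characters. A serializer would still need to escape & and < (and avoid ]]> inside a CDATA section). A raw CR would also be turned into LF by XML end-of-line normalization unless it is written as &#xD;.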
But bear in mind that even then, encoding a single octet will yield
either 1 or 2 octets of UTF-8. Encoding essentially random data (which
includes the output of any decent encryption algorithm) will encode at
least half the octets as 2-byte characters, yielding, on average, an
inflation of at least 50%. That's higher than base64's 33%, of course.
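To check the arithmetic, this sketch re-defines the hypothetical remapping helper, which shifts the 29 XML-illegal octets up by 256, so it runs on its own. It then measures the UTF-8 size of the output over an equal mix of all 256 octet values, which is the expected distribution for uniformly random input, and compares it with base64. XML escaping of & and < is ignored here.

```python
import base64

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF \
        or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF

def encode_octets(data: bytes) -> str:
    """Hypothetical scheme: one code point per octet, illegal ones shifted by 256."""
    return "".join(chr(b) if is_xml10_char(b) else chr(b + 256) for b in data)

def utf8_inflation(data: bytes) -> float:
    """Fractional size increase once the remapped text is serialized as UTF-8."""
    return len(encode_octets(data).encode("utf-8")) / len(data) - 1

# Every octet value equally often; length is a multiple of 3 so base64 needs no padding.
every_octet = bytes(range(256)) * 3
b64_inflation = len(base64.b64encode(every_octet)) / len(every_octet) - 1

print(f"remapped UTF-8: {utf8_inflation(every_octet):.1%}")
print(f"base64:         {b64_inflation:.1%}")
```

It prints 61.3% against 33.3% for base64. The 128 octets from 0x80 upwards and the 29 shifted control characters all become 2-byte sequences (157 of 256), so the 50% figure is a lower bound.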
It's possible that a modified UTF-7 might be better. (And UTF-7,
modified or not, is pure 7-bit ASCII, and therefore acceptable UTF-8.)
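A quick check of that last point with Python's built-in utf-7 codec. This codec implements standard UTF-7 (RFC 2152), not the modified IMAP variant (RFC 3501), and the sample string is arbitrary. UTF-7 encodes Unicode text rather than raw octets, so a binary scheme built on it would still need its own octet-to-character mapping.

```python
text = "binary\u00e9data\u4e2d"  # arbitrary sample with non-ASCII characters
utf7 = text.encode("utf-7")

# Every output octet is 7-bit, so the same bytes are valid ASCII and valid UTF-8.
assert all(b < 0x80 for b in utf7)
assert utf7.decode("utf-8") == utf7.decode("ascii")
assert utf7.decode("utf-7") == text  # lossless round trip
print(utf7)
```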
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade