[Standards] Binary data over XMPP

Dave Cridland dave at cridland.net
Tue Nov 6 14:35:24 UTC 2007

On Tue Nov  6 13:00:44 2007, Tomasz Sterna wrote:
> On Mon, 05-11-2007 at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to
> 256 values of a single byte.
> That would be less than 33% BASE64 overhead.
Can't do that, because many of those characters are going to be  
illegal even in CDATA sections.

You could instead take codepoints 0 to 255, though, and add 256 to
each codepoint value before encoding - that would, I think, be
sufficient to avoid the illegal characters.
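
A minimal sketch of the offset mapping described above, assuming each
octet maps to exactly one character at codepoint (octet + 256). The
names OFFSET, octets_to_text and text_to_octets are hypothetical and
chosen only for illustration:

```python
OFFSET = 256  # shift each octet value into U+0100..U+01FF

def octets_to_text(data: bytes) -> str:
    """Map every octet to the character at codepoint (octet + OFFSET)."""
    return "".join(chr(b + OFFSET) for b in data)

def text_to_octets(text: str) -> bytes:
    """Reverse the mapping; raises ValueError for characters outside the range."""
    return bytes(ord(ch) - OFFSET for ch in text)

payload = bytes(range(256))          # every possible octet value once
text = octets_to_text(payload)       # 256 characters, all legal in XML 1.0
wire = text.encode("utf-8")          # what would actually go on the wire
```

Note that with this particular offset every character falls in
U+0100..U+01FF, so each one takes two bytes in UTF-8.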

But bear in mind that even then, encoding a single octet will yield
between 1 and 3 bytes of UTF-8. Encoding essentially random data -
which includes the output of any decent encryption algorithm - will
encode roughly half the octets as 2-byte characters, yielding - on
average - about 50% inflation. That's higher than base64's 33%, of
course.
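
To put rough numbers on that comparison, here is a sketch that builds
the "first 256 allowable characters" alphabet and measures its average
UTF-8 size per octet for uniformly random data. The set of excluded
characters ('<', '&' and carriage return) is an assumption; the exact
figure depends on which characters are treated as allowable. The
helper names are hypothetical.

```python
import base64

def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production, restricted to the low range needed here."""
    return cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF

# Assumption: also skip characters that need escaping or get normalized.
EXCLUDED = {ord("<"), ord("&"), ord("\r")}

def build_alphabet(size: int = 256) -> list:
    """Collect the first `size` codepoints that pass both filters."""
    alphabet = []
    cp = 0
    while len(alphabet) < size:
        if is_xml_char(cp) and cp not in EXCLUDED:
            alphabet.append(chr(cp))
        cp += 1
    return alphabet

ALPHABET = build_alphabet()

# For uniformly random octets, each alphabet entry is equally likely,
# so the expected cost is the mean UTF-8 length over the alphabet.
avg_bytes_per_octet = sum(len(ch.encode("utf-8")) for ch in ALPHABET) / len(ALPHABET)

# base64 for comparison: 4 output bytes for every 3 input octets.
payload = bytes(range(256)) * 3
base64_bytes_per_octet = len(base64.b64encode(payload)) / len(payload)
```

Under these assumptions fewer than half of the alphabet entries fit in
one byte, so avg_bytes_per_octet comes out above base64's 4/3.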

It's possible that a modified UTF-7 might be better. (And UTF-7,  
modified or not, is acceptable UTF-8).

Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
