[Standards] Binary data over XMPP

Dave Cridland dave at cridland.net
Tue Nov 6 14:35:24 UTC 2007


On Tue Nov  6 13:00:44 2007, Tomasz Sterna wrote:
> On Mon, 05-11-2007 at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
> 
> Simplest that comes to mind:
> Let's take the first 256 allowable UTF-8 characters and assign them
> to the 256 values of a single byte.
> That would be less than the 33% overhead of BASE64.
> 
> 
Can't do that, because many of those characters are going to be  
illegal even in CDATA sections.

You could take all 256 octet values, though, and add 256 to each
code point before encoding - that would, I think, be sufficient.
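
A minimal sketch of that offset mapping, to check the claim rather than
to propose an implementation. The names OFFSET, encode_octets,
decode_octets and is_xml10_char are hypothetical; the legality test
assumes the Char production from XML 1.0.

```python
OFFSET = 256  # assumption: shift every octet to U+0100..U+01FF


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_octets(data: bytes) -> str:
    """Map each octet to the character at code point (octet + OFFSET)."""
    return "".join(chr(b + OFFSET) for b in data)


def decode_octets(text: str) -> bytes:
    """Inverse of encode_octets."""
    return bytes(ord(ch) - OFFSET for ch in text)


all_octets = bytes(range(256))

# Octets that would be illegal if used directly as code points 0..255.
illegal_direct = [b for b in all_octets if not is_xml10_char(b)]

# Octets that would still be illegal after adding the offset.
illegal_shifted = [b for b in all_octets if not is_xml10_char(b + OFFSET)]

print(len(illegal_direct), len(illegal_shifted))
```

Under these assumptions the direct mapping hits the C0 control
characters, while every shifted code point is a legal XML character and
none of them collides with markup characters such as "<" or "&".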

But bear in mind that even then, each octet becomes one character
that occupies between 1 and 3 bytes in UTF-8. Encoding essentially
random data - which includes the output of any decent encryption
algorithm - will encode half the octets using 2-byte characters,
yielding - on average - a 50% inflation. That's higher than base64's
33%, of course.
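
The arithmetic can be checked with a short sketch. It is only a size
measurement under stated assumptions: encode_mapped and inflation are
hypothetical helpers, and the input is every octet value repeated
equally often as a stand-in for random data.

```python
import base64


def encode_mapped(data: bytes, offset: int = 0) -> bytes:
    """Map each octet to code point (octet + offset), then serialise as UTF-8."""
    return "".join(chr(b + offset) for b in data).encode("utf-8")


def inflation(encoded: bytes, original: bytes) -> float:
    """Fractional size increase of encoded over original."""
    return len(encoded) / len(original) - 1.0


# Uniformly distributed octets, standing in for encrypted or random data.
data = bytes(range(256)) * 64

direct = inflation(encode_mapped(data), data)        # code points 0..255
shifted = inflation(encode_mapped(data, 256), data)  # code points 256..511
b64 = inflation(base64.b64encode(data), data)

print(f"direct {direct:.1%}  shifted {shifted:.1%}  base64 {b64:.1%}")
```

The 50% figure corresponds to the direct mapping, where octets 0-127
stay at one byte and 128-255 take two. With the 256 offset every code
point falls in the 2-byte UTF-8 range, so the inflation is higher
still. Either way base64 comes out ahead.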

It's possible that a modified UTF-7 might be better. (And UTF-7,
modified or not, is acceptable UTF-8, since its output is plain ASCII.)

Dave.
-- 
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


