[Standards] Binary data over XMPP
Dave Cridland
dave at cridland.net
Tue Nov 6 08:35:24 CST 2007
On Tue Nov 6 13:00:44 2007, Tomasz Sterna wrote:
> On Mon, 05-11-2007 at 16:23 +0100, Tomasz Sterna wrote:
> > Alternatively we could invent binary-2-utf mapping which has less
> > overhead than BASE64.
>
> Simplest that comes to mind:
> Let's take first 256 allowable UTF-8 characters and assign them to
> 256 values of a single byte.
> That would be less than 33% BASE64 overhead.
>
>
Can't do that, because many of those characters are illegal in XML,
even in CDATA sections.
You could take all the illegal ones, though, and add 256 to the
codepoint value before encoding. That would, I think, be sufficient.
But bear in mind that even then, encoding a single octet will yield
between 1 and 3 characters. Encoding essentially random data, which
includes the output of any decent encryption algorithm, will encode
half the octets using 2-byte characters, yielding on average a
50% inflation. That's higher than base64's 33%, of course.
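
As a rough check of that estimate, here is a minimal Python sketch of the
"shift the illegal ones by 256" idea. The function names are made up for
illustration, and the sketch assumes that the only problem characters are
the XML 1.0 control characters below U+0020 other than tab, LF and CR. It
ignores the escaping that '<' and '&' would still need, and the line-ending
normalisation an XML parser applies to CR.

```python
import base64

# Assumption: XML 1.0 allows only tab, LF and CR below U+0020.
XML_LEGAL_CONTROLS = {0x09, 0x0A, 0x0D}


def octet_to_codepoint(octet: int) -> int:
    """Map one octet to a code point, shifting XML-illegal ones up by 256."""
    if octet < 0x20 and octet not in XML_LEGAL_CONTROLS:
        return octet + 256
    return octet


def encode(data: bytes) -> bytes:
    """Encode arbitrary octets as UTF-8 text using the shifted mapping."""
    text = "".join(chr(octet_to_codepoint(b)) for b in data)
    return text.encode("utf-8")


def decode(wire: bytes) -> bytes:
    """Invert encode(): code points at or above 256 were shifted."""
    return bytes(cp - 256 if cp >= 256 else cp
                 for cp in map(ord, wire.decode("utf-8")))


# One copy of every octet value stands in for uniformly random data.
all_octets = bytes(range(256))
mapped_len = len(encode(all_octets))
base64_len = len(base64.b64encode(all_octets))

print(f"shifted mapping: {mapped_len} bytes, "
      f"{mapped_len / len(all_octets) - 1:.0%} inflation")
print(f"base64:          {base64_len} bytes, "
      f"{base64_len / len(all_octets) - 1:.0%} inflation")
```

Under these assumptions the 256 octet values come out as 413 bytes against
344 for base64. The 128 high octets account for the 50% figure, and the 29
shifted control characters add a little more on top.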
It's possible that a modified UTF-7 might be better. (And UTF-7,
modified or not, is pure ASCII, so it is also valid UTF-8.)
Dave.
--
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade