[Standards] Binary data over XMPP

Rachel Blackman rcb at ceruleanstudios.com
Fri Nov 9 18:37:08 UTC 2007

On Nov 9, 2007, at 10:27 AM, Rachel Blackman wrote:

>>> On Nov 9, 2007, at 8:47 AM, Tobias Markmann wrote:
>>>> There are already several binary-to-text encodings which perform a bit
>>>> better than Base64, two of them are:
>>>> 1. http://en.wikipedia.org/wiki/ASCII85 invented by Adobe
>>>> 2. http://base91.sourceforge.net/
>>> Both of those seem to allow < and &, which make them less than ideal
>>> for embedding in XML.
>> "XMPP is not XML" :-)))
> No.  But just because a is not b does not imply that b is not a.
> XMPP is a /subset/ of XML: not all XML is valid XMPP, but all XMPP
> is (or should be) valid XML when the session is taken as a
> document.  :)
> That matters from both a design standpoint and a practical one
> (re-using existing XML parsers for XMPP is easy because XMPP obeys a
> subset of the XML rules).  So one would think it is still just as
> important to keep < and & from appearing raw in an XMPP stream.
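To make the objection about < and & concrete, here is a small sketch that lists each encoding's output alphabet and intersects it with the two characters XML does not allow raw in character data. The Base64 alphabet is the RFC 4648 one; the ASCII85 range ('!' through 'u', plus the 'z' shorthand) and the basE91 symbol list are my own transcription of those encodings' definitions, so treat them as an illustration rather than an authoritative table.

```python
# Characters that cannot appear raw in XML character data.
XML_MUST_ESCAPE = {"<", "&"}

# Base64 alphabet (RFC 4648), including the '=' padding character.
BASE64_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
)

# ASCII85 digits are the 85 characters '!' (33) through 'u' (117),
# plus 'z' as shorthand for an all-zero 4-byte group.
ASCII85_ALPHABET = {chr(c) for c in range(33, 118)} | {"z"}

# basE91 alphabet (transcribed; 62 alphanumerics plus 29 punctuation marks).
BASE91_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    "0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)

for name, alphabet in [
    ("Base64", BASE64_ALPHABET),
    ("ASCII85", ASCII85_ALPHABET),
    ("basE91", BASE91_ALPHABET),
]:
    clashes = sorted(alphabet & XML_MUST_ESCAPE)
    print(f"{name}: {len(alphabet)} symbols, needs XML escaping for {clashes}")
```

Only Base64 comes out clean. ASCII85 and basE91 also contain '>' and '"' (ASCII85 has "'" too), which matter for "]]>" sequences and attribute values.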

On top of which, if you modify the XMPP stream/parser rules to allow
raw & and < in a stream, you really have to roll your own parser
anyway.  So at that point, why the hell not just send the raw binary
blob rather than needlessly encoding it?
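The "roll your own parser" point is easy to demonstrate: a stock XML parser (here Python's expat-based ElementTree) rejects character data that contains a raw '<' or '&', and only accepts it once those characters are escaped as entities. The `<data>` element and the encoded chunk below are hypothetical stand-ins, not part of any XMPP extension.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical fragment of ASCII85/basE91 output that happens to contain '<' and '&'.
encoded_chunk = "9jqo<&^BlbD"


def is_well_formed(xml_text: str) -> bool:
    """Return True if a stock (expat-based) XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <data> is an illustrative element name only.
raw_element = f"<data>{encoded_chunk}</data>"
escaped_element = f"<data>{escape(encoded_chunk)}</data>"

print(is_well_formed(raw_element))      # False: raw '<' and '&' break the parser
print(is_well_formed(escaped_element))  # True: '&lt;' and '&amp;' are fine
print(ET.fromstring(escaped_element).text == encoded_chunk)  # True: round-trips
```

Either the stream keeps escaping (and pays for the entities), or the parser is no longer a standard XML parser.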

I mean, if you are completely throwing out the XML stream model and
redoing how streams work, why do it halfway?  Why change it so that you can allow
< and & raw in a stream, just so that you can shave a few bytes off by  
replacing BASE64?  Let's just go to a completely-binary protocol like  
AIM's OSCAR; it opens up a lot of doors without having to worry about  
parsing rules.  Just define a binary packet format with a header and a  
length field and hey, we're good to go on whatever!
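For what "a binary packet format with a header and a length field" might look like, here is a minimal sketch. The layout (a 1-byte stanza type followed by a 4-byte big-endian length) and the type codes are hypothetical, invented for illustration; this is not OSCAR's actual framing.

```python
import struct
from typing import List, Tuple

# HYPOTHETICAL frame layout, for illustration only (not OSCAR's real format):
#   1 byte   stanza type (invented codes: 0 = message, 1 = presence, 2 = iq)
#   4 bytes  payload length, big-endian unsigned
#   N bytes  payload, carried as raw binary with no escaping or encoding
HEADER = struct.Struct(">BI")


def encode_frame(stanza_type: int, payload: bytes) -> bytes:
    """Prefix the payload with its type byte and 4-byte length."""
    return HEADER.pack(stanza_type, len(payload)) + payload


def decode_frames(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a byte buffer into complete frames.

    Returns (frames, remainder), where remainder holds the trailing bytes
    of an incomplete frame, to be kept until more data arrives.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        stanza_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # header has arrived but the payload is not complete yet
        frames.append((stanza_type, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


# Bytes that could never appear raw in an XML stream travel untouched.
blob = bytes([0x3C, 0x26, 0x00, 0xFF])
stream = encode_frame(0, b"hello") + encode_frame(2, blob)
frames, rest = decode_frames(stream + b"\x01\x00")  # plus a partial header
print(frames)  # [(0, b'hello'), (2, b'<&\x00\xff')]
print(rest)    # b'\x01\x00'
```

The parser never inspects payload bytes, so there is nothing to escape and no alphabet to choose; the length field alone tells it where each stanza ends.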

Facetious comments aside, my point is this: if we're talking about
modifying how the XMPP parser works, why bother doing things halfway
with little workarounds?  Throw out XMPP 1.0 entirely and come up with
an extensible 2.0 binary protocol.

If we like to chant the 'XMPP is not really XML' mantra and the 'we
must shave off every byte we can to spare the poor mobile users'
mantra, that's great.  But considering we only have 3 actual main
stanza types (message, presence, and iq), a purely binary (and not
necessarily XML-related) protocol would be more efficient.  And if
we're going to break the world by changing how XMPP parsing works
anyway, why on earth would we go through the pain of breaking our
protocol just to bolt on the ability to send a few extra raw
characters, all so we can use ASCII85 or BASE91 instead of BASE64?
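A rough size comparison helps frame the trade-off. Base64 adds about 33% and ASCII85 about 25% before any XML concerns; but ASCII85 output then has to be XML-escaped, and since '&', '<' and '>' each show up in roughly 1 of every 85 output characters and expand to 4- or 5-character entities, a back-of-envelope estimate suggests the escaped text tends to lose most or all of that advantage. The sketch below measures this on pseudo-random data (a stand-in for an image; the seed and size are arbitrary) and compares it with raw bytes behind the same hypothetical 5-byte type-plus-length header used for illustration earlier. Actual numbers will vary with the input.

```python
import base64
import random
import struct
from xml.sax.saxutils import escape

# 3000 bytes of pseudo-random data standing in for an image or file blob.
rng = random.Random(2007)
blob = bytes(rng.getrandbits(8) for _ in range(3000))

# Option 1: Base64 text, which can go into XML character data as-is.
b64_size = len(base64.b64encode(blob))

# Option 2: ASCII85 text, which must then be XML-escaped ('&', '<', '>').
a85_text = base64.a85encode(blob).decode("ascii")
a85_xml_size = len(escape(a85_text))

# Option 3: raw bytes behind a hypothetical 5-byte header (type + length).
framed_size = len(struct.pack(">BI", 0, len(blob)) + blob)

print(f"raw blob:             {len(blob)} bytes")
print(f"Base64:               {b64_size} bytes")
print(f"ASCII85 (unescaped):  {len(a85_text)} bytes")
print(f"ASCII85, XML-escaped: {a85_xml_size} bytes")
print(f"binary frame:         {framed_size} bytes")
```

Whatever the escaped ASCII85 figure turns out to be, the raw frame costs a fixed five bytes over the blob itself, which is the point being made above.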

I think we've lost sight of the original problem we were trying to
solve (inline images?  Size of binary blobs sent to mobiles?) and have
become caught up in hypothetical solutions that may no longer be
directly connected to the issue.  :)

Rachel Blackman <rcb at ceruleanstudios.com>
Trillian Messenger - http://www.trillianastra.com/
