[Standards] Binary data over XMPP

Dave Cridland dave at cridland.net
Tue Nov 6 10:16:19 UTC 2007

Forgive me for sounding like an idiot, but I seem to be missing the  
point here:

On Mon Nov  5 17:45:53 2007, Rachel Blackman wrote:
> Expat (a fairly common XML parser out there) will do the job just   
> fine.  Your network engine has to separate each stanza out, sure,  
> but  that's not hard.  And then you can pass each stanza unaltered  
> through  expat and get back your usual XML structures.

Is this saying that given a string containing multiple stanzas, you  
need to separate them out into one stanza per string, before feeding
them in? I thought that with a SAX-like XML parser, you needn't  
bother doing that.
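
To make that concrete, here is a minimal sketch using Python's
xml.parsers.expat bindings. The chunk boundaries and the plain <stream>
root are made up for illustration (a real XMPP stream uses a namespaced
stream:stream root), and stanza_names is a hypothetical helper, not
anything from an existing library. The parser is fed chunks that split
stanzas mid-tag, and it still reports each stanza as its closing tag
arrives:

```python
import xml.parsers.expat

def stanza_names(chunks):
    """Feed arbitrary network chunks to one expat parser and return the
    names of the top-level stanzas, in the order they were completed."""
    parser = xml.parsers.expat.ParserCreate()
    depth = 0
    done = []

    def start(name, attrs):
        nonlocal depth
        depth += 1

    def end(name):
        nonlocal depth
        depth -= 1
        # Depth 1 means we are back directly under the stream root,
        # so a whole stanza has just been closed.
        if depth == 1:
            done.append(name)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    for chunk in chunks:
        parser.Parse(chunk, False)   # False: more data may follow
    parser.Parse(b"", True)          # the demo closes the stream, so finish the parse
    return done

# Chunk boundaries deliberately fall in the middle of tags.
chunks = [
    b"<stream><message><body>hi</bo",
    b"dy></message><presence/><iq ty",
    b"pe='get'/></stream>",
]
print(stanza_names(chunks))
```

No pre-splitting into one stanza per string happens anywhere here; the
parser keeps the partial tag around between calls.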

>   You would no longer be  able to do that with binary blobs; you  
> would have to special-case blob  stanzas fairly heavily, since I  
> guarantee you that if the characters  '<' or '>' appear un-escaped  
> in the binary blob, Expat will choke and die.

Sure, but there are two options with an escaping mechanism - either
synchronized or non-synchronized - and they can be negotiated easily.

With a non-synch mechanism, the sender just sends out the <blob/>  
element, then sends out the binary data, then continues with XML. It  
can be done in a single TCP packet, but it requires that the receiver  
processes the data into stanzas prior to processing through the XML  
parser. Some receivers already do this, so it seems reasonable that  
this can be an option.
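
A rough sketch of what such a receiver-side pre-pass could look like.
Everything about the wire syntax here is an assumption: the length
attribute on <blob/>, the single-quoted form, and the split_stream
helper are all hypothetical, and the sketch assumes the whole buffer is
already in hand rather than arriving piecemeal.

```python
import re

# Hypothetical marker syntax: <blob length='N'/> followed by N raw octets.
BLOB_TAG = re.compile(rb"<blob length='(\d+)'/>")

def split_stream(data):
    """Split a byte buffer into ('xml', bytes) and ('blob', bytes) pieces
    so that only the 'xml' pieces are handed to the XML parser."""
    pieces = []
    pos = 0
    while pos < len(data):
        match = BLOB_TAG.search(data, pos)
        if match is None:
            pieces.append(("xml", data[pos:]))
            break
        # XML up to and including the <blob/> element itself.
        pieces.append(("xml", data[pos:match.end()]))
        size = int(match.group(1))
        # The next `size` octets are raw binary and are never parsed as XML.
        pieces.append(("blob", data[match.end():match.end() + size]))
        pos = match.end() + size
    return pieces

data = b"<message><blob length='4'/>" + b"<\x00>&" + b"</message><presence/>"
for kind, chunk in split_stream(data):
    print(kind, chunk)
```

The blob in the example contains '<', '>' and '&' unescaped, and none of
it reaches the XML parser, since the splitter skips over it by counting
octets.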

With a synch mechanism, the sender sends out a <blob/> element, and  
then waits. The receiver then says it's ready for binary data  
(sending a stanza to indicate this), and the sender then sends the  
binary data - followed immediately by more XML as required, since a  
"binary parser" is going to be octet counting anyway. For people who  
parse all the network traffic at once through a SAX-like parser, this  
should work fine, at the expense of some efficiency.
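
The receiving side of the synchronized variant might look roughly like
the sketch below. Again the details are assumptions: the length
attribute, the <ready/> reply and the SyncReceiver class are invented
for illustration, and a zero-length blob is not handled.

```python
import xml.parsers.expat

class SyncReceiver:
    """Streams all XML straight into expat; switches to octet counting
    only after it has seen a <blob/> and told the sender to go ahead."""

    def __init__(self):
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self._start
        self.remaining = 0    # blob octets still expected
        self.current = b""    # blob octets collected so far
        self.blobs = []       # completed blobs
        self.elements = []    # element names seen by the XML parser
        self.outbox = []      # replies to send back to the sender

    def _start(self, name, attrs):
        self.elements.append(name)
        if name == "blob":
            # Hypothetical: the element announces how many octets follow.
            self.remaining = int(attrs["length"])
            self.outbox.append(b"<ready/>")

    def feed(self, data):
        if self.remaining:
            # Binary mode: count octets, never show them to the parser.
            take = data[:self.remaining]
            self.current += take
            self.remaining -= len(take)
            data = data[len(take):]
            if self.remaining == 0:
                self.blobs.append(self.current)
                self.current = b""
        if data:
            # Anything after the blob is XML again.
            self.parser.Parse(data, False)

receiver = SyncReceiver()
receiver.feed(b"<stream><message><blob length='3'/>")  # sender now waits
print(receiver.outbox)
receiver.feed(b"<>&</message><presence/>")             # blob, then more XML
print(receiver.blobs, receiver.elements)
```

Because the sender holds off until it sees the reply, no binary octets
can already be sitting in the parser's input when the <blob/> element is
reported, which is what lets a feed-everything-to-SAX client cope.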

Note that anyone can send non-synchronized blobs, but not everyone  
can receive them, so a client (for instance) which is built to stream  
network data directly into a SAX parser can still *send* blobs.

> If we really need a non-BASE64 method of sending binary data  
> between  clients, I suggest we re-use Jingle.  That already is a  
> mechanism for  negotiation of 'I want to send you this type of  
> data, how do I get it  to you?'  There's very few cases I can think  
> of where we would want to  be sending binary blobs in a  
> server-cached manner anyway.

Server-proxied, not cached.

This implies that encrypted chat sessions don't go via the server,  
for example, meaning that a client intending to encrypt all  
conversations by default is going to use XMPP purely as a session  
initiation protocol, and lose all efficiency (and a degree of  
privacy) as a result. Or else it'll be base64 encoding the entire  
conversation, and lose efficiency that way.
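
For a sense of the base64 cost, a small sketch (encoded_size is a
hypothetical helper, and the payload sizes are arbitrary): every 3
octets of input become 4 characters of output, so the encoded form is
about a third larger before any XML framing is counted.

```python
import base64

def encoded_size(n):
    """Return the length of the base64 encoding of n octets."""
    return len(base64.b64encode(bytes(n)))

for n in (1, 300, 3000):
    size = encoded_size(n)
    print(n, size, round(size / n, 2))
```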

Either way, it will directly impact the usage of encryption - and  
that's ignoring the other ways that binary data is commonly used  
within XMPP.

Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
