[Standards] Binary data over XMPP

Dave Cridland dave at cridland.net
Mon Nov 5 11:40:18 UTC 2007

It seems to me that there are a number of cases where shipping binary  
blobs over XMPP is useful, and we don't want to be resorting to  
base64 every time.

I'm thinking, in particular, that this is needed for encrypted  
stanzas, images, and file transfers.

Is it worth our while to consider a single standardized mechanism for  
doing so? There are a number of ways this might work; here's one as a  
basis for discussion:

A new top-level stanza of (say) <blob/>, with much the same  
attributes as any other routable stanza, but also an octet count.  
Upon receipt, XML processing is suspended, and that many of the  
following octets are handled verbatim:

<blob from='portia at example.com/court' to='shylock at example.net/court'  

I'm using characters here instead of octets for clarity, but the  
"contents" of the blob element could contain NUL octets, non-UTF-8  
data, etc. Note that I've chosen to express it as an empty element  
followed by the contents - this is primarily because I strongly  
suspect that this is simpler to process for many implementations,  
although it is distinctly un-XML-ish.
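
To make the framing concrete, here is a rough Python sketch of a receiver splitting one blob off the front of a byte buffer. It is only an illustration under assumptions: the header is matched with a regular expression rather than a real XML parser, attribute values are assumed to be single-quoted and free of '>', and the names read_blob, BLOB_HEADER and ATTR are hypothetical. Only the octet-count attribute comes from the proposal itself.

```python
import re

# Hypothetical header pattern: an empty <blob .../> element, as proposed above.
# Assumes no '>' appears inside attribute values (a real parser would not).
BLOB_HEADER = re.compile(rb"<blob\b([^>]*)/>")
ATTR = re.compile(rb"([\w-]+)='([^']*)'")


def read_blob(buffer: bytes):
    """Split one <blob/> header and its raw payload off the front of buffer.

    Returns (attributes, payload, rest), or None if the header is absent
    or the payload has not been fully received yet.
    """
    match = BLOB_HEADER.match(buffer)
    if match is None:
        return None
    attrs = {k.decode(): v.decode() for k, v in ATTR.findall(match.group(1))}
    count = int(attrs["octet-count"])
    start = match.end()
    if len(buffer) < start + count:
        return None  # wait for more octets
    # The payload is taken verbatim: no XML parsing, no UTF-8 decoding.
    payload = buffer[start:start + count]
    return attrs, payload, buffer[start + count:]
```

After the payload has been consumed, the remaining bytes would be handed back to the normal XML stream parser.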

The above won't handle imagery, and other blobs that need  
referencing. There are two ways of tackling this - we either allow  
blobs to be sent inlined with other elements (which I think would be  
difficult to handle), or else we define a new URI scheme - or reuse  
cid - and stick id and content-type attributes on <blob/>, so:

<message from='portia at example.com/court'  
to='shylock at example.net/court'>
Yo, Shylock, here's a pound of flesh.
<html xmlns='http://jabber.org/protocol/xhtml-im'>
<body xmlns='http://www.w3.org/1999/xhtml'>
  <p>Yo, Shylock, here's a pound of flesh: <img src='cid:foo'/></p>
<blob from='portia at example.com/court' to='shylock at example.net/court'  
id='foo' octet-count='426'  
content-type='matter-transport/flesh'/>[426 octets of, presumably,  

(See RFC1437 for the top-level MIME type used).
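
On the receiving side, the referencing could be as simple as a lookup table keyed by the blob's id. The Python sketch below is a guess at how that might look, not part of the proposal: BlobStore, Blob, add and resolve are hypothetical names, and it ignores the percent-encoding that real cid URIs can carry.

```python
from typing import Dict, NamedTuple, Optional


class Blob(NamedTuple):
    content_type: str
    data: bytes


class BlobStore:
    """Holds received blobs so that cid: references can be resolved later."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Blob] = {}

    def add(self, blob_id: str, content_type: str, data: bytes) -> None:
        # Keyed by the id attribute carried on the <blob/> element.
        self._blobs[blob_id] = Blob(content_type, data)

    def resolve(self, uri: str) -> Optional[Blob]:
        # Only "cid:<id>" URIs are handled; anything else is not ours.
        scheme, sep, blob_id = uri.partition(":")
        if sep != ":" or scheme.lower() != "cid":
            return None
        return self._blobs.get(blob_id)
```

A client rendering the XHTML would call resolve() on each img src and fall back to a placeholder if the blob has not arrived.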

Alternatively, we might prefer that the blobs are carried on demand in  
this instance.

Finally, we should probably consider blocking and flow-control - at  
this point, I'll either suggest we examine BEEP, or else we just  
reuse what we have in IBB.

Now, we can't expect that the entire Internet will bend to our will  
and instantly upgrade, so we need a sane fallback - probably to IBB,  
or something fairly similar. The interesting question is whether we  
choose to have this negotiated end to end (which means we'll need to  
have each hop along the route tested), or whether we say that this  
down-conversion happens within servers.
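
As a rough picture of what the down-conversion might involve, wherever it happens, the Python sketch below cuts a raw payload into blocks and base64-encodes each into an IBB-style <data/> element. It is a sketch under assumptions: to_ibb_chunks, the 4096-octet default block size and the string-built XML are illustrative only, and the enclosing stanzas and session setup that IBB requires are left out.

```python
import base64
from typing import List
from xml.sax.saxutils import quoteattr

IBB_NS = "http://jabber.org/protocol/ibb"


def to_ibb_chunks(payload: bytes, sid: str, block_size: int = 4096) -> List[str]:
    """Convert a raw blob payload into base64 IBB-style <data/> elements."""
    chunks = []
    for index, offset in enumerate(range(0, len(payload), block_size)):
        block = payload[offset:offset + block_size]
        encoded = base64.b64encode(block).decode("ascii")
        seq = index % 65536  # IBB sequence numbers are 16-bit and wrap around
        chunks.append(
            f"<data xmlns='{IBB_NS}' sid={quoteattr(sid)} seq='{seq}'>"
            f"{encoded}</data>"
        )
    return chunks
```

The cost of the fallback is visible here: every 3 octets of payload become 4 characters on the wire, plus the per-block element overhead.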

Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
