[Council] The Jabber Transport, and File Transfer Too

Mike Lin mlin at mlin.net
Sun Jun 9 22:22:19 CDT 2002


Hey guys, 

So there has been this debate on JDEV about whether file transfers
should be done in-band over a Jabber session, in a parallel Jabber
session, or totally out of band with some other protocol. Things have
pretty much degenerated into an unfortunate flame war, as they tend to
do, but I would like to describe for the record the standards-related
thinking about the Jabber transport that has emerged through my working
with various groups and individuals over the past 12 months. I will use
the first-person pronouns because no doubt I will incorrectly state some
of the ideas that I've been learning. However, few of these ideas are
really mine, but rather those of the really smart guys in this group. It
would take a long time to give credit to everyone, but the principals
are Jeremie, Julian, Chet Murthy here at IBM Internet Technology where I
intern, Dirk way back at JabberCon, and Joe Hildebrand, indirectly
through his Jabber.NET code. And Max Metral through skillfully sticking
to his guns. Thanks guys.

I would like to propose some specific technical steps forward which I
believe are critical for Jabber to reach its full potential. I'll
explain these steps, what their implications are, and how I believe they
back up the position I have been arguing in this recent debate. 

1. Length Prefixed Framing 

Come on. We've been pushing this one since before JabberCon. All Jabber
packets should be prefixed in the wire protocol by their byte lengths.
This is just a massively better way to do things. It frees you from
dependence on the capabilities of your XML parsing suite. It allows you
to read packets into statically sized buffers. It opens the possibility
of additional functionality that I will describe. The only reason
sendmail can afford not to do this is because it spools everything to
disk anyway. See JEP-0017, XATP, and BEEP for some suggestions of how to
do this. 
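
A minimal sketch of length-prefixed framing, for illustration only. It assumes a 4-byte prefix in network byte order (the speculative protocol below uses four bytes; the byte order is an assumption) and the helper names write_frame, read_exact and read_frame are hypothetical:

```python
import io
import struct

def write_frame(stream, payload: bytes) -> None:
    """Write one packet: a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack("!I", len(payload)))
    stream.write(payload)

def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_frame(stream) -> bytes:
    """Read one packet; the size is known before any payload is touched."""
    (length,) = struct.unpack("!I", read_exact(stream, 4))
    return read_exact(stream, length)

# Two packets back to back; no XML parser is needed to find the boundary.
buf = io.BytesIO()
write_frame(buf, b"<presence/>")
write_frame(buf, b"<message to='a@b'/>")
buf.seek(0)
first = read_frame(buf)
second = read_frame(buf)
```

The receiver learns each packet's size from the prefix alone, so it can choose a buffer before reading the payload.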

2. Maximum Packet Size 

Now that the server knows the size of a packet before it receives it, it
is practical to assign a maximum packet size, beyond which packets will
not be accepted. It is critical for low-latency (=> high throughput)
routing that the server minimize its disk access (either by explicit
file buffering or O/S paging) while handling packets. Length prefixed
framing combined with a maximum packet size is the only reliable way to
ensure this. Where memory requirements are known ahead of time, the
server can decide exactly when it is necessary to buffer a packet on
disk, and when it can speedily shuffle everything in physical RAM. 
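
As a rough sketch of that decision, assuming the server keeps a per-packet RAM budget alongside its maximum packet size (plan_packet and ram_budget are hypothetical names, not part of any existing server):

```python
def plan_packet(declared_length: int, max_packet_size: int, ram_budget: int) -> str:
    """Decide what to do with a packet from its length prefix alone."""
    if declared_length > max_packet_size:
        return "reject"   # refuse before reading any payload bytes
    if declared_length <= ram_budget:
        return "memory"   # fits in the RAM the server is willing to commit
    return "disk"         # accepted, but spooled rather than held in RAM
```

The point is only that all three outcomes are chosen before the payload arrives; the actual thresholds would be operator configuration.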

The length prefixed framing + maximum packet size also means that the
server has decent approximate knowledge of its bandwidth requirements in
the near future. This is extremely valuable for building a daemon that
performs and scales. 

It would be nice for all Jabber nodes to standardize on one maximum
packet size; however, this seems impractical. So information about
maximum packet size should be exchanged in the XML Streams handshake.
This gets a bit sticky when you begin introducing multi-hop S2S routing,
but this issue does not seem unresolvable. 

3. Inline Binary Data 

It is impractical to encode large amounts of binary data into base64 and
then transmit it as an XML element. Jeremie has proposed XML Inline
Binary (XIB), which establishes a new "compiled" version of XML that
contains inline binary data, and defines a correspondence between this
form and a standard XML-compliant, encoded form. Meanwhile, a W3C Note
proposes using MIME to attach binary data to SOAP messages using entity
references. 
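
To put a number on the encoding cost: base64 maps every 3 input bytes to 4 output characters, so the encoded form is about a third larger before any XML parsing cost is counted. A small check (base64_size is a hypothetical helper, and the zero-filled payload is a stand-in for real binary data):

```python
import base64

def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)

payload = bytes(3 * 1024 * 1024)       # 3 MiB of stand-in binary data
encoded = base64.b64encode(payload)    # what would go inside the XML element
overhead = len(encoded) - len(payload) # extra bytes on the wire
```

Here a 3 MiB payload becomes 4 MiB of text, all of which must also pass through the XML parser.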

I submit that XIB solves the immediate problem of avoiding the encoding
of binary data, but ultimately the SOAP Attachments model is more useful
because it removes the requirement that the server receive all the huge
binary data of the packet before it knows its XML structure. The SOAP
Attachments model even sets up for the possibility that the binary
attachment data could be zero-copy streamed. 

However, SOAP Attachments' use of MIME to frame its data blocks is
probably undesirable for our wire protocol, because it is complicated.
Given that we have this XML parsing infrastructure in place, we should
use it as far as practical, and do minimal work otherwise. But XIB goes
too far. 

All these ideas coalesce into... 

4. Next Generation Jabber Wire Protocol 

I propose a new binary wire protocol with all the attributes of the one
I am about to describe. I am not here describing _the_ wire protocol to
adopt, but merely one that illustrates desirable features. 

-- Begin Speculative Wire Protocol -- 

A four-byte unsigned integer encoding the total length of the packet
precedes each packet. 

Each packet may be subdivided into several parts. A packet that is not
subdivided has one part. These parts may either be XML markup, or else
opaque binary data. Each part is prefixed by a four-byte header which
encodes three pieces of information: 

a) A "more bit", which is 1 if more parts of this packet follow or 0 if
this is the last part. If a part has more bit = 0, then the next four
bytes on the stream after the part are the length of the next packet,
followed by that packet's first part. 

b) A 7-bit type code, which specifies the type of the data contained in
the part. Initially, only XML and opaque binary data type codes will be
used, leaving 126 other possibilities, although I don't foresee a need
for them. The server "should" verify that the payloads of all XML parts
are indeed well-formed XML, and "may" reject the packet if they are not;
although these may be configuration options depending on performance
requirements. Due to heavy computational requirements, servers should
not be required to validate XML payloads against a schema for the
foreseeable future, although this option may be enabled for reference
implementation and testing. The MIME-type of opaque binary data may be
specified in a manner to be described in a moment.

c) And a 3-byte length tag, which encodes the length of the part, up to
16MB. Binary attachments larger than this thus must be chunked into
parts no larger than this, and implementations are encouraged to align
chunking on word or even page boundaries. 
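
The header in (a) through (c) can be sketched as a single 32-bit word. The bit layout (more bit highest, then the type code, then the length), the network byte order, and the type code values 0 and 1 are all assumptions; the description above fixes only the field widths:

```python
import struct

TYPE_XML = 0                   # hypothetical type code for XML markup
TYPE_BINARY = 1                # hypothetical type code for opaque binary
MAX_PART_LEN = (1 << 24) - 1   # largest value a 3-byte length can hold

def pack_part_header(more: bool, type_code: int, length: int) -> bytes:
    """Pack the more bit, 7-bit type code and 3-byte length into 4 bytes."""
    if not 0 <= type_code < 128:
        raise ValueError("type code must fit in 7 bits")
    if not 0 <= length <= MAX_PART_LEN:
        raise ValueError("part length must fit in 3 bytes")
    word = (int(more) << 31) | (type_code << 24) | length
    return struct.pack("!I", word)

def unpack_part_header(header: bytes):
    """Return (more, type_code, length) from a 4-byte part header."""
    (word,) = struct.unpack("!I", header)
    return bool(word >> 31), (word >> 24) & 0x7F, word & 0xFFFFFF
```

Note that a 3-byte field tops out at 2^24 - 1, so a part can carry one byte less than a full 16MB.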

The first part of a packet is required to contain XML Stream traffic,
and contains the usual Jabber protocol element that we are used to.
(Most Jabber packets will thus have one part, with the more bit = 0,
type code = XML, and some length.) The XML in this part may reference an
attachment using some XML namespace which specifies the sequence number,
or range of sequence numbers if chunked, of the attachment, and its
MIME-type. 

Packets under this protocol can encode Jabber protocol elements up to
16MB in size, and have up to 127 attachments up to 16MB in size.
Realistic packet sizes should not approach these theoretical encodable
limits in the immediately foreseeable future; but these limits are robust
enough so that this wire protocol can be efficiently used for a long
time.
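
A sketch of assembling and parsing a whole packet under this layout, with an XML first part followed by a chunked attachment. Several details are assumptions rather than anything stated above: the total length is taken to count the part headers and payloads but not itself, integers are in network byte order, and the type codes and function names are hypothetical:

```python
import struct

TYPE_XML, TYPE_BINARY = 0, 1     # hypothetical type codes
MAX_PART_LEN = (1 << 24) - 1     # limit of the 3-byte length field

def _part(more: bool, type_code: int, payload: bytes) -> bytes:
    """One part: 4-byte header (more bit, type, length) plus payload."""
    word = (int(more) << 31) | (type_code << 24) | len(payload)
    return struct.pack("!I", word) + payload

def build_packet(xml: bytes, attachment: bytes = b"",
                 chunk_size: int = MAX_PART_LEN) -> bytes:
    """XML part first, then the attachment split into binary parts."""
    if len(xml) > MAX_PART_LEN or not 1 <= chunk_size <= MAX_PART_LEN:
        raise ValueError("part would not fit the 3-byte length field")
    chunks = [attachment[i:i + chunk_size]
              for i in range(0, len(attachment), chunk_size)]
    parts = [(TYPE_XML, xml)] + [(TYPE_BINARY, c) for c in chunks]
    last = len(parts) - 1
    body = b"".join(_part(i < last, t, p) for i, (t, p) in enumerate(parts))
    return struct.pack("!I", len(body)) + body

def parse_packet(data: bytes):
    """Return [(type_code, payload), ...] for one complete packet."""
    (total,) = struct.unpack("!I", data[:4])
    pos, parts, more = 4, [], True
    while more:
        (word,) = struct.unpack("!I", data[pos:pos + 4])
        more = bool(word >> 31)
        length = word & 0xFFFFFF
        parts.append(((word >> 24) & 0x7F, data[pos + 4:pos + 4 + length]))
        pos += 4 + length
    if pos != 4 + total:
        raise ValueError("part lengths disagree with packet length")
    return parts

# A small chunk size is used here only to show the chunking.
packet = build_packet(b"<iq/>", b"abcdefgh", chunk_size=3)
parts = parse_packet(packet)
```

An ordinary stanza with no attachment costs 8 bytes of framing: the packet length plus one part header.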

The server operator may choose to apply limits to the maximum part size,
the maximum total packet size, or both. The server operator may also
choose to allow only XML payloads, and/or XML payloads of specific
namespaces only. However, since all lengths are known ahead of time, the
server's I/O should be very fast and manageable, so server operators are
encouraged to apply liberal maximum packet sizes. Implementations should
be intelligent about throttling down I/O for large packets while under
heavy load, giving priority to smaller XML-only packets which are more
likely to be related to presence and messaging applications.
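
One way to picture that prioritization is an outbound queue ordered by whether a packet is XML-only and then by size. This is a sketch under assumptions: OutboundQueue and its methods are hypothetical, and it makes no attempt to stop large packets from starving under sustained load:

```python
import heapq
import itertools

class OutboundQueue:
    """Serve XML-only packets first, smaller ones before larger ones."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps arrival order

    def push(self, packet_id, total_length: int, xml_only: bool) -> None:
        key = (0 if xml_only else 1, total_length, next(self._seq))
        heapq.heappush(self._heap, (key, packet_id))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)

queue = OutboundQueue()
queue.push("file-chunk", 5_000_000, xml_only=False)
queue.push("presence", 40, xml_only=True)
queue.push("message", 200, xml_only=True)
order = [queue.pop() for _ in range(len(queue))]
```

Because the lengths and part types are in the headers, the ordering key is available before any payload is read.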

There is no defined full-fidelity way to transform this wire
representation into a well-formed XML document. I submit that this is
unnecessary and perhaps even undesirable. It may, however, be possible to
define a correspondence with SOAP Attachments.

-- End Speculative Wire Protocol -- 

Again, this wire protocol is only meant to be illustrative of desirable
features, and no doubt many people will not like it. This could probably
be built on BEEP just as well. This particular protocol has more than
anything else been designed for ruthless efficiency. All headers are
fixed length words; they need none of the loose string processing
required by MIME. It is simple enough that it can be easily, reliably,
and efficiently implemented as an event-driven finite state machine -
suitable for running in kernel mode. It keeps well-bounded the size of
each packet and its parts so that the recipient can optimize I/O. The
large binary data does not need to be loaded into an XML parser. The
fact that the XML containing the routing information is fully received
before any of the large binary data means that the server can begin
processing the packet before the large binary data is fully received,
and the possibility even exists for the server to zero-copy stream the
binary data to its destination under certain conditions. Finally, it
allows the server operator to exercise fairly fine-grained control over
the size and content of data transmitted in-band. These are the
attributes we need from our binary wire protocol.
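
To illustrate the finite state machine claim, here is a sketch of an incremental parser with three states. It assumes network byte order, a header layout of more bit, then type code, then length, and at least one part per packet; FrameParser and its event tuples are hypothetical. For simplicity it buffers each part before emitting it, where a zero-copy server would instead forward body bytes as they arrive:

```python
import struct

class FrameParser:
    """Feed bytes as they arrive; events accumulate in self.events."""

    def __init__(self):
        self.state = "PACKET_LEN"
        self.need = 4            # bytes required to leave the current state
        self.buf = b""
        self.more = False
        self.type_code = 0
        self.events = []

    def feed(self, data: bytes):
        self.buf += data
        while len(self.buf) >= self.need:
            chunk, self.buf = self.buf[:self.need], self.buf[self.need:]
            if self.state == "PACKET_LEN":
                (total,) = struct.unpack("!I", chunk)
                self.events.append(("packet", total))
                self.state, self.need = "PART_HEADER", 4
            elif self.state == "PART_HEADER":
                (word,) = struct.unpack("!I", chunk)
                self.more = bool(word >> 31)
                self.type_code = (word >> 24) & 0x7F
                self.state, self.need = "PART_BODY", word & 0xFFFFFF
            else:  # PART_BODY: a complete part is available
                self.events.append(("part", self.type_code, chunk))
                self.state = "PART_HEADER" if self.more else "PACKET_LEN"
                self.need = 4
        return self.events
```

Because the XML part is emitted as soon as its last byte arrives, routing can start while the binary parts are still in flight.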

A key concept is that this wire protocol separates payload framing,
which is best done in a terse, efficient binary format, from routing and
semantic structural information, which arguably is best done as
extensible XML. It may also be desirable in a future proposal to
separate XML envelope (routing) information from payload content, even
if that payload content is XML.

Some may initially see the use of a binary wire protocol for Jabber as a
step in the wrong direction, since it is desirable for our wire protocol
to simply be a streaming XML document. This binary wire protocol
reflects a basic conclusion that restricting Jabber's wire protocol to a
well-formed XML document limits Jabber's potential, since it makes it
impractical to transport large payloads in-band due to expensive
encoding, poor framing, and other performance penalties. This
disadvantage is not only applicable to binary payloads; it is arguably
even more undesirable for the server to have to load into a DOM any
complex XML markup while only the envelope information is relevant. I
have instead introduced a simple binary wire protocol that provides as
much help as possible to Jabber servers to efficiently transport
messages of almost arbitrary content and size. Highly structured data
like web-service queries may still be transported as inline XML wrapped
in a very simple frame; but now the possibility exists to transmit
arbitrary binary data to anyone with a Jabber ID. 

So now I will address the question of, why the hell would you want to do
that? 

The first thing I would like to note is that the changes I've proposed
are not _that_ difficult. Simple protocol elements, other XML markup,
and small attachments are wrapped in an 8-byte header which is not at
all difficult to build, and even easier to deconstruct. Large binary
payloads may have to be chunked in separate parts or packets, or sent
over a separate session in order to achieve messaging concurrency; but
the algorithms for doing so are readily intuitive.

This is not really so hard. I would even say that it is much easier than
building an FTP client and server into Jabber client software, which is
what some have proposed for file transfer. This is just a nightmare.
Even if you were willing to handle the complexity of driving this
stateful session in this foreign protocol, you would still have to jump
through absurd hoops to make this work through firewalls. If you need to
send a gigabyte, fine, you'll want to squeeze every last drop out of the
network, so go out of band. For any reasonably sized data (documents,
pictures, and some day not too far away, maybe even streaming video), we
should not have to do all this work. 

This brings us to the most important point. Transporting everything
in-band means we adopt all the rich routing capabilities of that band.
Once we do this, all this trouble of PASS and proxy servers and NATs
simply goes away, because the Jabber servers should all be configured
correctly. The protocol provides sufficient read-ahead hints for Jabber
servers to be quite efficient about moving data along. If the bandwidth
requirements are too intensive, server operators have many knobs and
dials they can adjust to fine-tune the allowable bandwidth usage - well
beyond the capabilities of today's server karma. This is really cool
stuff, and I think it'll work. 

The most critical problem that remains to be solved for this or any
other proposal moving forward is security. The recent work with SASL has
been great, but client-server authentication is not the real problem.
Dialback today is somewhat acceptable for preventing packet forging;
however, it has difficulty with firewalls and is insufficient for
multi-hop routing. Totally end-to-end trust based on PKI will be
necessary but probably not sufficient moving forward.

I would like to conclude by again noting that although I've been using
"I" throughout, few of these ideas are originally mine; I simply didn't
want to pin my own transcription errors on anyone else. I
believe that the steps outlined here form the critical basis for Jabber
to become a highly scalable, universal messaging system.

-Mike



