[Standards-JIG] XMPP bandwidth compression

Bob Wyman bob at wyman.us
Sat Jul 3 19:04:37 UTC 2004

Jean-Louis Seguineau wrote:
> Although the explanations provided in this thread are all very
> enlightening on describing the great XML futures lying in front of 
> us, I believe their implementation and mainstream usage may be a
> little far away. Some of us may have to solve immediately some
> tangible issues in the mean time... 
	I agree that what we're talking about here is, at best, "futures".
As I think has been made clear in several posts, there aren't yet many
people who are experiencing significant performance or bandwidth issues with
Jabber/XMPP. None of this stuff will make sense to work on until we actually
experience and understand real problems in the field.

> most of the discussion so far has been very server centric. I mean
> that all use cases for 'in XML compression' seemed to point to a need
> for the stream to be entirely decoded by the receiving entity (which
> is OK for a server but may not be true for an XMPP routing device).
	Yes, a broader range of use-cases should be discussed. In the
specific case of routers, it is important to specify whether we're talking
about traditional routers that are "address-oriented" and thus focus on the
outer wrappers (to, from) of a message or content-based routers that make
decisions based on the detailed contents of the message. In the latter case,
a content-based router will, in fact, typically need to parse and inspect
the entire object before it can decide where to route it. In the former
case, the question would be "How do we optimize access to the addresses that
the router is interested in?"
	Given Jabber/XMPP as it is currently defined (i.e. with XML only and
no alternative encodings), the process of extracting the "to" data from the
header of a chunk of XML involves doing a string of character comparisons. You
need to recognize the type of packet (i.e. presence, message, or iq) and you
then need to dig around to find the "to" attribute and extract the string that
is its value. Since the addressing wrappers follow a pretty consistent format, it
would be much more efficient to encode those wrappers using a "schema-aware"
binary encoding that used length-counted strings and tagged values. Thus, a
router could, in only a few instructions, index into the "to" field, find
its length, and then just do a memcpy (in C or C++) to extract the address.
It wouldn't have to do *any* character comparisons, string matches, etc. in
order to extract the addressing information. Thus, one would expect that
such a router would be able to demonstrate performance benefits over a
text-XML-only system. (Note: It would be important to try to get as much
consistency as possible in the addressing wrappers in order to avoid the
cost of packet-type detection, etc.)
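
	To make the contrast above concrete, here is a minimal sketch in
Python (chosen for brevity; the slice in extract_to plays the role of the
memcpy a C router would do). Everything in it is an assumption made up for
illustration: the frame layout, the HEADER format, and the function names
are hypothetical and are not part of XMPP or of any proposed encoding. The
text scan is also deliberately naive and is not a real XML parser.

```python
import struct

# Hypothetical wire layout (an assumption, not an XMPP format):
#   1 byte   stanza kind (0 = message, 1 = presence, 2 = iq)
#   2 bytes  length of the "to" address, unsigned, network byte order
#   2 bytes  length of the "from" address, unsigned, network byte order
#   N bytes  "to" address (UTF-8)
#   M bytes  "from" address (UTF-8)
#   rest     payload, opaque to an address-oriented router
HEADER = struct.Struct("!BHH")


def encode_wrapper(kind: int, to: str, sender: str, payload: bytes) -> bytes:
    """Build a frame with length-counted addressing fields."""
    to_bytes = to.encode("utf-8")
    from_bytes = sender.encode("utf-8")
    header = HEADER.pack(kind, len(to_bytes), len(from_bytes))
    return header + to_bytes + from_bytes + payload


def extract_to(frame: bytes) -> str:
    """Binary path: read the length, then copy the bytes. No scanning."""
    _kind, to_len, _from_len = HEADER.unpack_from(frame, 0)
    start = HEADER.size
    return frame[start:start + to_len].decode("utf-8")


def extract_to_from_text(stanza: str) -> str:
    """Text path: character comparisons over the opening tag.

    Naive on purpose: assumes the first '>' ends the opening tag and that
    the attribute is written exactly as to='...' or to="...".
    """
    head = stanza[:stanza.index(">")]
    for quote in ("'", '"'):
        marker = " to=" + quote
        pos = head.find(marker)
        if pos != -1:
            start = pos + len(marker)
            return head[start:head.index(quote, start)]
    raise ValueError("no 'to' attribute in opening tag")


if __name__ == "__main__":
    frame = encode_wrapper(
        0, "juliet@example.com", "romeo@example.net", b"<body>hi</body>"
    )
    print(extract_to(frame))
    print(extract_to_from_text(
        "<message to='juliet@example.com' from='romeo@example.net'>"
        "<body>hi</body></message>"
    ))
```

	In this sketch the binary path never looks at the payload or the
"from" field, and the stanza kind sits at a fixed offset, which is the sort
of consistency in the addressing wrappers that the note above asks for.
Whether the saving matters in practice would still have to be measured.
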
	Of course, the benefits may not be significant enough to justify the
work. What I've said above is only theory -- not a proposal. We would have
to gain real experience and see real problems in the field before this stuff
is worth looking at.

		bob wyman
