[Standards] [Operators] Future of XMPP Re: The Google issue

Ivan Vučica ivan at vucica.net
Wed Dec 4 16:18:09 UTC 2013

Alright, since XMPP 2.0 was mentioned, here are a few thoughts on
compatibility-breaking changes that would scratch a few itches I have
with today's XMPP. I've read about some of them elsewhere; sorry for not
listing source material, as any reading I did was done months ago.

In an XMPP 2.0, the XML encoding should not be the first problem to
solve. The first problem to solve should be the gripes large installations
have when it comes to federation and load balancing.

There aren't many servers with thousands of concurrent users behind a
single domain, and it seems to me there is a good reason: it's not
something supported by XMPP itself. Large installations have separate
backend infrastructure which often uses different payload delivery
mechanisms, unrelated to XMPP itself. I'm not familiar with what ejabberd
does, but Facebook quite openly states that their internal servers don't
speak XMPP.

How to approach this? I don't know. Here's a thought. Since XMPP does make
significant use of DNS, does have s2s and components, how about some type
of "connection-handling slaves"? Have the initial c2s connection redirected
to a slave specific to a JID. Have the possibility of a received s2s stanza
being distributed to appropriate c2s, but have the receiving server also
respond with "in future, for this JID, talk to this slave".
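
A minimal sketch of the JID-to-node mapping, assuming rendezvous hashing
as the assignment rule. The node names, the redirect_hint() shape and the
lowercase-only normalization are all made up for illustration; none of
this is an existing XMPP mechanism.

```python
import hashlib

# Hypothetical connection-handling nodes; the host names are invented.
NODES = ["c2s-1.example.org", "c2s-2.example.org", "c2s-3.example.org"]


def bare_jid(jid: str) -> str:
    """Drop the resource part and lowercase the rest.

    Real JID normalization is stricter; lowercasing is a simplification.
    """
    return jid.split("/", 1)[0].lower()


def node_for(jid: str, nodes=NODES) -> str:
    """Pick the node for a JID with rendezvous (highest-weight) hashing."""
    bare = bare_jid(jid)

    def weight(node: str) -> bytes:
        return hashlib.sha256(f"{node}|{bare}".encode("utf-8")).digest()

    return max(nodes, key=weight)


def redirect_hint(jid: str) -> dict:
    """Hypothetical 'in future, for this JID, talk to this node' answer."""
    return {"jid": bare_jid(jid), "talk-to": node_for(jid)}
```

Every resource of the same bare JID lands on the same node, and removing
a node only moves the JIDs that were assigned to it.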

Another issue that should somehow be elegantly solved in the core protocol
is reducing chattiness through presence filtering and stanza multicasting.
In s2s connections, why not introduce the concept of "Here's a list of all
targets for the following iq stanza" or "Please deliver this presence
stanza to everyone in the 'from' list"?
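
To illustrate the saving, here is a rough sketch that groups recipients
by domain so that each remote server gets one copy of a stanza plus a
target list. The plan_multicast() name and the envelope layout are
hypothetical, not part of any existing protocol.

```python
from collections import defaultdict


def domain_of(jid: str) -> str:
    """Return the domain part of 'local@domain/resource'."""
    bare = jid.split("/", 1)[0]
    return bare.rsplit("@", 1)[-1].lower()


def plan_multicast(sender: str, recipients: list) -> list:
    """Build one envelope per remote domain instead of one per recipient."""
    by_domain = defaultdict(list)
    for jid in recipients:
        by_domain[domain_of(jid)].append(jid)
    return [
        {"from": sender, "domain": domain, "targets": targets}
        for domain, targets in by_domain.items()
    ]
```

With three recipients on two domains, the sender's server would push two
envelopes over s2s rather than three separate stanzas.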

Finally, when implementing a client, we have XML namespaces and their
inconsistent (or with some parsers hard-to-do) implementation. Namespaces
are an awesome solution, but since they are not implemented completely
consistently, XMPP 2.0 would have to ensure that their value is more
obvious and better tested with a compliance suite.

If XMPP didn't depend on some XMLisms like namespaces, it'd be easy to
switch to a different transport mechanism, if someone prefers it. JSON?
Plists? Protobufs? Custom binary? Doesn't matter. Whether XML is used is, I
think, far from the most troublesome problem with deploying large XMPP
installations and federating. If you want to scale, you have to use
non-standardized solutions that are not supported by a lot of otherwise
interesting server software.
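
As a sketch of what an encoding-independent model could look like, the
snippet below keeps a message as plain data and serializes it either to
XML or to JSON. The Message class and its field names are assumptions
for illustration only; real stanzas carry far more than this.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Message:
    """Hypothetical minimal message model, independent of the wire format."""
    sender: str
    to: str
    body: str


def to_xml(msg: Message) -> str:
    el = ET.Element("message", {"from": msg.sender, "to": msg.to})
    ET.SubElement(el, "body").text = msg.body
    return ET.tostring(el, encoding="unicode")


def from_xml(text: str) -> Message:
    el = ET.fromstring(text)
    return Message(el.get("from"), el.get("to"), el.findtext("body"))


def to_json(msg: Message) -> str:
    return json.dumps(asdict(msg), sort_keys=True)


def from_json(text: str) -> Message:
    return Message(**json.loads(text))
```

The model round-trips through both encodings only because it uses no
namespaces and no free-form extension elements, which is exactly the
part that is hard to carry over.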

On Wed, Dec 4, 2013 at 2:44 PM, Dave Cridland <dave at cridland.net> wrote:

> (Switching list, CCing Alexander)
> On Wed, Dec 4, 2013 at 1:56 PM, Alexander Holler <holler at ahsoftware.de> wrote:
>> On 04.12.2013 14:05, Ralph Meijer wrote:
>>> Alternatively, it makes total sense to use a different protocol on PANs
>>> and/or LANs and then bridge it to XMPP for WAN transport. For example,
>>> Peter Waher is working on bridging MQTT and XMPP, and MQTT also has a
>>> special profile for sensor networks based on non-TCP/IP settings, like
>>> Zigbee.
>> I would prefer to make a clean cut and to develop something like XMPP 2.0
>> or similar which got rid of XML in favor of some header-based protocol
>> (e.g. protocol buffers or even something as simple as
>> <type><length><optional_hash>content; in binary form, a bit more would
>> be needed to enable nested types, but it's just to express how it should
>> have been done).
>> I think it's relatively easy to replace the XML-based parts of current
>> XMPP implementations with something like protocol buffers. All the concepts
>> and other stuff would still work, but the really ugly thing of parsing
>> stream-based XML would be gone.
> XML parsers are really fast, and those designed for XMPP, or at least,
> those designed with XMPP in mind, are particularly fast for XML Stream
> processing.
> There *is* an argument that XML makes transporting pure binary hard, but
> quite honestly if we wanted to have arbitrary binary sections, we'd be
> pretty much forced into using a very different conceptual structure.
> One option, though, is EXI, which "knows" - with some encouragement - to
> ship values as binary even though in traditional XML serialization, they'd
> be base64 encoded. My only worry is that the level of benefit that this
> gives is rapidly eroded by how good XML parsers have got, especially when
> you consider the overhead that known-schema causes to the complexity of the
> protocol.
> The problem with trying to switch wholesale to an entirely non-XML
> protocol is that any attempt to maintain transparent compatibility with the
> XML-XMPP means having a common model expressible in either XML or some
> other format. There are attempts to do this (EXI is arguably one, XER (XML
> Encoding Rules, X.693 if I recall) is another path), but in general they're
> hopelessly inefficient and ugly unless you *also* have schema awareness at
> both ends.
> XMPP is *not* a hard-schema protocol for the most part - we can and do
> cheerfully sling extra elements and attributes in all over the shop. Only
> the core is hard - that is, the stanzas and stream - the rest should be
> considered simply "complete as far as they go".
> A final problem with protocol buffers and similar concepts is that they're
> binary, and therefore knock out a whole range of applications, such as
> javascript environments. This may well be a short-term view, though, as
> Javascript probably has gained some binary handling already.
>> Especially the problem that you need to parse the whole stream before you
>> even know how long a packet (stanza) is, is a very ugly concept. Together
>> with the surrounding <stream:stream> this is imho something that never
>> should have been done. XML was designed for documents (of fixed sizes, e.g.
>> you get the size from the file system), but not for streams.
> I'm assuming you mean the self-framing thing here. "Parse" is a very
> loaded word. You can pretty much lex most of it, especially if at that
> layer in the server you don't care too much about well-formedness. There's
> a good argument that you should be strict about well-formedness only on
> output, anyway. A good (for XMPP) XML parser will cheerfully do this for
> you, and do it all so fast any additional overhead is just not worth
> worrying about.
> If I'd been there in 1999, I would have argued strenuously against
> self-framing XML. I agree wholeheartedly it was a design error, and I
> would have gone for a split between "header" and "body", and had octet
> counting all the way. But it's now a solved problem, and I don't even blink
> anymore.
> Actually, there are arguments in favour of the way we do things, such as
> being able to serialize to XML directly on output, instead of having to
> serialize entire stanzas and count the bytes before transmission - I
> wouldn't claim these to be overwhelming arguments in the case of XMPP, but
> I've seen them made for plenty of other cases. HTTP's chunked transfer
> encoding is a result of this kind of argument.
>> Using another port, that would even be downwards compatible.
> Putting aside my dispute with that "downwards compatible" claim above, I
> think the notion of running an "XMPP 2.0" clean redesign just isn't a
> practical concept. It'd be very interesting from the point of view of a
> thought experiment in protocol design, but utterly useless in terms of
> realistic deployment.
>> What would be left, is to modify the presence stuff to get rid of the
>> need for ever lasting (tcp) connections.
> Actually, presence hasn't required everlasting TCP connections for years.
> Between BOSH and XEP-0198, that's again a solved problem.
> For anywhere I've said "solved problem", you're free to substitute the
> words "case where the state of the art has mitigated the problem to the
> point the incremental gain from a fuller, but more drastic, solution is no
> longer worthwhile".
> Dave.
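
For reference, the header-based framing discussed in the quoted thread
(<type><length>content, or "octet counting") can be sketched as below.
The 1-byte type and 4-byte big-endian length are an assumed layout, and
the optional hash field is left out.

```python
import struct

# Assumed header layout: 1-byte type, 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")


def encode_frame(ftype: int, payload: bytes) -> bytes:
    """Prefix the payload with its type and length."""
    return HEADER.pack(ftype, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete frames.

    Returns (frames, remaining), where remaining holds the bytes of a
    frame that has not fully arrived yet.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        ftype, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # incomplete frame, wait for more data
        frames.append((ftype, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

The receiver knows each frame's size after reading five bytes, but the
sender has to serialize the whole payload before it can write the
header, which is the trade-off Dave describes.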

Ivan Vučica
ivan at vucica.net