[Standards-JIG] An XMPP Race Condition most Vexing

Chris Mullins chris.mullins at coversant.net
Tue Oct 24 22:32:15 UTC 2006

I just came across a very unusual race condition that likely exists in
all XMPP and Jabber servers, and I wanted to kick the idea around and
see if we can come up with a solid answer. 

Let's suppose two users are, at exactly the same time, subscribing to
each other's presence. 

So user 1 sends:
<presence to="user2@soapbox.net" type="subscribe" />

... and user 2 sends:
<presence to="user1@soapbox.net" type="subscribe" />

On each side, the user receives the other's subscription request and
immediately responds with "subscribed":
<presence to="user1@soapbox.net" type="subscribed" />
<presence to="user2@soapbox.net" type="subscribed" />

... and during this, the server (as required) processes the stanzas from
a particular user in order. 

The problem, and the race condition, is that the rosters will NOT end up
being correct.

At one instant, the server will modify user1's roster item for user2 to be
subscription="from" and user2's item for user1 to be subscription="to".
At the SAME instant, the server will modify user2's item for user1 to be
subscription="from" and user1's item for user2 to be subscription="to".

We then end up with rosters that look like this. User 1 requests its roster:
<iq id="33" type="get"><query xmlns="jabber:iq:roster" /></iq>
<iq id="33" type="result"><query xmlns="jabber:iq:roster">
  <item jid="user2@soapbox.net" name="user2" subscription="to" />
</query></iq>

User 2 requests its roster:
<iq id="34" type="get"><query xmlns="jabber:iq:roster" /></iq>
<iq id="34" type="result"><query xmlns="jabber:iq:roster">
  <item jid="user1@soapbox.net" name="user1" subscription="both" />
</query></iq>

Both items should read subscription="both"; user1's item for user2 has
lost its "from" half.

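At the time each user made the change, it was correct, and all packets
were processed in order. The problem is that two packets were operating
on the same logical construct (the pair of roster items linking user1
and user2) at the same time, and there's no lock guaranteeing exclusive
access to those items. Each update reads the current state and writes
back a new one, so one update can silently overwrite the other's change.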
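To make the lost update concrete, here is a minimal, deterministic Python
model. It is a hypothetical sketch, not any server's actual code: the
roster is a dict keyed by (owner, contact), the pending-subscription
states are ignored, and `handle_subscribed` is an illustrative unlocked
read-modify-write. Each `yield` marks a point where the other user's
stanza may be processed, and `run` replays a chosen interleaving, where
"A" is user1's "subscribed" and "B" is user2's.

```python
def merge(state: str, flag: str) -> str:
    """Add a 'to' or 'from' flag to a subscription state ('none', 'to', 'from', 'both')."""
    has_to = state in ("to", "both") or flag == "to"
    has_from = state in ("from", "both") or flag == "from"
    if has_to and has_from:
        return "both"
    return "to" if has_to else "from" if has_from else "none"


def handle_subscribed(roster, sender, recipient):
    """Hypothetical unlocked handler for a 'subscribed' sent by sender to recipient.

    Each yield is a point where another stanza's handler may run.
    """
    recipient_old = roster[(recipient, sender)]  # read recipient's item for sender
    yield
    sender_old = roster[(sender, recipient)]  # read sender's item for recipient
    yield
    # Write both items back from the (possibly stale) values read above.
    roster[(sender, recipient)] = merge(sender_old, "from")
    roster[(recipient, sender)] = merge(recipient_old, "to")


def run(schedule):
    """Run both handlers, advancing them one step at a time in schedule order."""
    roster = {("user1", "user2"): "none", ("user2", "user1"): "none"}
    tasks = {
        "A": handle_subscribed(roster, "user1", "user2"),
        "B": handle_subscribed(roster, "user2", "user1"),
    }
    for name in schedule:
        next(tasks[name], None)  # returns None once a handler has finished
    return roster


print(run("AAABBB"))  # {('user1', 'user2'): 'both', ('user2', 'user1'): 'both'}
print(run("AABABB"))  # {('user1', 'user2'): 'to', ('user2', 'user1'): 'both'}
```

The serial schedule gives both users subscription="both"; the interleaved
one reproduces the rosters above, even though each user's own stanzas
were still handled strictly in order.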

What we're missing in the protocol is some sort of lock construct
around a Roster Tuple. We solve almost all XMPP race conditions by
requiring packets to be processed in order, but any time two users
interact, additional locking is needed to prevent this issue.

BTW - the odds of this ever happening in the real world are, of course,
very, very low. But as more bots come online, race conditions will
continue to be exposed. This is especially true on very high-speed
networks with multi-processor, multi-core servers and multi-processor,
multi-core clients. The degree of parallelism we're seeing can
be... scary.

Any thoughts on a way to eliminate this at the protocol level? We can
obviously fix this case in our codebase with an "items being edited"
lock table, acquiring and releasing locks when modifying roster items.
A more general approach would sure be nice, though.
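A minimal Python sketch of such an "items being edited" lock table,
assuming an in-memory roster keyed by (owner, contact). The class
`RosterLockTable` and the helper functions are hypothetical names for
illustration only. The idea is one lock per unordered pair of JIDs, so
both roster items in a Roster Tuple are read and written under the same
lock. Because each pair has exactly one lock, the two opposing handlers
cannot deadlock by acquiring locks in opposite orders.

```python
import threading


def merge(state: str, flag: str) -> str:
    """Add a 'to' or 'from' flag to a subscription state ('none', 'to', 'from', 'both')."""
    has_to = state in ("to", "both") or flag == "to"
    has_from = state in ("from", "both") or flag == "from"
    if has_to and has_from:
        return "both"
    return "to" if has_to else "from" if has_from else "none"


class RosterLockTable:
    """Hypothetical 'items being edited' table: one lock per pair of users.

    Entries are never evicted in this sketch.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = {}

    def lock_for(self, jid_a: str, jid_b: str) -> threading.Lock:
        # Sort the JIDs so user1->user2 and user2->user1 share the same lock.
        key = tuple(sorted((jid_a, jid_b)))
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def handle_subscribed(roster, locks, sender, recipient):
    """Apply a 'subscribed' from sender to recipient while holding the pair's lock."""
    with locks.lock_for(sender, recipient):
        sender_old = roster[(sender, recipient)]
        recipient_old = roster[(recipient, sender)]
        roster[(sender, recipient)] = merge(sender_old, "from")
        roster[(recipient, sender)] = merge(recipient_old, "to")


def simulate():
    """Race both users' 'subscribed' stanzas on two threads."""
    roster = {("user1", "user2"): "none", ("user2", "user1"): "none"}
    locks = RosterLockTable()
    start = threading.Barrier(2)  # release both threads at the same moment

    def worker(sender, recipient):
        start.wait()
        handle_subscribed(roster, locks, sender, recipient)

    threads = [
        threading.Thread(target=worker, args=("user1", "user2")),
        threading.Thread(target=worker, args=("user2", "user1")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return roster


print(simulate())  # {('user1', 'user2'): 'both', ('user2', 'user1'): 'both'}
```

Whichever handler wins the lock, the second one sees the first one's
writes, so both items always end up "both". This only serializes handlers
that share the table, i.e. when both rosters live on the same server; it
does not by itself answer the protocol-level question.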

Chris Mullins
