[Standards] Reliable message delivery (XEP-0198 and XEP-0184)
dave at cridland.net
Wed Oct 27 15:38:36 UTC 2010
On Wed Oct 27 15:38:40 2010, Simon Tennant (buddycloud) wrote:
> I'm aware that Prosody has an implementation of this and that
> M-Link has also been working on XEP-0198 support.
Matt and I have also experimented with XEP-0198 over S2S sessions.
It's worth observing that, in this case, a severed S2S session would
cause the sending server to retransmit the "lost" stanzas cleanly,
without the client even being aware of the issue.
> Is there a way that we can avoid multiple implementations by
> addressing Mickaël's issues
> https://support.process-one.net/browse/EJAB-532 (message replay,
> mixing of concepts)?
I don't think that 198 does mix concepts, from a technical standpoint.
In particular, it only provides stanza reliability as a by-product of
providing stream resilience; it was designed to provide stream
resumption in the face of severed TCP sessions (such as those due to
mobile gateways and home-hub NATs). As I recall, much of the effort
went into avoiding replays, and acks essentially fall out of that.
In order to provide this single feature, both ends need to have a
clear understanding of the precise point the stream severed.
In order to do this in turn, both ends maintain a counter of how many
stanzas they have sent and received, and communicate this on
resumption so as to (deliberately) replay the lost stanzas.
This replay requires the sender to store every stanza it transmits,
in order. To bound this storage, 198 includes in-stream acking, which
allows a sender to discard acknowledged stanzas from the store.
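To make the counter-and-store mechanism concrete, here is a rough Python sketch of the per-stream bookkeeping. The class and method names are invented for illustration and are not taken from XEP-0198, and it ignores the wire format entirely: it only shows how a single running count lets the sender drop acked stanzas and work out exactly what to replay on resumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OutboundQueue:
    """Sender-side state for one stream (hypothetical names)."""
    sent: int = 0                                  # stanzas sent so far
    unacked: deque = field(default_factory=deque)  # (seq, stanza), oldest first

    def send(self, stanza: str) -> None:
        # Every stanza is numbered and kept until the peer acknowledges it.
        self.sent += 1
        self.unacked.append((self.sent, stanza))

    def on_ack(self, h: int) -> None:
        # h is the peer's count of stanzas handled; everything up to and
        # including stanza number h can be discarded from the store.
        while self.unacked and self.unacked[0][0] <= h:
            self.unacked.popleft()

    def on_resume(self, h: int) -> list:
        # On resumption the peer reports its count; whatever is still stored
        # after that point is exactly the portion of the stream to replay.
        self.on_ack(h)
        return [stanza for _, stanza in self.unacked]


@dataclass
class InboundCounter:
    """Receiver-side state: a running count, reported as h in acks."""
    received: int = 0

    def on_stanza(self, stanza: str) -> None:
        self.received += 1


sender, receiver = OutboundQueue(), InboundCounter()
for body in ["m1", "m2", "m3", "m4"]:
    sender.send(body)
receiver.on_stanza("m1")
receiver.on_stanza("m2")
sender.on_ack(receiver.received)   # store shrinks to m3, m4
receiver.on_stanza("m3")           # arrives, but the TCP session dies before any ack
to_replay = sender.on_resume(receiver.received)  # ['m4']
```

Note that m3 is not replayed even though it was never acked: the resumption handshake tells the sender precisely where the receiver got to.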
If any additional feature is conflated in, it's the throttling, but
I'm not sure anyone is actually implementing that.
Now, replay can indeed occur with XEP-0198 - if a stream is dropped,
then any unacked stanzas (really, the portion of the stream that is
unacked) are replayed - the problem is that if a "proper" resumption
fails for whatever reason, then the sender has to assume for safety
that the unacked stanzas were not received the first time, and
therefore resends unilaterally, potentially causing duplicates.
Effectively, therefore, 198 replaces loss with duplication; the
assumption is that duplication is better than loss in the general case.
That's it - there is nothing more to 198 than that.
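The failed-resumption case is easy to model. This is a toy Python function, not anything from the spec: given what the sender transmitted, how many stanzas actually arrived, and how many the sender saw acked before the stream died, it returns what the receiver ends up with once the sender resends everything unacked on a fresh stream.

```python
def after_failed_resumption(sent: list, received: int, acked: int) -> list:
    """Receiver's view when resumption fails (toy model).

    sent:     stanzas the sender transmitted, in order
    received: how many of them actually reached the receiver
    acked:    how many the sender saw acknowledged before the stream died
    """
    assert 0 <= acked <= received <= len(sent)
    # What really arrived on the old stream.
    log = list(sent[:received])
    # The sender can't tell which unacked stanzas arrived, so it assumes
    # none did and resends them all on the new stream.
    log += sent[acked:]
    return log


msgs = ["m1", "m2", "m3"]
print(after_failed_resumption(msgs, received=3, acked=1))
# ['m1', 'm2', 'm3', 'm2', 'm3']: duplicates, but nothing lost
print(after_failed_resumption(msgs, received=1, acked=1))
# ['m1', 'm2', 'm3']: the ack was current, so nothing repeats
```

Whatever the split between received and acked, every stanza in sent appears at least once, which is the loss-for-duplication trade described above.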
There are some key things here, though. First, receiving an ack tells
you nothing about whether the intended recipient will receive the
stanza. It certainly doesn't tell you whether they have. But then,
it's not meant to - if you want to know that for certain, use XEP-0184.
Secondly, the acks (and negotiation) are top-level elements, not part
of stanzas, because they are properties and messages of the stream
itself, and not routable content. Because this doesn't deal with
anything beyond the hop-by-hop case, there's no need to discuss
intermediate entities - there are none.
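For illustration, here's how those top-level elements might be built with Python's standard ElementTree. The urn:xmpp:sm:3 namespace is the one in current revisions of XEP-0198 and is an assumption here (earlier drafts used a different one); the point is simply that neither element carries to/from addressing, because it is spoken to the stream, not routed through it.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace from current XEP-0198 revisions, not necessarily
# the one in use when this was written.
SM_NS = "urn:xmpp:sm:3"


def ack_request() -> str:
    # <r/>: "tell me how many stanzas you've handled". No to/from: it is a
    # message to the peer at the other end of this stream, not a stanza.
    return ET.tostring(ET.Element("r", xmlns=SM_NS), encoding="unicode")


def ack(h: int) -> str:
    # <a h='N'/>: "I've handled N stanzas on this stream so far".
    return ET.tostring(ET.Element("a", xmlns=SM_NS, h=str(h)), encoding="unicode")


print(ack_request())
print(ack(5))
```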
Thirdly, the acks really are precisely as fine-grained as they need to be -
receivers can only ack to a checkpoint on the stream, and that
checkpoint is measured in stanzas. The protocol would either not
work, or be hopelessly complex, if we allowed out-of-order acks.
Luckily, XEP-0184 provides these for messages, and moreover in
conjunction with XEP-0198 both the message and the receipt are very
likely to get there and back.
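A small sketch of what "acking to a checkpoint" means for the sender: an ack is a single number, so the only checks needed are that it doesn't go backwards and doesn't exceed what was sent. The function name is hypothetical, and it ignores the counter wrap-around that current revisions of the spec define.

```python
def accept_ack(h: int, last_acked: int, sent: int) -> int:
    """Validate a cumulative ack and return the new checkpoint (sketch).

    An ack means "the first h stanzas", so there is no way to express
    "got 5 but not 4"; per-message receipts are XEP-0184's job.
    """
    if h < last_acked:
        raise ValueError(f"ack went backwards: {h} < {last_acked}")
    if h > sent:
        raise ValueError(f"ack covers stanzas never sent: {h} > {sent}")
    return h


checkpoint = 0
checkpoint = accept_ack(3, checkpoint, sent=5)  # advances the checkpoint to 3
checkpoint = accept_ack(3, checkpoint, sent=5)  # repeated ack, nothing new
```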
> I'm a firm believer in implementing something first and then
> writing a spec based on the learnings; could some of the existing
> implementers please comment on the perceived deficiencies of
> XEP-0198 and XEP-0184?
XEP-0184 in and of itself does not provide "reliability" in the same
sense, and would cause end-to-end redelivery on error, but as JS
comments, it does achieve its core objective of providing simple
receipts to indicate delivery of messages to the client.
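For comparison, a rough sketch of the XEP-0184 exchange using Python's ElementTree: the sender adds a <request/> to a message that has an id, the recipient's client answers with <received/> carrying that id, and the sender can then update a UI indicator. The JIDs, ids and the pending dictionary are made up for illustration; urn:xmpp:receipts is the namespace XEP-0184 uses.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RECEIPTS = "{urn:xmpp:receipts}"  # XEP-0184 namespace, in ElementTree notation


def with_receipt_request(msg_id: str, to: str, body: str) -> ET.Element:
    # Sender: an ordinary message plus <request/>; the id is what gets echoed back.
    msg = ET.Element("message", {"id": msg_id, "to": to})
    ET.SubElement(msg, "body").text = body
    ET.SubElement(msg, RECEIPTS + "request")
    return msg


def receipt_for(msg: ET.Element, reply_to: str) -> Optional[ET.Element]:
    # Recipient's client: only answer if a receipt was asked for.
    if msg.find(RECEIPTS + "request") is None:
        return None
    reply = ET.Element("message", {"to": reply_to})
    ET.SubElement(reply, RECEIPTS + "received", {"id": msg.get("id")})
    return reply


pending = {}  # hypothetical UI state: message id -> body still awaiting a receipt
out = with_receipt_request("m-1", "juliet@example.com", "hello")
pending[out.get("id")] = "hello"

receipt = receipt_for(out, reply_to="romeo@example.net")
received = receipt.find(RECEIPTS + "received")
pending.pop(received.get("id"), None)  # delivered: the UI can show it as such
```

Unlike the 198 ack, this is end-to-end and per message: it says nothing about the stream, only that this particular message reached the other client.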
With 198 alone, it is unlikely that you'll lose a message. It is,
however, possible - although you need multiple persistent failures to
occur simultaneously. If you want something specified as "reliable
messaging", though, I can offer you an excellent X.400 product.
However, 198 does reduce message loss due to severed TCP connections
to a very low rate, and because it does so hop-by-hop, it restricts
replay (intentional and potentially duplicating) to a hop-by-hop basis as well.
184, on the other hand, is good for UI indications, since you don't
"lose" messages with 198 as such, whereas you do with 184 (you don't
resend, you replace). This allows a user to take sensible action,
whereas with 198, the action is - if reconnection can happen at all -
automatic.
Overall, the specification is very solid, and I can clearly see how
to implement it in all cases.
> Do parts of the spec need to change?
I've made some suggestions for improvements - Tobias largely agreed
with them, and they were mostly editorial in nature.
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade