[Standards] Reliable message delivery (XEP-0198 and XEP-0184)

Dave Cridland dave at cridland.net
Wed Oct 27 15:38:36 UTC 2010


On Wed Oct 27 15:38:40 2010, Simon Tennant (buddycloud) wrote:
> I'm aware that Prosody has an implementation of this and that  
> M-Link has also been working on XEP-0198 support.
> 
> 
Matt and I have also done some experiments with XEP-0198 over S2S
sessions. It's worth observing that in this case, a severed S2S
session would cause the sending server to retransmit the "lost"
stanzas cleanly, without the client even being aware of the issue.


> Is there a way that we can avoid multiple implementations by  
> addressing the Mickaël's issues  
> https://support.process-one.net/browse/EJAB-532 (message replay,  
> mixing of concepts)?
> 
> 
I don't think that 198 does mix concepts, from a technical standpoint.

In particular, it only provides stanza reliability as a by-product of  
providing stream resilience; it was designed to provide stream  
resumption in the face of severed TCP sessions (such as those due to  
mobile gateways and home-hub NATs). As I recall, much of the effort  
went into avoiding replays, and acks essentially fall out of that.

In order to provide this single feature, both ends need to have a
clear understanding of the precise point at which the stream was
severed.

To do this, in turn, both ends maintain a counter of how many
stanzas they have sent and received, and exchange these counts on
resumption so as to (deliberately) replay the lost stanzas.
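A minimal Python sketch of that bookkeeping, assuming the receiver
reports its handled count `h` when resuming (current versions of
XEP-0198 carry it as an `h` attribute on `<resume/>` and
`<resumed/>`). The class and method names are hypothetical, and
counter wrap-around is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundLog:
    """Sender side: every stanza sent on the stream, in order (hypothetical helper)."""

    sent: list = field(default_factory=list)

    def send(self, stanza: str) -> int:
        """Record a stanza as transmitted; return how many have been sent so far."""
        self.sent.append(stanza)
        return len(self.sent)

    def to_replay(self, h: int) -> list:
        """Return the stanzas to replay, given the peer's handled count h on resumption.

        The first h stanzas reached the peer; everything after them was lost
        when the stream severed and is replayed, deliberately and in order.
        """
        if not 0 <= h <= len(self.sent):
            raise ValueError(f"peer reports h={h}, but only {len(self.sent)} were sent")
        return self.sent[h:]


@dataclass
class InboundCounter:
    """Receiver side: count of stanzas handled, reported as h on resumption."""

    h: int = 0

    def handle(self, stanza: str) -> None:
        # A real client would process the stanza here; we only count it.
        self.h += 1


log = OutboundLog()
for s in ["m1", "m2", "m3", "m4"]:
    log.send(s)

peer = InboundCounter()
for s in log.sent[:2]:  # only m1 and m2 arrive before the TCP session is severed
    peer.handle(s)

print(log.to_replay(peer.h))  # ['m3', 'm4']
```

Note that this sender keeps every stanza it has ever sent; bounding
that store is what the in-stream acks are for.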

This replay requires that the sender store every stanza it
transmits, in order; to reduce this storage, 198 includes in-stream
acking, which allows a sender to discard acknowledged stanzas from
this store.
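The acking side can be sketched as below, assuming an ack carries a
single cumulative count `h` (as in XEP-0198's `<a h='...'/>`);
`UnackedQueue` and its methods are hypothetical names. Because `h` is
a checkpoint rather than a per-stanza id, one integer is enough to
release a whole prefix of the store, and an ack that moves backwards
or past what was actually sent can simply be rejected.

```python
from collections import deque


class UnackedQueue:
    """Sender-side store holding only stanzas the peer has not yet acked (hypothetical)."""

    def __init__(self) -> None:
        self.sent = 0            # total stanzas sent on this stream
        self.acked = 0           # highest h the peer has acknowledged
        self.pending = deque()   # stanzas numbered acked+1 .. sent, oldest first

    def send(self, stanza: str) -> None:
        self.sent += 1
        self.pending.append(stanza)

    def on_ack(self, h: int) -> None:
        """Process an ack: h means 'the first h stanzas on this stream were handled'."""
        if h < self.acked or h > self.sent:
            raise ValueError(f"invalid ack h={h} (acked={self.acked}, sent={self.sent})")
        # Acks are cumulative checkpoints, so everything up to h can be discarded.
        for _ in range(h - self.acked):
            self.pending.popleft()
        self.acked = h


q = UnackedQueue()
for s in ["m1", "m2", "m3"]:
    q.send(s)
q.on_ack(2)             # peer: "I have handled the first 2 stanzas"
print(list(q.pending))  # ['m3']
```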

If any additional features are conflated in, it's the throttling,
but I'm not sure anyone is actually implementing that.

Now, replay can indeed occur with XEP-0198. If a stream is dropped,
then any unacked stanzas (really, the unacked portion of the stream)
are replayed. The problem is that if a "proper" resumption fails for
whatever reason, the sender has to assume, for safety, that the
unacked stanzas were not received the first time, and therefore
resends them unilaterally, potentially causing duplicates.
Effectively, therefore, 198 replaces loss with duplication; the
assumption is that duplication is better than loss in the general
case.
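A toy model of the two recovery paths, for illustration only:
`recover` is a hypothetical helper, and the receiver is modelled as a
plain list of handled stanzas. With a successful resumption the
receiver's count pins down the severance point exactly; without it,
the sender falls back to its last ack and resends from there.

```python
from typing import Optional


def recover(log: list, acked: int, received: list, peer_h: Optional[int] = None) -> list:
    """Return what the receiver ends up with after reconnecting (hypothetical model).

    log      -- every stanza the sender transmitted, in order
    acked    -- the last h the sender saw acked before the stream was severed
    received -- what the receiver actually handled before the stream was severed
    peer_h   -- the receiver's count if resumption succeeds, or None if it fails
    """
    # A proper resumption tells the sender exactly where the stream severed.
    # If resumption fails, the sender must assume for safety that nothing
    # after the last ack arrived, and resends from that point.
    start = acked if peer_h is None else peer_h
    return received + log[start:]


log = ["m1", "m2", "m3", "m4"]
received = ["m1", "m2", "m3"]  # m3 arrived, but its ack never made it back

print(recover(log, acked=2, received=received, peer_h=3))  # ['m1', 'm2', 'm3', 'm4']
print(recover(log, acked=2, received=received))            # ['m1', 'm2', 'm3', 'm3', 'm4']
```

In the second case nothing is lost, but `m3` is handled twice, which
is exactly the loss-for-duplication trade.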

That's it - there is nothing more to 198 than that.

There are some key things here, though. First, receiving an ack
tells you only that the stanza reached the other end of this hop; it
tells you nothing about whether the final recipient will receive it,
and it certainly doesn't tell you that they have. But then, it's not
meant to; if you want to know that for certain, use XEP-0184.

Secondly, the acks (and negotiation) are top-level elements, not part  
of stanzas, because they are properties and messages of the stream  
itself, and not routable content. Because this doesn't deal with  
anything beyond the hop-by-hop case, there's no need to discuss  
intermediate entities - there are none.
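To illustrate the split between stanzas and stream-level elements,
the following Python sketch classifies top-level elements using only
the standard library. It assumes the `urn:xmpp:sm:3` namespace used by
current versions of XEP-0198 (the drafts of the time may have used an
earlier one). Each fragment is parsed on its own, so it carries its
own `xmlns`, whereas on a real stream the `jabber:client` default
namespace comes from the stream header. The function names are
hypothetical.

```python
import xml.etree.ElementTree as ET

SM_NS = "urn:xmpp:sm:3"  # assumption: namespace used by current versions of XEP-0198
CLIENT_NS = "jabber:client"
STANZA_NAMES = {"message", "presence", "iq"}


def counts_as_stanza(fragment: str) -> bool:
    """True if a top-level stream child is a stanza (and so counts towards h)."""
    tag = ET.fromstring(fragment).tag  # e.g. '{jabber:client}message'
    if not tag.startswith("{"):
        return False
    ns, _, local = tag[1:].partition("}")
    return ns == CLIENT_NS and local in STANZA_NAMES


def make_ack(h: int) -> str:
    """Serialise an <a/> ack: stream-level, with no to/from, so there is nothing to route."""
    return ET.tostring(ET.Element("a", {"xmlns": SM_NS, "h": str(h)}), encoding="unicode")


chat = "<message xmlns='jabber:client' to='juliet@example.com'><body>hi</body></message>"
print(counts_as_stanza(chat))                          # True
print(counts_as_stanza(make_ack(5)))                   # False
print(counts_as_stanza("<r xmlns='urn:xmpp:sm:3'/>"))  # False
```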

Thirdly, the acks really are precisely as fine-grained as they need
to be: receivers can only ack up to a checkpoint on the stream, and
that checkpoint is measured in stanzas. The protocol would either not
work, or be hopelessly complex, if we allowed out-of-order acks.

Luckily, XEP-0184 provides such per-message acknowledgements for
messages, and moreover, in conjunction with XEP-0198, both the
message and the receipt are very likely to get there and back.


> I'm a firm believer in implementing something first and then  
> writing a spec based on the learnings; could some of the existing  
> implementers please comment on the perceived deficiencies of  
> XEP-0198 and XEP-0184?
> 
> 
XEP-0184 in and of itself does not provide "reliability" in the same  
sense, and would cause end-to-end redelivery on error, but as JS  
comments, it does achieve its core objective of providing simple  
receipts to indicate delivery of messages to the client.
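For comparison, the XEP-0184 exchange is itself carried in ordinary
stanzas, so it is routed end-to-end like any other message. A minimal
sketch with `xml.etree.ElementTree`, assuming the `urn:xmpp:receipts`
namespace and the `id` attribute on `<received/>` from current
versions of XEP-0184; the helper names, the JIDs, and the receipt's
own `-receipt` id suffix are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RECEIPTS_NS = "urn:xmpp:receipts"


def message_with_receipt_request(to: str, msg_id: str, body: str) -> ET.Element:
    """Build a <message/> asking the recipient's client for a delivery receipt."""
    msg = ET.Element("message", {"to": to, "id": msg_id, "type": "chat"})
    ET.SubElement(msg, "body").text = body
    ET.SubElement(msg, f"{{{RECEIPTS_NS}}}request")
    return msg


def receipt_for(msg: ET.Element) -> Optional[ET.Element]:
    """Return the receipt the recipient's client sends back, or None if none was requested."""
    if msg.find(f"{{{RECEIPTS_NS}}}request") is None:
        return None
    # Hypothetical id scheme for the receipt message itself; any unique id would do.
    reply = ET.Element("message", {"to": msg.get("from"), "id": msg.get("id") + "-receipt"})
    # <received/> echoes the id of the message being acknowledged.
    ET.SubElement(reply, f"{{{RECEIPTS_NS}}}received", {"id": msg.get("id")})
    return reply


outgoing = message_with_receipt_request("juliet@example.com/balcony", "msg-1", "Art thou there?")
outgoing.set("from", "romeo@example.net/orchard")  # stamped by the sender's server in transit
receipt = receipt_for(outgoing)
print(receipt.find(f"{{{RECEIPTS_NS}}}received").get("id"))  # msg-1
```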

With 198 alone, it is unlikely that you'll lose a message. It is,  
however, possible - although you need multiple persistent failures to  
occur simultaneously. If you want something specified as "reliable  
messaging", though, I can offer you an excellent X.400 product.

However, 198 does reduce message loss due to severed TCP connections
to a very low rate, and because it works hop-by-hop, it also confines
replay (intentional, and potentially duplicating) to a single hop.

184, on the other hand, is good for UI indications, since with 198
you don't "lose" messages as such, whereas you do with 184 (you don't
resend, you replace). This allows a user to take sensible action,
whereas with 198 the action, if reconnection can happen at all, is
entirely automatic.
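On the client side, this typically amounts to tracking outstanding
receipts and surfacing the ones that never arrive, rather than
resending automatically. A hypothetical sketch; the 30-second timeout
is an arbitrary choice, and timestamps are passed in explicitly to
keep it deterministic.

```python
class ReceiptTracker:
    """Client-side UI state for XEP-0184 receipts (hypothetical; timeout is arbitrary)."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout
        self.pending = {}       # message id -> time sent
        self.delivered = set()  # ids for which a receipt has arrived

    def sent(self, msg_id: str, now: float) -> None:
        self.pending[msg_id] = now

    def receipt(self, msg_id: str) -> None:
        # Ignore receipts for ids we are not waiting on.
        if self.pending.pop(msg_id, None) is not None:
            self.delivered.add(msg_id)

    def unconfirmed(self, now: float) -> list:
        """Ids to flag in the UI; nothing is resent automatically, the user decides."""
        return sorted(i for i, t in self.pending.items() if now - t > self.timeout)


t = ReceiptTracker(timeout=30)
t.sent("msg-1", now=0)
t.sent("msg-2", now=5)
t.receipt("msg-1")
print(t.unconfirmed(now=40))  # ['msg-2']
```

Here `msg-2` would be shown as unconfirmed, leaving it to the user
whether to send it again.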

Overall, the specification is very solid, and I can clearly see how  
to implement it in all cases.


> Do parts of the spec need to change?
> 
> 
I've made some suggestions for improvements; Tobias largely agreed
with these, and they are mostly editorial in nature.

Dave.
-- 
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


