[Standards] Use of XEP-0198 resumption under adverse network conditions

Guus der Kinderen guus.der.kinderen at gmail.com
Wed Nov 4 12:30:33 UTC 2020


Hi Dave,

Thanks for sharing this. To verify that I got it right, can I dumb your
suggestions down by summarizing them as:

   - Increase the timeout after which a connection is considered
   unrecoverably dead (to ... how many minutes?)
   - After a period of inactivity that's a lot shorter than the timeout
   mentioned above (presumably around the existing timeout value) start
   generating push notifications
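
As a rough illustration of those two timeouts, here is a minimal sketch in
Python. It is not taken from any server: the names and both threshold values
are hypothetical, and the thread does not say how long the longer timeout
should be.

```python
from enum import Enum


class SessionState(Enum):
    LIVE = "live"                  # client is answering pings
    UNRESPONSIVE = "unresponsive"  # TCP still open, pings unanswered
    DEAD = "dead"                  # long timeout expired: close TCP, destroy session


# Hypothetical values, chosen only for illustration.
PING_TIMEOUT_S = 60    # roughly "the existing timeout value"
DEAD_TIMEOUT_S = 600   # the longer "unrecoverably dead" timeout


def classify(seconds_since_last_response: float) -> SessionState:
    """Map the time since the client last responded to a session state."""
    if seconds_since_last_response < PING_TIMEOUT_S:
        return SessionState.LIVE
    if seconds_since_last_response < DEAD_TIMEOUT_S:
        return SessionState.UNRESPONSIVE
    return SessionState.DEAD


def should_push(state: SessionState) -> bool:
    """Push notifications start once the session stops answering pings."""
    return state is not SessionState.LIVE
```

Under these assumed values, a client that comes back after two minutes is
classified as unresponsive rather than dead, so its TCP connection is still
there to use, while push notifications have been going out in the meantime.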

Regards,

  Guus

On Wed, 4 Nov 2020 at 12:48, Dave Cridland <dave at cridland.net> wrote:

> Hey all,
>
> We (that is, myself and others from Forward Clinical Ltd, my employer)
> have been doing some extensive work to support high latency networks such
> as Satellite Links, in relation to our work with UK Defence Medical
> Services. Our "long thin" links cover the C2S link.
>
> We believe these findings are more generally useful than just SATCOM - in
> particular, we think these will help with the adverse network conditions
> found in hospitals (where people keep putting in lifts and lots of cables,
> giving lots of blackspots), and general applicability with mobile use of
> XMPP.
>
> TL;DR: When the session has a ping timeout, do push notifications, but
> otherwise leave it open - mobile clients will often recover after several
> minutes have passed.
>
> We assume that established sessions may be in several connectivity states
> from the point of view of the server, typically:
>
> "Live" - a session is genuinely live and can be used for communication.
> "Unresponsive" - the session has a TCP connection associated with it, but
> it is unresponsive to pings etc.
> "Resumable" - the session has no TCP session, but 198 resumption was
> negotiated and the session remains available.
>
> We expect that the majority of servers will immediately move a session
> detected as unresponsive into the resumable state by closing the TCP
> session, and starting a (relatively short) timeout.
>
> In the process of doing so, unacknowledged stanzas will be processed for
> push notifications etc as needed, and errors will be sent as appropriate.
>
> Due to network analysis (and "thanks" to a bug in the server which caused
> some useful logging), we were able to examine not only when sessions went
> into the unresponsive state, but also when the client subsequently sent
> traffic on that session. This often happened well after the session had
> fallen into the resumable state - this resulted in an error, as the session
> had been closed.
>
> Having seen the result of this in the logging of the server, we followed
> up by looking for the same logging output on the production system, where
> the majority of users are using WiFi or 4G within hospitals. Coverage is
> often poor, and the WiFi overused, so clinicians often operate on a weak 4G
> signal, or highly contended WiFi. Think FOSDEM.
>
> Again, we observed clients recovering sometimes well after the ping
> timeout had triggered. Had these clients been able to, they could have
> continued to use the same TCP session without any disruption (or, for that
> matter, any additional RTTs re-establishing).
>
> The usual approach here seems to be to increase the timeout required to
> move a session from "live" to "unresponsive" when pinged. However, this has
> the effect of delaying push notifications while the session is, in effect,
> in limbo.
>
> Our proposal is that when a session is found to be unresponsive, the
> server starts sending push notifications for unacknowledged (and future)
> messages, but otherwise leaves the TCP session open rather than moving it
> to the resumable state. Only after a significantly longer timeout should
> the TCP session be terminated (and the session destroyed entirely at that
> point).
>
> This means that a client recovering network after several minutes will
> find the connection still live (in effect), whereas if it never recovers,
> it will still get the push notifications in a timely manner.
>
> There are likely to be downsides with this approach; in particular, presence
> state will be badly affected. PSA could help here. Overall, though, we
> believe that this will substantially improve the effective performance of
> C2S over high latency, high contention links.
>
> I hope this is useful!
>
> Dave.