[Standards] s2s and gracelessly broken streams
mridul at sun.com
Wed Mar 28 04:44:26 UTC 2007
Justin Karneges wrote:
> On Tuesday 27 March 2007 7:57 pm, Peter Saint-Andre wrote:
>> Mridul wrote:
>>> Are there any best practices on how to handle a broken s2s connection?
>>> Two primary cases come to mind :
>>> 1) Remote server went mia due to some transient network/other jitter and
>>> recovers after "some" time (assume enough delay to cause tcp close) : so
>>> contacts are still available.
>>> 2) Remote server went down and came back up after "some" time : so
>>> contacts are no longer available.
>>> (Another condition could be server closing the inbound/outbound socket
>>> after some delay.)
>>> In these cases, assume that unavailable presence was not sent.
>>> How is a contact's presence expected to be treated on the user's server?
>> Good question. I'd be curious to hear how current implementations handle
>> this. The options are not exactly appealing. :)
> It isn't clear from Mridul's question what exactly a broken s2s connection is.
I am referring to a broken socket connection.
> Remember that s2s connections can come and go at any time, and unlike c2s
> connections, there is nothing implied when the connection goes away (such as
> any sort of presence unavailability notification). So, of his 1st and 2nd
> conditions, as well as the unnumbered parenthetical condition, none are error
> situations. In fact, closing s2s connections is a good strategy to prevent
> stanza loss (there's less likelihood of tossing a stanza into a dead
> connection if all of your connections are relatively fresh).
Yes, which makes the whole problem slightly more complicated: nothing is
implied when an s2s connection breaks. Hence the user's server can assume
nothing but the worst about the presence state of the remote
contacts.
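
To make the "assume the worst" position concrete, here is a minimal sketch of how a receiving server might flag cached remote presence as suspect when the socket to a domain breaks. Nothing here is specified by XMPP; RemotePresenceCache, on_s2s_broken, is_stale and domain_of are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


def domain_of(jid):
    """Return the domain part of a JID such as 'user@domain/resource'."""
    bare = jid.split("/", 1)[0]        # drop the resource first (it may contain '@')
    return bare.rpartition("@")[2]     # whatever follows the last '@', or the whole string


@dataclass
class PresenceEntry:
    available: bool
    stale: bool = False                # True once we can no longer trust this entry


class RemotePresenceCache:
    """Hypothetical cache of the last presence received from remote contacts."""

    def __init__(self):
        self._entries = {}             # full JID -> PresenceEntry

    def update(self, jid, available):
        # Fresh presence from the remote server replaces (and un-stales) the entry.
        self._entries[jid] = PresenceEntry(available=available)

    def on_s2s_broken(self, domain):
        # Assumption: a broken socket says nothing about the contacts, so every
        # available contact on that domain is merely flagged as stale, not
        # declared unavailable. The caller decides whether to probe them.
        flagged = []
        for jid, entry in self._entries.items():
            if domain_of(jid) == domain and entry.available:
                entry.stale = True
                flagged.append(jid)
        return sorted(flagged)

    def is_stale(self, jid):
        return self._entries[jid].stale
```

The list returned by on_s2s_broken is the set of contacts whose presence would need to be re-verified (for example by probing) before it can be trusted again.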
> This does mean that if a server disappears from the internet, all of its sent
> presences will "stick" until the receivers decide to flush them away. For
> example, if you have a ghost presence of someone on your roster, a relogin
> will usually fix it (because your server then assumes everyone is unavailable
> again, and starts probing). However, if you never sign off, then depending
> on your client/server implementation and the lifespan of the earth, it is
> possible that the ghost contact could remain available for all eternity.
> This is also the cause of MUC ghosting (which can seemingly last forever).
> The best solution, as far as I can tell, would be for the server to perform
> presence probes on a periodic basis, rather than only on client login. IMO,
> it is rather silly that the client user can "force" the effect by doing a log
> off/on. I think it would be a lot better if the server handled this
> transparently, by probing routinely, and then the client should always "just
Presence traffic (especially over s2s) is already a bit high, and
probing periodically would add to it significantly,
especially if the remote contacts are still around and the remote
server broke the connection just for the sake of resource economy.
JD Conley had an interesting idea of a 'probe all' being sent from one
server to another - coupled with ideas
from XEP-0198, it might be possible to do this in a more efficient manner
without too much overhead (except when it is really needed - the out of
> I'd also go as far as saying that we should decouple probing from being
> explicitly related to the login process. For example, if a client were to
> log off/on within a very short timeframe, I'd say that the server shouldn't
> probe the whole roster again.
Not doing so would currently break a bunch of (implied) MUST
requirements in XMPP.
Or do you mean we could adopt this in the bis spec?
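
For illustration, the "don't re-probe after a quick log off/on" idea could be sketched as a simple per-contact throttle. This is only a sketch under assumptions: ProbeThrottle, should_probe and the 300-second default are made up for this example, and, as noted above, skipping the probe on login would deviate from the current requirements.

```python
import time


class ProbeThrottle:
    """Hypothetical helper: suppress repeat probes within a minimum interval."""

    def __init__(self, min_interval=300.0, clock=time.monotonic):
        self._min_interval = min_interval   # seconds; an arbitrary illustrative value
        self._clock = clock                 # injectable so the logic can be tested
        self._last_probe = {}               # (user bare JID, contact bare JID) -> time

    def should_probe(self, user, contact):
        """Return True and record the time if a probe should be sent now."""
        now = self._clock()
        key = (user, contact)
        last = self._last_probe.get(key)
        if last is not None and now - last < self._min_interval:
            return False                    # probed recently enough; skip this one
        self._last_probe[key] = now
        return True
```

A server would call should_probe(user, contact) for each roster entry at login (or on a periodic timer) and send a probe only when it returns True.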
> I think we have the necessary protocol in place to solve this problem, it is
> just a matter of having well-implemented servers.
Yes, it is do-able right now by using probe : just that it tends to be a
Even that is OK as long as there is a generally accepted consensus on
how to go about addressing this problem - so that the user's presence and
the contacts' presence are reflected with reasonable accuracy - even over s2s.