[Standards-JIG] proto-JEP: Smart Presence Distribution

Carlo v. Loesch CvL at mail.symlynX.com
Wed May 31 19:24:53 UTC 2006


Richard Dobson typeth:
| Yes but we also cannot have protocols that assume something that is not 
| true, i.e. that the network is reliable when it isn't, this is just 
| something that you are going to have to accept.

You either have reliability, or an error situation.

What I suspect is that Jabber is having an error situation
each time an idle connection is closed. This obviously makes
it impossible to operate along traditional network application
design lines.

Luckily the Wildfires and I have found a solution to that problem,
so if we fix all the servers, then the proto-JEP can continue
operating in a sane network protocol design fashion.

| > As long as there wasn't an error in transmissions, there is no reason to
| > presume any data is lost. It's like presuming the sun could rise from the
| > west tomorrow. 
| The reverse is also true, if you don't know if data is lost or not you 
| cannot presume that it was received either, when designing network 

This logic runs against the definition of TCP. TCP doesn't guarantee
that everything will always be fine, but it will always tell you
when something went wrong, unless of course you don't know how to close
a socket properly. Then it's your own mistake, and the TCP spec surely
told you so. Apparently several Jabber servers aren't using TCP according
to its spec.
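
To make "closing a socket properly" concrete, here is a minimal sketch in
Python. The helper name close_gracefully and the 5 second timeout are
assumptions of mine for illustration, not taken from any Jabber server.
The idea is only this: announce that you are done sending, then keep
reading until the other side has finished too, so nothing it was still
writing gets thrown away.

```python
import socket


def close_gracefully(sock, timeout=5.0):
    """Half-close our sending side, then drain until the peer closes too.

    Returns whatever bytes the peer still sent before its own end-of-stream.
    Hypothetical helper for illustration only.
    """
    sock.settimeout(timeout)
    # On a TCP socket this sends FIN: "I am done sending, but still listening".
    sock.shutdown(socket.SHUT_WR)
    leftover = bytearray()
    while True:
        chunk = sock.recv(4096)  # raises on a reset or on timeout
        if not chunk:            # b"" means the peer has closed its side
            break
        leftover += chunk
    sock.close()
    return bytes(leftover)
```

Calling close() straight away instead, while the peer is still writing,
is the case where data gets dropped without the sender noticing in time.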

| As many people stated in response to your previous proposal the roster 
| is certainly something that can get out of sync for various reasons, 

The primary reason being that the receiving server slashed down on your
socket while you were sending your presence. As I found out, jabber.com
and formerly jabber.org killed connections after only a few minutes -
which makes loss of messages and presence etc. very likely. So if we
fix these implementations soon, Jabber will discover true TCP reliability.

| even if you manage to detect that there was a problem and subsequently 
| reset the list, don't you think it would be better to have a protocol 
| that can recover from errors just like TCP can (i.e. retransmits the 
| lost packets) without having to start again from scratch?

No, because TCP already does that part for me. I have to deal with the
case when TCP fails, and it's not a solution to put another TCP on top.

In the case of Jabber it may instead be a solution, should the developers
of those 4 faulty implementations decide to keep it that way. Since they
make XMPP unreliable, you have to re-invent TCP to obtain reliability.  ;-)

| > A disconnection doesn't cause a loss of sync if it was properly terminated.
| > You have a </stream:stream> for that. The example 4.8 in RFC3920 shows
| > how the first server closes the stream and waits for the other side to
| > terminate the TCP link. So a clean shutdown of the connection doesn't
| > cause any loss. It's just how every sane protocol is designed.
| >   
| OK, good this is the sort of explanation we need, yes that helps, but 
| from looking at my S2S logs not many servers seem to do this when 
| disconnecting their S2S connections, which granted is probably a bug, 

Exactly.
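
A rough sketch of that clean shutdown, as I read the description above:
send the closing stream tag, keep reading until the other side has sent
its own closing tag and terminated the link, and only then close. The
function name close_stream is hypothetical, and looking for the closing
tag with a plain substring search instead of an XML parser is a
simplification for illustration.

```python
import socket

STREAM_CLOSE = b"</stream:stream>"


def close_stream(sock, timeout=5.0):
    """Close an XML stream cleanly and return any bytes that arrived late.

    Hypothetical helper: sends the closing tag, reads until the peer
    terminates the link, and checks that the peer closed its stream too.
    """
    sock.settimeout(timeout)
    sock.sendall(STREAM_CLOSE)       # we are done, but we keep listening
    buf = bytearray()
    while True:
        chunk = sock.recv(4096)
        if not chunk:                # peer terminated the link
            break
        buf += chunk
    sock.close()
    end = buf.find(STREAM_CLOSE)     # naive search, not real XML parsing
    if end < 0:
        # The link went away without a closing tag: treat it as an error.
        raise ConnectionError("peer dropped the link without closing the stream")
    return bytes(buf[:end])          # stanzas the peer sent before closing
```

Whatever comes back from close_stream still has to be processed, since
those are stanzas the other side sent before it agreed to close.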

| but even so something should be in place to account for this so that you 
| can detect that something has been lost, you cannot rely on every point 
| in the network doing the right thing and being bug free. What if there 
| is a bug in a server that is causing stanzas to get randomly dropped? 

Then you get an error back saying that something went wrong.
At least you get to send back the queue of outgoing things, which
makes it likely that involved people will find out something went wrong.

A protocol has to design ways to handle transmission errors. In the case
of roster management, this is certainly not so trivial. In the case of
our context list, it is.

| You really need something that works reliably even in those kind of 
| situations, i.e. can detect if the list has gotten out of sync.

You cannot generalize that every application needs a TCP on top of TCP
to be safer. And certainly not a light-weight temporary context list.

| It's not reinventing TCP as TCP only guarantees the ordering the data 
| will be in once it reaches the other side, it doesn't guarantee the 
| delivery of everything sent to the socket, and plus when dealing with 

It does, or it returns an error.

| this kind of thing it goes outside of the boundary of the TCP connection 
| and into the server and you have no control over what might happen 
| there, this is how the jep-ack and related proposals work, they don't 
| just rely on TCP as that's the whole root of the problem in the first 
| place.

Don't put the blame on TCP. It is your style of slashing down sockets
which is causing most of the reliability problems of Jabber.
It is a fundamental design failure that XMPP doesn't clearly suggest a
sane way to close a socket.

I have checked both source code and actual behaviour of jabberd 1.4,
jabberd 2.0 and ejabberd. All of them kill idle sockets with or without
warning, but certainly without letting the other side finish its sentence.
Luckily only in the case of ejabberd this also happens on the incoming
stream, where it has those dramatic effects. And in 3 proprietary
implementations, which shows that open source rulez, or at least is
more likely to.

| Yes I suppose we could design a UDP protocol, it would mean that 
| stanzas would be far less likely to get lost than they are at the 
| moment over TCP as you would need to build an acknowledgement mechanism 
| in and you would have far more control over it than over TCP so it would 
| be more reliable, bit more complex though.

Yes, but more reliable than using TCP in an unreliable way.  :-)

| > there is space for a hash in the to= field
| >   
| Yes, and... How would you suggest this work, again can you please 
| explain yourself, these sorts of responses aren't very conducive to 
| understanding what you are trying to accomplish.

Michal made that suggestion, I would like to use his idea.

| that? And what's wrong with having something de-coupled if you can? 
| Please explain.

Yeah sure go ahead, de-couple one-to-many routing from the server core.
I just doubt you can, or it will be useful that way, but that's just me.
Only IM developers think one-to-many messaging can be an add-on feature.
Anyway, this is of far lesser relevance than the TCP handling bug.

| That's not a very good solution as it is potentially against the RFC as 
| there is the possibility that in between a presence broadcast could 
| sneak its way through before you deleted the directed presences, they 
| need to not get there in the first place.

Oh, you mean when writing the two stanzas in one write() operation to
the socket, there is a realistic chance something else may come between.
I can see you are really familiar with TCP technology.  :-)
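
For what it's worth, a sketch of the single-write idea. The helper name
send_together is made up for illustration. It assumes one thread owns the
socket; TCP delivers the bytes of one connection in order, so the stanzas
arrive back to back on that connection.

```python
import socket


def send_together(sock, *stanzas):
    """Join several stanzas and hand them to the socket in one call.

    Hypothetical helper. sendall() may need more than one write() for a
    very large buffer, but as long as no other thread writes to the same
    socket in between, nothing else is interleaved on this connection.
    """
    sock.sendall(b"".join(stanzas))
```

What the receiving server does between parsing the first and the second
stanza is of course up to that server, not to TCP.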

| No, you can't use the fact that the presences are going to a resource or 
| a bare JID to determine if something is directed presence, that would 
| break the specification as I haven't seen any restrictions of that sort 
| in there meaning directed presence can be to a base JID or a resource.

Ok, so it has to be something else. Now that with the TCP fix rosters
are going to be a lot more reliable, we can use the recipient's roster
to figure out if he's only getting directed presence. I mean, it's a
borderline case that someone would hack his roster in order to receive
follow-up presence on a server with other people getting it *and* talked
the sender into sending him directed presence first.

| Only the addressing information, that's very different from re-writing 
| the content of a message potentially changing what it means once it 
| reaches the other side

I could understand that argument if you were using a protocol capable
of framing. Then you would route the message contents without looking at
them. But you are routing XML, and you are forced to parse every packet
anyway. Looking into the DOM at that point is trivial, there is no efficiency
gain in forbidding applications from doing that. PSYC is a framing capable
protocol, so PSYC routers indeed aren't permitted to change the content of
the packet, because they aren't even parsing it, but hey, that's just PSYC.
In Jabber this kind of logic is futile.

| I'm sorry but you are guilty of exactly the same behaviour, let's just, as 
| I've done here, further explain our points, yes? Rather than simply 
| stating people are wrong as you have been in some of your most recent 
| emails, if you think they are at least try to explain why.

I can try, but it surely isn't motivating if people simply haven't read
the proto-JEP where the stuff is explained. At least some statements made
that impression. And no, I did not attack Mridul. I patiently answered
his questions, then suggested we should have a virtual beer.

| enough that you can honestly and truly say the jabber network now has 
| reliable delivery at every point as it requires every point in the 
| network to have implemented this enhancement. So until then whatever 

Yes, any routing improvement for Jabber will have to wait until
this absurd TCP bug is fixed. This makes all of our discussion on
reliability irrelevant, as there is no data or experience on how Jabber
operates once the bug is fixed.

Luckily our proto-JEP requires negotiation, so we can add another
requirement, that the TCP handling needs to be fixed before this
JEP can be used. Simple as that.   ;)

-- 
» Carlo v. Loesch » http://symlynX.com » psyc://ve.symlynX.com/~lynX
	    xmpp:lynX at ve.symlynX.com » irc://ve.symlynX.com/#symlynX
        CryptoChat » https://ve.symlynX.com:34443/LynX/?room=symlynX


