[Standards] synchronized rosters (was: Re: Proposed XMPP Extension: Stanza Repeaters)
melo at simplicidade.org
Wed Mar 19 22:13:41 UTC 2008
On Mar 19, 2008, at 8:07 PM, Philipp Hancke wrote:
> Pedro Melo wrote:
>> On Mar 18, 2008, at 3:31 PM, Philipp Hancke wrote:
>>> Peter Saint-Andre wrote:
>>>>> The roster at the remote host could be used as the distribution
>>>>> list, no
>>>>> need for costly updates. Just reverse-query the roster if your
>>>>> DB system
>>>>> provides that (easy for SQL-based rosters, not as simple for
>>>> Rosters can get out of sync. The roster at the remote host is not
>>>> necessarily to be trusted.
>>> How is keeping the roster in sync more difficult than keeping the
>>> list at the repeater in sync?
>> And to keep the roster in sync, there is already a nice <presence
>> type="probe">. The replies of those can include remote subscription
>> status that we can compare to our local information to detect
>> roster-out-of-sync.
> Do you send a single probe to the remote domain and the remote
> domain replies with presence and subscription status for each
Yes. I thought a bit about this last year.
Each domain would announce support for this protocol by including a
<rostersync xmlns='roster_sync_namespace' /> child in the <features />.
In that case the source domain (saramago.lit) could send a single
probe to the remote domain (pessoa.lit), including a child stanza
requesting roster sync:
<presence type='probe' to='pessoa.lit' from='jangada@saramago.lit'>
  <rostersync xmlns='roster_sync_namespace' />
</presence>
The remote server would use the reverse roster query to discover all
the contacts that have the source JID in their roster, and for each
one, it would send back the last presence, as per spec. This protocol
would add an extra child, <rostersync>, carrying the status of the
roster item. The only attribute included would be the `subscription`
attribute. I don't have a final opinion about including the 'ask'
With this information, the source domain can decide if the roster is
out-of-sync and take proper measures to fix it.
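
A minimal sketch of that comparison on the source domain, assuming rosters
are available as plain dicts of contact JID to subscription state. The
function name and the data model are hypothetical, not part of any
existing server. It relies on the usual mirroring of subscription states:
a "to" entry on one side should appear as "from" on the other.

```python
# Hypothetical data model: contact JID -> subscription state.
# What one side calls "to", the other side should call "from".
MIRROR = {"both": "both", "to": "from", "from": "to", "none": "none"}


def find_out_of_sync(local_roster, remote_reports):
    """Return {contact: (local_state, remote_state)} for entries that disagree.

    local_roster: the source user's own view of each remote contact.
    remote_reports: the subscription value each remote contact reported
    in its <rostersync> child. A missing entry is treated as "none".
    """
    mismatches = {}
    for contact in set(local_roster) | set(remote_reports):
        local = local_roster.get(contact, "none")
        remote = remote_reports.get(contact, "none")
        if MIRROR[local] != remote:
            mismatches[contact] = (local, remote)
    return mismatches


if __name__ == "__main__":
    local = {"fernando@pessoa.lit": "both", "alberto@pessoa.lit": "to"}
    remote = {"fernando@pessoa.lit": "both", "alberto@pessoa.lit": "none"}
    print(find_out_of_sync(local, remote))
```

What the server does with each mismatch (re-send the subscription request,
downgrade the local entry, flag it to the user) is left open here.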
There are a couple of things I'm not happy with here.
First, the use of stream features to announce support for this.
Although it looks good on paper, since it lets two domains negotiate
support for this, servers usually don't work this way. If the S2S
connection to the server is not available, most servers would send
the probe, and that would trigger the S2S connection to open. So an
alternative is to just send the normal <presence
type="probe"> flood and include the <rostersync> child, without
knowing if the remote site supports it or not. Not pretty either.
Second, by using the remote roster, we are not detecting the
out-of-sync case in which the roster of fernando@pessoa.lit has an
entry for jangada@saramago.lit, but no such entry exists in the roster
of jangada@saramago.lit. With the above protocol, the reverse query at
the remote server would not find the jangada@saramago.lit entry, so no
result would be sent back. Sending each probe individually would
solve this, but in that case there is no point in running the reverse
roster query at saramago.lit.
In the end I think that the second problem should be ignored. That
type of out-of-sync would be taken care of by the probe of
jangada@saramago.lit to pessoa.lit...
The thing that makes this protocol work is the reverse roster query,
so as long as you keep doing that, the rest should flow naturally.
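
For an SQL-based roster store, the reverse roster query could look roughly
like this. It is only a sketch: the `roster` table, its columns, and both
function names are assumptions made for the example, not the schema of any
particular server.

```python
import sqlite3


def make_roster_db(entries):
    """Build an in-memory roster table from (owner, contact, subscription) rows.

    The schema is hypothetical and exists only for this example.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE roster (owner TEXT, contact TEXT, subscription TEXT)")
    # Indexing by contact is what keeps the reverse lookup cheap.
    db.execute("CREATE INDEX roster_by_contact ON roster (contact)")
    db.executemany("INSERT INTO roster VALUES (?, ?, ?)", entries)
    return db


def reverse_roster_query(db, source_jid):
    """Return [(owner, subscription)] for every local user who lists source_jid."""
    cur = db.execute(
        "SELECT owner, subscription FROM roster WHERE contact = ? ORDER BY owner",
        (source_jid,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_roster_db([
        ("fernando@pessoa.lit", "jangada@saramago.lit", "both"),
        ("alberto@pessoa.lit", "jangada@saramago.lit", "from"),
        ("ricardo@pessoa.lit", "someone@else.lit", "both"),
    ])
    for owner, subscription in reverse_roster_query(db, "jangada@saramago.lit"):
        print(owner, subscription)
```

Each returned row corresponds to one presence reply, with the subscription
value going into the <rostersync> child.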
> What I dislike about this is that the overhead grows with number of
> subscribers. For calculations let's assume 30 (?) additional bytes in
> each probe reply, I5 in the presence scaling analysis.
> The benefit is that you precisely know which elements are desync and
> resync should be quite cheap therefore.
Yes, the probe reply is bigger. But you are sending the extra
subscription='state' inside the last presence of the user. So it might
not be that much of a difference in size (in percent). Also, you don't
send multiple probes, just one, so you send a lot less, and receive a
bit more. I would bet that in the end it would level out in terms of
Also, real world stats about stanza distribution don't measure the
difference between first presence (the one you get because of the
probe) and the rest.
Would an increase on the first presence only even be noticeable?
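
To put rough numbers on the trade-off, here is a back-of-the-envelope
sketch. Only the 30 extra bytes per reply come from the quoted message;
the probe and reply sizes are made-up round figures, and the function name
is hypothetical.

```python
def s2s_bytes(subscribers, probe=100, reply=200, extra=30):
    """Compare total server-to-server bytes for the two probing strategies.

    probe and reply are illustrative assumptions; extra is the ~30 bytes
    per reply suggested in the quoted message. The small <rostersync>
    child in the single probe itself is ignored.
    Returns (one_probe_per_contact, single_probe_with_rostersync).
    """
    per_contact = subscribers * (probe + reply)
    single_probe = probe + subscribers * (reply + extra)
    return per_contact, single_probe


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, s2s_bytes(n))
```

Under these assumed sizes the single probe comes out ahead as soon as
there is more than one subscriber, since each saved probe is larger than
the extra attribute. Real stanza sizes would change the numbers.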
Please notice also that this increase is only between servers. The
local server would strip the presence of such information before
forwarding it to the client.
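
A minimal sketch of that stripping step, using Python's standard XML
library. The namespace is the placeholder from the example above and the
function name is hypothetical. A real server would do this inside its own
stanza pipeline.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the example stanza in this message.
ROSTERSYNC_NS = "roster_sync_namespace"


def strip_rostersync(presence_xml):
    """Remove any <rostersync> child from a presence stanza.

    Returns (subscription, cleaned_xml), where subscription is the value
    the remote server reported, or None if no <rostersync> child was there.
    """
    presence = ET.fromstring(presence_xml)
    tag = "{%s}rostersync" % ROSTERSYNC_NS
    subscription = None
    # Iterate over a copy, since children are removed while looping.
    for child in list(presence):
        if child.tag == tag:
            subscription = child.get("subscription")
            presence.remove(child)
    return subscription, ET.tostring(presence, encoding="unicode")


if __name__ == "__main__":
    stanza = (
        "<presence from='fernando@pessoa.lit/home' to='jangada@saramago.lit'>"
        "<show>away</show>"
        "<rostersync xmlns='roster_sync_namespace' subscription='both'/>"
        "</presence>"
    )
    print(strip_rostersync(stanza))
```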
> If the sender attaches a hash (generated like in the repeater
> calculated each time that part of your roster affecting the remote
> server changes) to each 'presence broadcast' this is independent from
> the number of subscribers on the remote domain. This scales badly
> if the
> number of presence changes is high.
> Let's assume 60 additional bytes (sha1+xml stuff) overhead in C8.
> The cost of resyncing is considerably higher, as you don't know which
> elements have changed.
Sorry, I don't follow here. If the sender attaches a hash? What
hash? Sorry if I missed something.
> What is making the comparison difficult: how often does a desync
That is part of my problem. I don't know if anybody has any numbers
about the quality of rosters in the real world.
I know that we have out-of-sync rosters. We have it at SAPO even
between @sapo.pt accounts (!!), probably due to bugs, or crashes at
the wrong time... But between servers it is very difficult to know what
the status is, because there is no way to check for this.
Maybe someone could work with a couple of big domains and cross-check
hashed roster entries to get some real numbers on this.
I know it happens, because it has happened to me. The usual symptom
is that some contact can see you online, but you cannot see him (or
vice-versa), and both have accepted the invitation.
> I've never seen a desync on our chatroom and their equally distributed
> memberlists, so I assume it happens rarely. Once a week? Once a month?
chatroom? I was talking about presence subscriptions and rosters, not
I don't run or use distributed chatrooms.
XMPP ID: melo at simplicidade.org