[Standards] Council on Stanza Repeaters without Multicast

Pedro Melo melo at simplicidade.org
Thu Apr 3 07:23:36 UTC 2008


On Apr 3, 2008, at 12:22 AM, Dave Cridland wrote:
> On Wed Apr  2 23:22:12 2008, CvL at mail.symlynX.com wrote:
>> <Dave> I'm not wholly convinced that, in the current design, it will
>>        actually result in "better" network usage.
>> <Dave> The single thing that worries me most on this is simply that
>>        nobody seems to have done any figures on whether this genuinely saves
>>        bandwidth.
>> I know Fippo's postings aren't the easiest read. He expects everyone to
>> be as deep in it as he is himself. He has used stpeter's model from
>> http://www.xmpp.org/internet-drafts/draft-saintandre-xmpp-presence-analysis-03.html,
>> found a bug in the formulas, then with the corrected formula figured out
>> numbers for scenario 5.1 as follows:
>> 	580000000 for the "standard" behaviour of xmpp (also dubbed "rfc")
>> 	251200000 for repeaters XEP
>> 	197200000 for smart presence XEP from 2004
>> For completeness, smart presence is here:
>> http://www.xmpp.org/extensions/inbox/smartpresence.html
>> It basically does the same as repeaters, but focused on presence only.
> Right, but *NONE OF THESE* figures take into account two things:
>
> A) Compression
>
> It's crucial on two counts:
>
> 1) It's much simpler to implement, and
> 2) Given that we are (or should be) encrypting every S2S connection,
> then TLS is giving us compression anyway, and moreover, it's cheaper
> to compress than not to compress.
>
> We've been down the road before where on some admittedly fairly  
> simple figures I did a while back - and repeated for Peter's  
> scalability draft in the SIMPLE WG, I think - the existing presence  
> "splurges" compress really well, often better than the "optimized"  
> alternatives.
>
> Smart Presence was one proposal that beat pure compression, as I  
> recall, but it has security implications which I'm not going to  
> accept.

Bandwidth is not the only concern. If I have a very popular pubsub
topic (like an Atom node for some high-profile blog or social site)
with hundreds of thousands of subscribers, the cost of expanding the
subscriber list at the source is very high, in CPU terms as well.

Mailing-list operators moved to sub-lists to deal with this.
Repeaters are just like sub-lists. Please note that I'm talking about
the *public* pubsub node use-case.
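To make the source-side cost concrete, here is a minimal sketch that counts how many stanzas the publishing service must build per notification, with and without sub-lists. It assumes one repeater per subscriber domain, which is a simplification; the function names are hypothetical and not part of any XEP.

```python
from collections import Counter


def domain_of(jid: str) -> str:
    # Domain part of a JID such as "alice@example.org/home" (resource stripped first).
    return jid.split("/", 1)[0].split("@", 1)[-1]


def stanzas_sent_by_source(subscribers: list[str], use_repeaters: bool) -> int:
    """Stanzas the origin emits for one notification (hypothetical model)."""
    if not use_repeaters:
        # Expansion at the source: one stanza per subscriber.
        return len(subscribers)
    # Assumed: one repeater (sub-list) per remote domain, so the origin sends
    # one stanza per domain and each repeater expands it locally.
    return len({domain_of(j) for j in subscribers})


def per_repeater_load(subscribers: list[str]) -> Counter:
    # How many stanzas each domain's repeater must emit on the origin's behalf.
    return Counter(domain_of(j) for j in subscribers)
```

With hundreds of thousands of subscribers spread over far fewer domains, the origin's per-notification work shrinks to the number of domains, while the expansion work moves to the repeaters, just as with mailing-list sub-lists.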


> B) RTT delays
>
> The introduction of Stanza Repeaters gives us RTT delays whenever a  
> list needs changing. That just strikes me as worrying - I don't  
> think anyone has clear figures on how much delay would be  
> introduced, but I'm sure you'll agree that additional latency does  
> not a performance improvement make.
>
> It may be possible to alter the protocol to remove these.

This is important if and only if list changes are the most frequent  
operation.

If sending messages through the repeaters is the most common
operation, then the lower distribution RTT that Stanza Repeaters
give us will outweigh the delays of list changes.
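The trade-off can be written as a crude back-of-envelope check. This sketch assumes delays simply add up across operations; every parameter is a hypothetical input, not a measurement. When the per-operation costs are similar, it reduces to comparing how often each operation happens.

```python
def repeaters_pay_off(messages: int, list_changes: int,
                      saved_per_message_s: float, extra_rtt_s: float) -> bool:
    """True if total distribution savings exceed the list-change penalty.

    messages            -- stanzas distributed through the repeater
    list_changes        -- membership updates pushed to the repeater
    saved_per_message_s -- assumed latency saved per distributed stanza
    extra_rtt_s         -- assumed extra round trip per list change
    """
    savings = messages * saved_per_message_s
    penalty = list_changes * extra_rtt_s
    return savings > penalty
```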

Reducing list-change RTT delays is possible in certain scenarios,
though. I don't know whether a general solution is possible here.

For example, take the use case of public pubsub nodes, like the ones
we expect to see behind an Atom/RSS feed. We could do "multicast"
delivery of these as follows:

  1. a user knows the node he wants to subscribe to, and knows that
this is a public node;
  2. the user subscribes to this node, but through a local pubsub
service: the user asks the local pubsub service to subscribe to the node;
  3. the local pubsub service subscribes to the remote node once. It
signals that this is a proxy subscription so that the origin can
decide whether it accepts those;
  4. the remote service accepts the proxy subscription. From now on,
it sends every notification to the local pubsub service;
  5. the local pubsub service acks the user, the subscription is now
active, and the local service broadcasts the notifications to its
local list of subscribers.
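The five steps can be sketched from the local service's side. This is a minimal illustration, not a proposed protocol: `LocalPubSub` and the injected `remote_subscribe(node, proxy=True)` callback are hypothetical stand-ins for whatever S2S request the local service would actually send.

```python
from collections import defaultdict


class LocalPubSub:
    """Hypothetical local pubsub service holding proxy subscriptions."""

    def __init__(self, remote_subscribe):
        # remote_subscribe(node, proxy=True) -> bool is an assumed callback.
        self.remote_subscribe = remote_subscribe
        self.local_subscribers = defaultdict(set)  # node -> set of user JIDs
        self.proxied = set()                       # nodes already subscribed remotely

    def subscribe(self, user: str, node: str) -> bool:
        # Steps 3-4: subscribe to the remote node only once per node.
        if node not in self.proxied:
            if not self.remote_subscribe(node, proxy=True):
                return False  # origin refused the proxy subscription
            self.proxied.add(node)
        # Step 5: the subscription is active; add the user to the local list.
        self.local_subscribers[node].add(user)
        return True

    def on_notification(self, node: str, payload: str) -> list[tuple[str, str]]:
        # One notification arrives from the origin; fan it out locally.
        return [(user, payload) for user in sorted(self.local_subscribers[node])]
```

A second local subscriber to the same node never triggers another remote request, so from that user's point of view the flow is an ordinary subscription.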

The cost of updating this list is exactly the same as a normal
subscribe in pubsub. A second user will only see steps 1, 2, and 5,
same as a normal subscription. The local pubsub service might signal
the remote pubsub service, but that's optional, can be done
asynchronously with respect to the user flow, and is only required if
the central node wants to know the global number of subscribers.
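The optional signal could be as simple as each local service reporting its subscriber count whenever it likes, with the origin keeping the latest figure per proxy. A minimal sketch of the origin side, assuming such a count report exists (the report and the `OriginNode` class are hypothetical):

```python
class OriginNode:
    """Hypothetical origin-side view of one public node."""

    def __init__(self):
        self.direct = set()     # ordinary (non-proxy) subscriber JIDs
        self.proxy_counts = {}  # proxy service JID -> last reported count

    def report_count(self, proxy: str, count: int) -> None:
        # Sent out-of-band by a proxy; only its latest report is kept.
        self.proxy_counts[proxy] = count

    def global_subscribers(self) -> int:
        # Approximate: a proxy's figure may lag behind its real local list.
        return len(self.direct) + sum(self.proxy_counts.values())
```

Because reports only overwrite earlier ones, they can arrive late or be skipped without affecting delivery.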


>> This isn't even taking into account what a huge gain repeaters
>> represent when applied to MUCs. Large pubsubs also start having
>> a chance to actually scale.
> But you can do both MUC and PubSub way better if you design  
> something specific to them, and I think that's possible to do.

+1.

I would prefer to see a specific protocol for pubsub public nodes,  
for example.

I don't have a clear picture for MUC. The only scenario where I see
repeaters in MUC as an interesting option is something that MUC
doesn't even support, auditorium mode: a small set of active
participants, and a large number of "ghosts". Ghosts are even less
important than visitors: they receive messages and presence changes
from members, but their own presence is not broadcast.
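The routing rule for such a room is small enough to state as code. A minimal sketch; `AuditoriumRoom` is hypothetical, and it additionally assumes ghosts cannot post messages, which the description above does not say.

```python
from dataclasses import dataclass, field


@dataclass
class AuditoriumRoom:
    """Hypothetical 'auditorium mode' room (not a MUC feature today)."""
    members: set[str] = field(default_factory=set)  # small active set
    ghosts: set[str] = field(default_factory=set)   # large passive audience

    def recipients(self, sender: str, kind: str) -> set[str]:
        """Who receives a 'message' or 'presence' stanza from sender."""
        if kind not in ("message", "presence"):
            raise ValueError(f"unsupported stanza kind: {kind}")
        if sender in self.members:
            # Members' messages and presence changes reach everyone else.
            return (self.members | self.ghosts) - {sender}
        # Ghost presence is never broadcast; this sketch also assumes
        # ghosts may not post messages.
        return set()
```

Only traffic from the few members fans out, and it always goes to the same large ghost set, which is the shape of list that a repeater handles well.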

For presence distribution, I prefer remote fan-out based on a
reverse-roster, but it doesn't interact properly with privacy lists.
Stanza repeaters deal better with privacy lists, because you can
update the remote list based on the privacy list changes. But I don't
know whether the complexity of all this privacy-list interaction will
be worth it.
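Updating the remote lists after a privacy-list change amounts to a per-domain diff. A minimal sketch under a deliberate simplification: the privacy list is reduced to a plain set of JIDs denied outbound presence (real privacy lists also match on groups, subscription state and stanza type), and the function names are hypothetical.

```python
from collections import defaultdict


def domain_of(jid: str) -> str:
    # Domain part of a JID, resource stripped first.
    return jid.split("/", 1)[0].split("@", 1)[-1]


def repeater_updates(subscribers: set[str], blocked: set[str],
                     current_remote: dict[str, set[str]]
                     ) -> dict[str, tuple[set[str], set[str]]]:
    """Per-domain (to_add, to_remove) sets for each remote repeater list."""
    # Presence should reach every roster subscriber the privacy list allows.
    wanted = defaultdict(set)
    for jid in subscribers - blocked:
        wanted[domain_of(jid)].add(jid)
    updates = {}
    for domain in set(wanted) | set(current_remote):
        want = wanted.get(domain, set())
        have = current_remote.get(domain, set())
        to_add, to_remove = want - have, have - want
        if to_add or to_remove:
            updates[domain] = (to_add, to_remove)
    return updates
```

Each privacy-list edit then costs one list update per affected domain, and each of those updates is where the RTT delay discussed earlier would appear.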

In the end, a generic stanza repeater implementation is useful to have
in theory, but in practice every service it is trying to replace
already holds its subscriber list internally, and could leverage that
list to add multicast capabilities.

Best regards,
-- 
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: melo at simplicidade.org
Use XMPP!