[PubSub] Brussels report

Peter Saint-Andre stpeter at stpeter.im
Mon Apr 20 16:21:00 CDT 2009


On 2/13/09 12:13 PM, Ralph Meijer wrote:
> On 2009-02-12 17:16, Seth Fitzsimmons wrote:
>>> Sometimes that works, and in fact most caching systems operate
>>> somewhere between the two scenarios. Currently, most conceptions of
>>> XMPP PubSub operate firmly in the latter scenario, hence the need for
>>> a conception of code-driven PubSub. Now, I don't think that the spec
>>> needs any changing at all in order to support this behaviour; all you
>>> end up doing is ignoring large swathes of specification. In fact, the
>>> Fire Eagle XMPP PubSub mechanism operates in exactly this way,
>>> interacting with ejabberd as a component, and using only the s2s
>>> functionality of ejabberd. It works well (I think? Seth? ;-)  ), and
>>> is only a few hundred lines of code.
>> Yes, Fire Eagle works this way, and it's very similar to the approach
>> that I've been thinking about for other projects.  One limitation in
>> the spec that I periodically bump into is that collection nodes
>> shouldn't have items associated with them.  The particular use-cases
>> are:
>> - node hierarchies (maybe), i.e. supporting <items/> requests on
>> parent nodes to retrieve all children
>> - making individual resources (in the REST sense) available as nodes
>> (/people/seth where "seth" is actually an item on the /people node).
> First of all, REST does not dictate that the identification of a
> particular resource needs to be done solely in the path component of an
> HTTP URI. This follows naturally from RFC 3986 (and its predecessors)
> describing URI generic syntax. I'll start with some background to base
> my response on, if only for readers not so entrenched in this stuff.
> In XMPP the identification of resources is essentially threefold: the
> JID of the service, the identifier of the node and the identifier of an
> item. Those can be combined into an XMPP URI to fully denote the
> resource in one identifier, although currently there is no XMPP URI
> query parameter defined for item identifiers.
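To make the three-part identification (service JID, node, item) concrete, here is a minimal sketch of building an XMPP URI in RFC 5122 syntax with a "pubsub" query. Treat the "item" key and the "retrieve" action as assumptions: they stand in for the item parameter discussed here and are not taken from this message. The helper name pubsub_uri is hypothetical.

```python
from urllib.parse import quote


def pubsub_uri(service_jid, node, item=None, action="retrieve"):
    """Build an XMPP URI (RFC 5122 syntax) for a pubsub node or item.

    Assumption: the 'item' key and the 'retrieve' action are illustrative
    stand-ins for the query parameter discussed in this thread.
    """
    params = [("action", action), ("node", node)]
    if item is not None:
        params.append(("item", item))
    # Percent-encode values so that ';' or '=' inside an ID cannot be
    # mistaken for query delimiters; '/' is left readable.
    query = ";".join(f"{key}={quote(value, safe='/')}" for key, value in params)
    return f"xmpp:{service_jid}?pubsub;{query}"


print(pubsub_uri("pubsub.example.org", "/people", item="seth"))
# xmpp:pubsub.example.org?pubsub;action=retrieve;node=/people;item=seth
```

With this scheme, Seth's "/people/seth" resource could be addressed as the item "seth" on the "/people" node without needing a node of its own.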

I've added that:


The question is whether something like this is acceptable, too:


> Whereas typical uses of HTTP URIs map nicely onto filesystem
> layouts with files and directories, where directories can contain both
> directories and files, XMPP publish-subscribe nodes and items are
> slightly different. If you wanted to map them onto filesystem
> semantics, items would be files and nodes would be directories. Leaf
> nodes only contain items ('files'), and collections can only contain
> other nodes ('directories').
> I don't think it is necessary to map every resource to a
> node. This totally depends on the 'subscribable unit' [1].
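The filesystem analogy above can be sketched as two small classes: a leaf node is a "directory" holding only "files" (items), and a collection is a "directory" holding only other "directories" (nodes). The class names are hypothetical and only model the rule stated here, that items never live directly in a collection; this is not a server implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """A 'directory' that holds only 'files': item id -> payload."""
    node_id: str
    items: dict = field(default_factory=dict)

    def publish(self, item_id, payload):
        # Publishing with an existing id replaces that item.
        self.items[item_id] = payload


@dataclass
class CollectionNode:
    """A 'directory' that holds only other 'directories' (child nodes)."""
    node_id: str
    children: dict = field(default_factory=dict)

    def add_child(self, node):
        self.children[node.node_id] = node

    def publish(self, item_id, payload):
        # In this model items never live directly in a collection.
        raise TypeError(f"collection {self.node_id!r} cannot hold item {item_id!r}")


people = CollectionNode("/people")
seth = LeafNode("/people/seth")
people.add_child(seth)
seth.publish("current", "<entry/>")
```

Seth's complaint follows directly from this model: publishing with id=seth to "/people" is exactly the call that raises.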
>> Semantics around collection nodes get weird; I publish to /people
>> with id=seth, I retrieve with an <items/> request on /people/seth
>> (since I can't retrieve a specific id from /people), I can either
>> update /people/seth OR re-publish with id=seth to /people, and I can
>> retract from /people with id=seth OR retract from /people/seth OR
>> purge /people/seth.
> If you want to allow entities to subscribe to updates on only particular
> individual resources (say 'seth'), you should provide them as a node.
> That this node will only ever have one item in it is not a
> problem. We have several uses of this in extended presence publishing
> via PEP, where you only ever publish the current mood, current location,
> etc.
> We use this model at Mediamatic Lab for sharing 'things' between anyMeta
> sites in the Open-CI network [2]. Every thing (person, article,
> organization, event) has its own node to which updates are published as
> Atom entry documents. We currently use opaque node identifiers, but are
> likely to move to a more integrated solution in the near future.
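A one-node-per-thing design only ever needs the latest item, much like PEP's "current mood" nodes. A minimal sketch, assuming a node that behaves as if its pubsub#max_items configuration were 1; the class SingleItemNode and the node name "things/seth" are hypothetical.

```python
class SingleItemNode:
    """Hypothetical per-resource leaf node keeping only the latest item,
    in the spirit of configuring pubsub#max_items to 1."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.item = None          # (item_id, payload) or None
        self.subscribers = set()  # bare JIDs

    def publish(self, item_id, payload):
        """Replace the current item and return the notifications to send."""
        self.item = (item_id, payload)
        return [(jid, self.node_id, item_id) for jid in sorted(self.subscribers)]


node = SingleItemNode("things/seth")
node.subscribers.add("ralphm@example.org")
node.publish("v1", "<entry>old</entry>")
print(node.publish("v2", "<entry>new</entry>"))
# [('ralphm@example.org', 'things/seth', 'v2')]
```

Subscribers to this node hear only about "seth", which is the 'subscribable unit' point made above.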
>> I have a rough XMPP-rack bridge here that demonstrates how PubSub
>> semantics are mapped to resources (the controller at the end of it is
>> a straightforward Rails resource controller, mars, whose usage notes
>> are in its repository):
>> http://github.com/mojodna/dovetail/blob/4ee9fcf930193c4dadd506005facec98d73b8ca8/lib/dovetail.rb
>> http://github.com/mojodna/mars/tree/master
> Interesting. I have to take a closer look, though.
>> Strictly speaking, /people should be a collection node and shouldn't
>> have items / be manipulable, but I can't get around this sufficiently.
> The use cases you present are probably covered by the following series
> of questions:
>  1. What are the things an end-user is going to subscribe to?
>  2. How do I represent feeds of updates to multiple things?
>  3. How do I retrieve items from a feed?
>  4. How do I publish and/or retract things?
> In what I've described above about what we do at Mediamatic Lab, or the
> things I wrote about node-as-code, I have not really addressed the
> publishing side of things. In that case, questions #1 and #2 become
> relatively easy. Objects just magically appear inside your backend.
> Every time a new (or updated) object appears in the backend, you can
> just send out notifications to subscribers of feeds like 'things by
> ralphm', 'things by all people', 'things by ralphm and his friends'.
> Each of those feeds would be a leaf node, with the item being
> 'published' to each of those.
> Retrieving items (#3) in this case is easy. As each node is a leaf node,
> you can just do a regular <items/> request.
> The advantage of this approach is that you don't need to explicitly keep
> the configuration of the hierarchy. In your example, new people would
> just automatically show up in the 'people' feed.
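The nodes-as-code idea can be sketched as a function that computes which leaf feeds an object belongs to whenever it appears in the backend, then fans out notifications to their subscribers. Everything here (the FRIENDS table, the "things/by/..." node naming, on_new_thing) is a hypothetical illustration of the 'things by ralphm and his friends' style of feed, not Mediamatic's actual code.

```python
from collections import defaultdict

# Hypothetical data: who counts as whose friend, and who subscribed to what.
FRIENDS = {"ralphm": {"seth"}, "seth": {"ralphm", "blaine"}}
subscriptions = defaultdict(set)  # feed node id -> set of subscriber JIDs


def feeds_for(thing):
    """Compute, rather than configure, the leaf feeds a thing belongs to."""
    author = thing["author"]
    feeds = {"things/all", f"things/by/{author}", f"things/by/{author}+friends"}
    # It also appears in "X and friends" for every X who lists the author as a friend.
    feeds |= {f"things/by/{user}+friends"
              for user, friends in FRIENDS.items() if author in friends}
    return feeds


def on_new_thing(thing):
    """Return sorted (subscriber, feed node, item id) notifications."""
    return sorted(
        (jid, node, thing["id"])
        for node in feeds_for(thing)
        for jid in subscriptions[node]
    )


subscriptions["things/by/ralphm+friends"].add("alice@example.org")
subscriptions["things/all"].add("bob@example.org")
print(on_new_thing({"id": "42", "author": "seth"}))
# [('alice@example.org', 'things/by/ralphm+friends', '42'), ('bob@example.org', 'things/all', '42')]
```

No hierarchy is stored anywhere; membership is derived each time, which is why these feeds are naturally read only.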
> On the other hand, when #4, publishing and/or retracting things, comes
> into the picture, things may become tricky. In this case, you likely
> want exactly one node to publish to, whereas the nodes-as-code would
> essentially be read only.
> I can see why collections might be attractive in this scenario.
> Collections are read only, and notifications will always have the node
> identifier where the item is 'home' and was published to. Like leaf
> nodes, the configuration of collection nodes might also be computed.
> Unfortunately, with how the protocol is defined at this moment, this
> doesn't play nice with #3, as you suggested.
> Basically, <items/> requests are not defined for collection nodes. I.e.
> the return format is not capable of conveying the node identifier of the
> original descendant node of the items it would return. Ordering might
> also be interesting, but that is an application level issue. As Blaine
> mentioned to me the other day, pubsub without the ability to retrieve
> items after the fact, can be very painful. I see two solutions here:
>  1. Add an optional 'node' attribute to the <item/> element in the
> http://jabber.org/protocol/pubsub namespace to convey the node this
> item was actually published to. I would define the semantics such that
> it overrides the 'node' attribute of the containing <items/> element.

Is this in the http://jabber.org/protocol/pubsub namespace or the
http://jabber.org/protocol/pubsub#event namespace? Why would you need
this when publishing?

>  2. Don't return the results in the response, but instead send them
> asynchronously.
> I tend towards the first solution.
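Solution 1 can be sketched with ElementTree: an <items/> result for a collection in which each <item/> carries the leaf node it was published to, plus the receiver-side rule that an item's own 'node' overrides the one on <items/>. The per-item 'node' attribute is the proposal under discussion, not part of the published schema; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def q(tag):
    """Qualify a tag name with the pubsub namespace."""
    return f"{{{PUBSUB_NS}}}{tag}"


def collection_items(collection, results):
    """Build <pubsub><items node=collection/></pubsub> where each <item/>
    carries the leaf node it was published to (the proposed extension)."""
    pubsub = ET.Element(q("pubsub"))
    items = ET.SubElement(pubsub, q("items"), node=collection)
    for origin, item_id, payload in results:
        item = ET.SubElement(items, q("item"), id=item_id)
        if origin != collection:
            item.set("node", origin)
        item.append(payload)
    return pubsub


def item_origins(pubsub):
    """Resolve each item's node: its own 'node' overrides the <items/> one."""
    items = pubsub.find(q("items"))
    return [(item.get("id"), item.get("node", items.get("node")))
            for item in items.findall(q("item"))]


entry = ET.Element("{http://www.w3.org/2005/Atom}entry")
result = collection_items("/people", [("/people/seth", "current", entry)])
print(item_origins(result))
# [('current', '/people/seth')]
```

The result stays a normal IQ response, which is the main advantage over solution 2's asynchronous delivery.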
> Looking at the schema for the <item/> element in the
> http://jabber.org/protocol/pubsub#event namespace, I note that there is
> a 'node' attribute there. 

As far as I can see, this is not used currently.

> This is likely a remnant of an earlier version
> of the specification, since we currently use SHIM headers to communicate
> the node that was subscribed to. The reasoning behind that was that a
> particular notification can be the result of multiple subscriptions, to
> the same node (in which case you have multiple 'SubID' headers) and/or
> to different collection nodes (in which case you have multiple
> 'Collection' headers?). The latter is not explicitly defined in
> XEP-0248. How you should communicate which SubID belongs to which
> Collection, if both come in multiples is also not defined.

Yeah, that's messy. Well, all of XEP-0248 is messy.
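The ambiguity described above is easy to see in a sketch: a notification carrying several 'SubID' and several 'Collection' SHIM headers reaches the receiver as a flat list, with nothing pairing a SubID to a collection. 'SubID' is the header named in this message; the 'Collection' header name follows Ralph's own tentative guess and is an assumption, as are the function names.

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://jabber.org/protocol/pubsub#event"
SHIM_NS = "http://jabber.org/protocol/shim"


def notification(to, leaf_node, item_id, subids=(), collections=()):
    """Build a pubsub event notification with SHIM headers.

    Assumption: 'Collection' is the tentative header name from this
    thread, not a header defined by XEP-0248.
    """
    msg = ET.Element("message", to=to)
    event = ET.SubElement(msg, f"{{{EVENT_NS}}}event")
    items = ET.SubElement(event, f"{{{EVENT_NS}}}items", node=leaf_node)
    ET.SubElement(items, f"{{{EVENT_NS}}}item", id=item_id)
    if subids or collections:
        headers = ET.SubElement(msg, f"{{{SHIM_NS}}}headers")
        for name, values in (("SubID", subids), ("Collection", collections)):
            for value in values:
                ET.SubElement(headers, f"{{{SHIM_NS}}}header", name=name).text = value
    return msg


def header_values(msg):
    """What a receiver can recover: a flat list of (name, value) pairs."""
    return [(h.get("name"), h.text) for h in msg.iter(f"{{{SHIM_NS}}}header")]


msg = notification("seth@example.org", "/people/seth", "current",
                   subids=("sub-1", "sub-2"), collections=("/people", "/all"))
print(header_values(msg))
# [('SubID', 'sub-1'), ('SubID', 'sub-2'), ('Collection', '/people'), ('Collection', '/all')]
```

Nothing in that output says whether "sub-1" belongs to "/people" or "/all".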

>> XEP-0248 is much more complicated (and limiting) than I need for
>> node-as-code.
>> The implementation specifics of the black box should be hidden from
>> consumers and can either be exposed to publishers (for traditional
>> PubSub-style manipulation) or be embedded in application code, in
>> which case there's no benefit in exposing them.
>> Ignoring swaths of specs is one thing (i.e. not supporting them), but
>> implementing a service in such a way that behavior is contradictory is
>> another, so my preference would be for the spec to be more lenient to
>> support the whole node-as-code use case.
> I understand this sentiment, but I'm not sure how to address this, yet.
> XEP-0248 needs a long hard look anyway, something Joe Hildebrand and I
> concluded at XMPP Summit #5. If you have any more notes on this,
> starting a new thread would be a good idea.

Yes, I will look at XEP-0248 after I finish all the core spec fixes.

>> The component implementation is actually a giant pile of pain (this is
>> not PubSub-specific, but still relevant).  I'd rather *not* have to
>> deal with discovery, ping, roster handling (beyond setting a default
>> policy), or presence tracking by default.  There are cases where I may
>> decide that I need to, but I would much prefer if I had the option to,
>> rather than the requirement.  I think it also results in component
>> implementations of varying quality, because each developer needs to
>> almost entirely reinvent the wheel.  (That said, *having* a component
>> option is wonderful, but I long for a half-way solution).
> Implementing a generic publish-subscribe service by itself does not
> necessitate roster handling and presence tracking, unless you want to
> have presence based subscriptions or nodes-tied-to-user-accounts (like
> PEP) with its associated access models based on rosters. Idavoll is an
> example of this.
> I don't know why you would need to implement ping.
> The amount of Service Discovery that you MUST implement for XEP-0060 is
> limited to announcing the features of the service in disco#info. You
> don't necessarily have to return any results to disco#items requests.
> I'm not sure what you are getting at here, for components in general,
> though. Or do you mean a specific implementation?
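As a measure of how small the mandatory discovery surface is, here is a minimal sketch of a disco#info result for a pubsub service: one pubsub/service identity plus feature vars. The particular pubsub#... features passed in are illustrative choices from XEP-0060's feature list, and the function name is hypothetical; a disco#items request could simply get an empty <query/> back.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def disco_info_result(stanza_id, service_jid, requester, pubsub_features):
    """Answer a disco#info query for a pubsub service."""
    # 'from' is a Python keyword, so attributes are passed as a dict.
    iq = ET.Element("iq", {"type": "result", "id": stanza_id,
                           "from": service_jid, "to": requester})
    query = ET.SubElement(iq, f"{{{DISCO_INFO_NS}}}query")
    ET.SubElement(query, f"{{{DISCO_INFO_NS}}}identity",
                  {"category": "pubsub", "type": "service"})
    feature_vars = [DISCO_INFO_NS, PUBSUB_NS]
    feature_vars += [f"{PUBSUB_NS}#{name}" for name in pubsub_features]
    for var in feature_vars:
        ET.SubElement(query, f"{{{DISCO_INFO_NS}}}feature", {"var": var})
    return iq


iq = disco_info_result("info1", "pubsub.example.org", "seth@example.org/home",
                       ["publish", "subscribe", "retrieve-items"])
print(ET.tostring(iq, encoding="unicode"))
```

Nothing here touches rosters or presence, which matches Ralph's point that a generic service can skip both.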
>> I've been thinking about this recently and would like to see something
>> along the lines of a CGI, Rack, WSGI, or Servlet specification for
>> PubSub-specific functionality.  Somewhat along the lines of what
>> Julien Genestoux and Stephan Maka have done with Babylon
>> (http://github.com/julien51/babylon), but built into servers and with
>> a clearly defined, language agnostic interface.
> Yeah, I had some ideas along those lines around the Summit and am very
> interested in participating in such an effort.
>> It would involve routing/mapping of PubSub nodes (patterns, really) to
>> *something* (I'm leaning towards either queues or URIs (*not* limited
>> to HTTP), with different schemes supported by different servers).  The
>> mapping would either be static (application configuration) or dynamic
>> (processors register patterns that they know how to process).  The
>> payload (and response) would probably be XML or JSON; I think it needs
>> to contain more context (such as the known presence for the requesting
>> JID) than just the <pubsub/> stanza, but I haven't given this a whole
>> lot of thought yet.
> Indeed, something like this would be nice. I don't think it would be
> very hard to implement the 'server side' in Twisted, to be added to
> Idavoll. The 'client side' could be very much like a WSGI client.
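The routing Seth describes (node patterns mapped to processors, with extra context such as the requester's presence passed along) might look roughly like the following on the 'client side'. The router class, the request dict layout, and the handler signature are all hypothetical; only 'item-not-found' is a real XMPP error condition.

```python
import re


class NodeRouter:
    """Hypothetical WSGI-like dispatcher from node-ID patterns to handlers."""

    def __init__(self):
        self._routes = []  # (compiled pattern, handler); first match wins

    def register(self, pattern, handler):
        self._routes.append((re.compile(pattern), handler))

    def dispatch(self, request):
        """request is a plain dict: node, action, requester JID, presence, ..."""
        for regex, handler in self._routes:
            match = regex.fullmatch(request["node"])
            if match:
                return handler(request, **match.groupdict())
        # 'item-not-found' is the XMPP error condition a service would return.
        return {"error": "item-not-found"}


def person_items(request, name):
    """Example handler: the 'seth' in /people/seth becomes a lookup key."""
    return {"node": request["node"], "items": [{"id": name}]}


router = NodeRouter()
router.register(r"/people/(?P<name>[^/]+)", person_items)
print(router.dispatch({"node": "/people/seth", "action": "items",
                       "from": "ralphm@example.org", "presence": "available"}))
# {'node': '/people/seth', 'items': [{'id': 'seth'}]}
```

Static configuration and dynamic registration both reduce to calls to register(); a server-side bridge would only need to turn stanzas into these dicts and back.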
>> If I have some time (ha!), I'd like to talk a bit more with Julien and
>> try to hack out a prototype to figure out what the issues are using
>> Wokkel or Tigase.
> Let me know when you do. A groupchat might be an idea?

I think it would be very productive to have a groupchat about pubsub
once we have a proposed spec to settle on. I'm just about done with my
first round of revisions, although I want to read the whole thing over
now so that I can figure out what else needs to be fixed, what parts we
can move to subsidiary specs, etc.

> ralphm
> PS. If you got all the way here, I congratulate you. It took me most of
> the day to write this.

And it took me 2 months to get around to reading it. :P


--
Peter Saint-Andre


