[standards-jig] Pub/Sub Explanation

DJ Adams dj.adams at pobox.com
Tue Apr 16 13:24:47 UTC 2002


On Mon, Apr 15, 2002 at 10:20:52PM -0600, Dave Smith wrote:
> I take personal responsibility for not being more forthcoming with details.
> I could blame it on any number of circumstances, but the fact of the matter
> is that I haven't taken the time and focus to layout the details of what I

Diz, dude, that's ok. The main thing is that we're discussing it. So 
thanks :). I'm also glad you took up my suggestion of continuing the
discussions here instead of offline (not only was it difficult to get
people from three continents together for a realtime chat, I also thought
it was better that we didn't exclude the rest of the list from this
discussion).

> that need to be addressed. In other words, the various incarnations of the
> pub/sub proposals all have slightly different ideas about what "pub/sub"
>[...]
> Secondly, let's be clear that it's completely possible that what JINC (or
> any other developer, company, community, etc) needs in a pub/sub system may
> be different enough to warrant multiple protocols. That's not bad -- it just
> means that different parties need different functionality. As a standards-
> based community we will not have failed if there _does_ turn out to be

This is an important point. We do need to see if we can bring together the
efforts amongst JEP 21, 24 and this new one, but at the same time, it would
be counterproductive in the long run to force them together despite any
fundamental differences. The Jabber world lives with different protocols
already - for example, the conferencing protocol (groupchat, conference,
and JINC's own). Diversity is sometimes as important as conformity.

> Before we can actively discuss use cases for this sub-system, we'll also
> need to agree on some basic terminology:
> 
> Topic -  A focal object which relates Jabber entities which provide
> 
> Category - An organizational unit which topics are associated with. The

Well (we've discussed this already, but offline), both Piers and I
don't see the need for such a distinction. A pub/sub system needs a way
of identifying the 'subject' (or topic) of subscription or publication.
The means of identification needs to be as flexible as possible, to allow
publisher and subscriber to independently come to a 'contract' where what
is to be published and subscribed to is described in that subject. Use of
a namespace (remember, a namespace can be *anything*, not just a colon-
separated list of keywords, or a URL) to specify the subject gives ultimate
flexibility. 

Secondly (again, as discussed offline), there are implications (shown
below in your use cases, and in the example XML you've shown us) of 'topic
management' by the pubsub component, that are linked to these two
things, 'topic' and 'category'. Pub/sub should be pub/sub, and topic
management should be done outside of the protocol. For example, having
the (potential) subscriber browse the publisher would enable the publisher
to autonomously manage his own subjects and arrange them into whatever
hierarchy (or otherwise) he desired. In my opinion, topic management
does not belong in a pub/sub component. Indeed, to require administration
at the pubsub component to manage topic creation and those mechanisms
that you're proposing (again, discussed offline in part) does not bode
well for smooth operation.

Furthermore (again, as discussed briefly offline), the use of category
implies a certain (time-based) management of published data stored by
a pubsub component, which I'm against - see below. So would there be any
need for a 'category' if the data storage was managed elsewhere?

> (Topic) Item - An instance of data stored within a topic. Each item may have
> a unique identifier which allows the item to be keyed for updates.

Well, I guess this is the lowest common denominator :-) This is 
the payload, which is fairly simple in concept. That said, a
publisher might want to send a <publish> packet in a certain namespace,
without actually sending any content (any payload). This would represent
an update 'ping'. I'm using this technique in the Jabber pubsub mechanism
which is part of the Radio Bridge in the Radio Userland / Jabber 
project right now (http://www.pipetree.com/testwiki/RadioJabber).
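
As a rough illustration of the update 'ping' idea, here is a minimal
Python sketch that builds a publish packet with or without a payload. It
assumes the same packet shape as the publish example later in this mail
(xmlns='pubsub', a <publish> element with an 'ns' attribute); the helper
name build_publish is hypothetical and not part of any JEP.

```python
import xml.etree.ElementTree as ET


def build_publish(subject, payload=None):
    """Build an <iq type='set'> publish packet (hypothetical helper).

    `payload` is an optional Element; when it is None the <publish>
    element is left empty, which is the update 'ping' case.
    """
    iq = ET.Element("iq", {"type": "set"})
    # The packet shape (xmlns='pubsub', ns attribute) mirrors the example
    # in this mail and is an assumption, not a finished protocol.
    query = ET.SubElement(iq, "query", {"xmlns": "pubsub"})
    publish = ET.SubElement(query, "publish", {"ns": subject})
    if payload is not None:
        publish.append(payload)
    return iq


# A ping: the subject is named, but no content is sent.
ping = build_publish("subject")
print(ET.tostring(ping, encoding="unicode"))
```

A subscriber that receives such a packet learns only that something under
that subject has changed, and can fetch the data by whatever other means
it has agreed with the publisher.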

And yes, each item may have a unique identifier, but that should not 
be a concern of the pubsub protocol - the identifier belongs with the
payload itself, not with the packets of the surrounding pubsub framework.

> Durability - Lifetime of a topic, subscription, or item. Something is said
> to be "durable" if has a relatively long lifetime (multiple Jabber sessions)
> and is typically hard-stored. Along these lines, something which exists for

A piece of data published may have a TTL associated with it. Again,
this is independent of the pubsub protocol, as the TTL is related to 
the payload, not the pubsub framework. A subscriber, having received
a pushed payload, will most likely want to manage that payload as an
independent piece of data - free of the no-longer-necessary shackles 
of pubsub packet wrapping. Here's an example of how that would work:

<iq type='set' ...>
  <query xmlns='pubsub'>
    <publish ns='subject'>
      <thepayload>
        ...
        <x xmlns='jabber:x:expire' seconds='10000'/>
      </thepayload>
    </publish>
  </query>
</iq>
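
On the receiving side, a subscriber could strip the pubsub wrapping and
read the TTL from the payload itself. The following is only a sketch in
Python, assuming the packet shape shown above; the function name unwrap,
the id value and the <headline> element are hypothetical stand-ins for
the parts elided in the example.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as {namespace}tag.
EXPIRE_TAG = "{jabber:x:expire}x"


def unwrap(packet_xml):
    """Return (subject, payload, ttl_seconds) from a publish packet.

    payload is None for an empty publish; ttl_seconds is None when the
    payload carries no jabber:x:expire element.
    """
    iq = ET.fromstring(packet_xml.strip())
    publish = iq.find("{pubsub}query/{pubsub}publish")
    if publish is None:
        raise ValueError("not a publish packet")
    subject = publish.get("ns")
    # The first child of <publish>, if any, is treated as the payload.
    payload = publish[0] if len(publish) else None
    ttl = None
    if payload is not None:
        expire = payload.find(EXPIRE_TAG)
        if expire is not None:
            ttl = int(expire.get("seconds"))
    return subject, payload, ttl


# Hypothetical packet: id and <headline> are illustrative only.
packet = """
<iq type='set' id='hypothetical-1'>
  <query xmlns='pubsub'>
    <publish ns='subject'>
      <thepayload>
        <headline>hypothetical content</headline>
        <x xmlns='jabber:x:expire' seconds='10000'/>
      </thepayload>
    </publish>
  </query>
</iq>
"""
subject, payload, ttl = unwrap(packet)
```

After unwrap returns, the payload element still carries its own expiry
information, so it can be stored or passed on without any reference to
the pubsub packet it arrived in.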
 

> UC1 - Manage a Topic (create, delete)
> UC2 - Publish Items to a Topic
> UC3 - Get Published Items for a Topic (retrieve the last 100 items
> published)
> UC4 - Get Topic Information (subscriber list, durability info, header
> name/value pairs)
> UC5 - Discover Topics (find topics that a JID is subscribed to, etc.)
> UC6 - Subscribe to Topic (includes unsubscribe)

As I've already mentioned, I don't see topic management belonging to 
pubsub. It's independent. That therefore discounts use cases 1, 4, and
5. 

> One of the fundamental concepts here is that the pub/sub service is capable
> of storing items which are published. This is a weighty requirement -- it
> implies some sort of data storage. It is my belief, however, that this

This is one of the areas that I (and Piers) feel most strongly about - 
that pubsub is not data storage and should not be. Pubsub is about the
efficient coordination of data delivery according to pre-arranged 
receipt requirements. It is not about being a data warehouse, or a general
data storage mechanism. A combination of mechanisms, data storage and
retrieval (in the same service sense as mod_xml, for example), plus
pubsub for conveying change notifications, would be one way of allowing
publishers and subscribers to manage historic data and subscriptions to
that data. But pubsub mechanisms shouldn't be responsible for holding
that data and answering retrieval queries. Indeed, the description of use
case 3, "Get Published Items for a Topic", says nothing (to me) about
what pubsub -- publish (push) and subscribe (receive) -- is about. What
use case 3 says to me is data retrieval. 
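
To make the distinction concrete, here is a minimal Python sketch of a
pubsub component that only routes and never stores. The class and method
names are hypothetical, subjects are treated as opaque strings, and
delivery is modelled as a plain function call rather than real Jabber
packet routing.

```python
from collections import defaultdict


class Router:
    """Forwards published payloads to current subscribers; keeps no history."""

    def __init__(self):
        # subject (opaque string) -> list of delivery callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, subject, deliver):
        self._subscribers[subject].append(deliver)

    def unsubscribe(self, subject, deliver):
        if deliver in self._subscribers.get(subject, []):
            self._subscribers[subject].remove(deliver)

    def publish(self, subject, payload=None):
        """Push payload to whoever is subscribed right now.

        Nothing is retained afterwards, so there is no 'get published
        items' query for this component to answer.
        """
        receivers = list(self._subscribers.get(subject, []))
        for deliver in receivers:
            deliver(subject, payload)
        return len(receivers)


early, late = [], []
router = Router()
router.subscribe("subject", lambda s, p: early.append(p))
router.publish("subject", "first")
router.subscribe("subject", lambda s, p: late.append(p))
router.publish("subject", "second")
```

In this sketch the late subscriber never sees "first". Anyone who needs
historic data would have to get it from a separate storage and retrieval
service, which is the split argued for above.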

Anyway, I'll wrap up this email now - the longer emails are, the less 
likely people are to read them :-)

FWIW, I've been using mechanisms based on JEP 24 already in various areas
recently (Radio Userland, Peerkat publishing, and some internal projects)
and they're working very well. The protocol was very easy to implement, and
got the job done. 

dj


