[Standards-JIG] JEP-0060: Adjustments for content-based subscriptions

Ralph Meijer jabber.org at ralphm.ik.nu
Sun Jun 13 07:35:01 UTC 2004

On Sat, Jun 12, 2004 at 07:29:19PM -0400, Bob Wyman wrote:
> 	I'm really having trouble squeezing the content-based pubsub service
> that we provide at PubSub.com into the confines of JEP-0060.
> 	The problem is that JEP-0060 was clearly defined with "topic-based"
> pubsub in mind. i.e. There are nodes (topics) to which people can publish
> and subscribe and the expectation is that subscribers will get copies of
> everything that is published to the node.
> 	At PubSub.com, we implement a "content-based" service which assumes
> that clients *do not* want to receive everything that is published to any
> particular node. What you do in a content-based system is subscribe to a
> node/topic by defining a filter or selection-query that identifies the
> subset of published content that you are interested in. For some topics, it
> is likely that you will create multiple subscriptions -- each selecting a
> different sub-set of the messages being published via the topic/node. 
> 	For instance, we have a single node through which we publish a few
> million messages every day. The messages are extracted from over 2 million
> web logs and 50,000 NNTP newsgroups. Our typical user today has five or six
> different subscriptions which are all selecting different subsets of the
> messages that are published to the node. The following is a set of what
> might be "typical" subscriptions for any one user. All of these
> subscriptions are "filters" on a single topic or node.
> 	1. "Bob Wyman" OR "Robert Wyman"
> 	2. SOURCE:bobwyman.pubsub.com
> 	3. (pubsub OR "publish/subscribe" OR "pub/sub" OR "publish and
> subscribe")
> 	4. URI:pubsub.com
> 	5. URI:bobwyman.pubsub.com
> 	Publishing our data over JEP-0060 is easy. We do it now. What we do
> is create a Jabber node for every subscription to our service. This works
> fine as long as the subscriptions are being created on the PubSub website --
> not via Jabber. For us to be able to support subscriptions being created
> within Jabber, we would have to have a way to give a unique identifier to a
> subscription. But, in its current form, JEP-0060 provides no means to
> identify a subscription. I.e. subscriptions aren't "nodes". In JEP-0060, a
> "subscription" is really just part of your affiliation with a node. You're
> either subscribed to the node or you're not. JEP-0060 doesn't support
> multiple subscriptions to a single node.
> 	I originally thought I would get around this problem by having
> people create "subscriptions" by creating a node and passing the "topic id"
> as part of node configuration. But, this doesn't really work very well. The
> problem is that it means that we've got this additional set of resources
> called "topics" that aren't really nodes but that behave somewhat like
> nodes. It is also a problem that in Jabber the creators of nodes are allowed
> to publish to their nodes... But, you can't publish to a "subscription"! If
> you publish, you need to publish to the topic/node that the subscription
> filters, not the subscription itself.
> 	These and other problems lead me to the conclusion that we should be
> treating our "topics" as Jabber Nodes and extending Jabber so that it
> returns uniquely named subscriptions when people subscribe to topics/nodes.
> For instance, you would subscribe to a node using something like the
> following:
> <iq type="set"
>     from="sample_at_pubsub_dot_com at pubsub.com"
>     to="xmpp.pubsub.com"
>     id="sub1">
>   <pubsub xmlns="http://jabber.org/protocol/pubsub">
>     <subscribe
>         node="pubsub/topics/weblogs"
>         jid="sample_at_pubsub_dot_com at pubsub.com">
>       <options>
>         <x xmlns="jabber:x:data" type="submit">
>           <field var="FORM_TYPE" type="hidden">
>             <value>http://jabber.org/protocol/pubsub#subscribe_options</value>
>           </field>
>           <field var="title">
>             <value>Mentions of RSS at PubSub.</value>
>           </field>
>           <field var="query-string">
>             <value>(SOURCE:pubsub.com AND "RSS")</value>
>           </field>
>         </x>
>       </options>
>     </subscribe>
>   </pubsub>
> </iq>
> If successful, the server would respond with something like:
> <iq type="result"
>     from="xmpp.pubsub.com"
>     to="sample_at_pubsub_dot_com at pubsub.com"
>     id="sub1">
>   <pubsub xmlns="http://jabber.org/protocol/pubsub">
>     <entity subid="39AB3990989098088323"
>             node="pubsub/topics/weblogs"
>             jid="sample_at_pubsub_dot_com at pubsub.com"
>             affiliation="none"
>             subscription="subscribed">
>       <options>
>         <x xmlns="jabber:x:data" type="submit">
>           <field var="FORM_TYPE" type="hidden">
>             <value>http://jabber.org/protocol/pubsub#subscribe_options</value>
>           </field>
>           <field var="title">
>             <value>Mentions of RSS at PubSub.</value>
>           </field>
>           <field var="query-string">
>             <value>(SOURCE:pubsub.com AND "RSS")</value>
>           </field>
>           <field var="xmlLink">
>             <value>http://rss.pubsub.com/22/b7/d1e9845b330137935cf3384bd7.xml</value>
>           </field>
>         </x>
>       </options>
>     </entity>
>   </pubsub>
> </iq>
> 	In the example above, the key difference from JEP-0060 as it stands
> is that a "subid" is returned. The subid allows the user and system to keep
> track of multiple subscriptions to a single topic/node. The creation of this
> subid has a few implications throughout the rest of the system. 
> 	The most critical impact is, I think, on the messages that get
> published to clients. The problem is that a single message may satisfy more
> than one subscription and, given that we send very large messages, we don't
> want to be forced to send multiple copies of the message. Thus, we need to
> be able to list multiple subids for a single message. Although it isn't
> pretty, I think this is best done something like the following:
>    <message to='sample_at_pubsub_dot_com at xmpp.pubsub.com' 
>      from='pubsub-delivery at xmpp.pubsub.com' >
>      <event xmlns='http://jabber.org/protocol/pubsub#event'>
>        <items node='pubsub/topics/weblogs'>
>          <item id='6802'>
>            <subscription>
>              <subid>7098709860970897</subid>
>              <subid>098789790987888</subid>
>            </subscription>
>            <pubsub-message xmlns="http://www.pubsub.com/xmlns">
>              Message content goes here... This element would be 
>              omitted if sending notifications only.
>            </pubsub-message>
>          </item>
>        </items>
>      </event>
>    </message>
> 	In many other places in the spec, there would also need to be
> support provided for subids. For instance, when unsubscribing, the user
> would need to specify not only the nodeID but also the subID. If a subID was
> not specified, then it would be assumed that *all* subscriptions for the
> specified node should be deleted.
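> 	As a rough sketch only, an unsubscribe request for a single
> subscription might look something like the following. The "subid"
> attribute on the unsubscribe element and the "unsub1" id are assumptions
> made for illustration, since JEP-0060 in its current form has no way to
> identify a subscription. The subid value is the one returned in the
> example response above.
> <iq type="set"
>     from="sample_at_pubsub_dot_com at pubsub.com"
>     to="xmpp.pubsub.com"
>     id="unsub1">
>   <pubsub xmlns="http://jabber.org/protocol/pubsub">
>     <unsubscribe
>         node="pubsub/topics/weblogs"
>         jid="sample_at_pubsub_dot_com at pubsub.com"
>         subid="39AB3990989098088323"/>
>   </pubsub>
> </iq>
> Under this sketch, leaving out the subid attribute would request removal
> of every subscription that the JID holds on the node.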
> 	I'm still working on a few issues. For instance, item deletion is a
> bit of a problem since it would require that all deleted items be matched
> against outstanding subscriptions to determine which subscriptions should
> receive the item deleted messages defined in JEP-0060 at Section 8.1.3.
> Unfortunately, while this is reasonably practical if the filtering/query
> language is reasonably simple, it isn't very practical if notifications
> based on event patterns, inter-message dependencies, or context external to
> a message are supported. (i.e. it may not be possible at the moment of
> retraction to determine what would have matched a subscription at some
> earlier moment in time.) 
> 	Comments? Am I missing some obvious easier solution to the problems
> outlined above?

Hi Bob,

Since we did not have the opportunity to chat last week, I started on a blog
entry that comments on a few parts of your proposal. I hope to send
it out later today, and I think it does address this issue. When I update
my blog, I'll send another reply here.


