[Standards-JIG] pubsub: multiple inheritance?

Bob Wyman bob at wyman.us
Mon Feb 6 20:15:33 UTC 2006


On 2/2/06, Peter Saint-Andre <stpeter at jabber.org> wrote:
> 2. However, it is RECOMMENDED to allow only one parent collection node 
> for each leaf node and therefore it is RECOMMENDED to specify a type 
> of "text-single" for the "pubsub#collection" field.

	I support this suggestion. Discussion follows:

	Support for Collections, Collections of Collections, and leaf nodes
in more than one Collection matters to implementers of "Topic Based" pubsub
systems which, unlike content-based systems, use "topic name" or "topic
hierarchy" as the means of selection and aggregation rather than attributes
of a message's content. Many existing enterprise pubsub systems (such as
TIBCO) make heavy use of the topic-based method of expressing subscriptions.
	A topic-based method of subscribing to stock-market data might allow
me to find data about IBM by looking in one of "stocks.companies.ibm",
"stocks.computer-manufacturers" or "stocks.fortune500". I could then
subscribe either to IBM or to one of the collections within which it
appears.
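
	To make the topic-based case concrete, here is a minimal sketch of a
leaf that sits in more than one collection, with delivery to anyone
subscribed to the leaf or to a collection above it. The TopicTree class and
its method names are hypothetical, invented for illustration; nothing here
is taken from JEP-0060.

```python
# Illustrative model only; the class and method names are hypothetical
# and are not taken from JEP-0060.

class TopicTree:
    def __init__(self):
        self.parents = {}      # node -> set of parent collections
        self.subscribers = {}  # node -> set of subscriber ids

    def add_to_collection(self, node, collection):
        self.parents.setdefault(node, set()).add(collection)

    def subscribe(self, who, node):
        self.subscribers.setdefault(node, set()).add(who)

    def recipients(self, leaf):
        """Return everyone subscribed to the leaf or to any collection above it."""
        seen, stack, out = set(), [leaf], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # a node reachable by several paths is visited once
            seen.add(node)
            out |= self.subscribers.get(node, set())
            stack.extend(self.parents.get(node, ()))
        return out


tree = TopicTree()
leaf = "stocks.companies.ibm"
tree.add_to_collection(leaf, "stocks.computer-manufacturers")
tree.add_to_collection(leaf, "stocks.fortune500")

tree.subscribe("alice", leaf)                # subscribes to IBM directly
tree.subscribe("bob", "stocks.fortune500")   # subscribes to a collection
tree.subscribe("carol", "stocks.other")      # unrelated node

ibm_recipients = tree.recipients(leaf)
```

	Every additional parent adds another path that the delivery code has
to walk and de-duplicate, which is part of the cost of allowing a leaf in
more than one collection.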
	There are two problems being addressed here. The first is simply one
of communicating from the server to the client the structure of the topics
supported by the system. Personally, I think this is something which should
be done via some other method... The second is one of specifying a
multi-topic or "multi-leaf" subscription via the collection. Such aggregated
subscriptions have some value -- although JEP-0060 is rather underspecified
in handling this case... (The main problem is that for such a multi-leaf
subscription to make sense in most applications, one must typically have a
common schema across all elements of the collection.)
	My personal feeling is that "topic-based" or "name-based"
aggregation is an artifact of old implementations and methods of doing
publish/subscribe. Today, the preferred method of creating an "aggregate"
subscription would be to use a content-based subscription. In the case
above, one might provide the subscriber with a single "stock-quotes" topic
and then allow the creation of a content-based subscription that filters on
attributes of messages. All stock quotes would be carried on the
"stock-quotes" topic/leaf, and the schema for a stock quote would permit
assigning attributes to a quote. An IBM quote might then have a number of
attributes such as "computer-manufacturer" and "fortune500". If I wanted to
subscribe to just IBM, I would subscribe to the "stock-quotes" leaf and
specify "name=IBM". If I wanted all stocks in the Fortune 500, I would
subscribe to "stock-quotes" and specify "attribute=fortune500". If I only
wanted computer manufacturers in the Fortune 500, I could, of course, say
"attribute=fortune500 AND attribute=computer-manufacturer". Using
content-based methods, the equivalent of arbitrarily complex topic/leaf
relationships can be achieved. (This is very similar to discussions of
"tagging" in the search world...)
	Folk often suggest that content-based subscriptions are "harder"
than topic-based ones. However, topic-based systems tend to grow
exceptionally complex as people build ever-more-complex hierarchies of
topic/leaf names and structures... Thus, the apparent simplicity of the
topic-based approach is often not realized.
	I would support encouraging folk not to include a single leaf in
more than one collection. Frankly, I would go further and recommend
avoidance of "collections of collections" and subscriptions to
collections...

	bob wyman





