[Standards-JIG] pubsub: cache-last-item

Peter Saint-Andre stpeter at jabber.org
Sat Feb 18 04:14:18 UTC 2006

Right now the text in version 1.8pre7 has "SHOULD cache" but for the
generic JEP-0060 cases this probably needs to be MAY (with the proviso
that non-generic profiles of pubsub, such as PEP, can define other
requirements regarding this feature).
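
As a rough illustration of what "MAY" would mean for an implementation, here is a minimal sketch, assuming a hypothetical in-memory node with a per-node cache_last_item flag. None of these names come from JEP-0060. The flag is off by default for the generic case, and a profile such as PEP could require it to be on.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical in-memory pubsub node; illustrative only, not JEP-0060."""
    name: str
    # Assumption: caching is optional (MAY) unless a profile turns it on.
    cache_last_item: bool = False
    last_item: Optional[str] = None
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def publish(self, item: str) -> None:
        # Remember the item only when caching is enabled for this node.
        if self.cache_last_item:
            self.last_item = item
        for deliver in self.subscribers:
            deliver(item)

    def subscribe(self, deliver: Callable[[str], None]) -> None:
        self.subscribers.append(deliver)
        # A new subscriber gets the cached item only if one exists.
        if self.cache_last_item and self.last_item is not None:
            deliver(self.last_item)
```

With the flag off, a new subscriber receives nothing until the next publish; with it on, the subscriber immediately receives the most recent item.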


Bob Wyman wrote:
> Peter Saint-Andre wrote on 30-Jan-2006:
>> I want to make sure that the proposed "cache-last-item" functionality
>> is clearly understood before we move forward with revisions to JEP-0060.
> 	I think this caching business needs a great deal more thought.
> 	Caching the last message published can, I think, only serve
> one of two purposes:
> 	1) Tell you that data has actually been published in the past.
> 	2) Provide the last state of some resource.
> 	The first motivation isn't, I think, sufficiently strong to require
> such an expensive feature. The second motivation is compelling only in that
> limited set of cases where the node only provides information about a single
> resource and then, it is only going to be useful in cases where historical
> information about the resource is useful. Also, the second motivation may
> require exceptionally expensive operations in the case of content-based
> subscriptions that filter node events.
> 	As Peter Millard mentioned in a recent reply, required caching can
> place a significant resource burden on the server. At PubSub.com, our
> JEP-0060 implementation currently services a couple million subscriptions.
> Would this require that we cache the last message for each of those millions
> of subscriptions? That is a pretty major cost in exchange for some
> "simplicity..."
> 	What is the interaction between this requirement and the ability to
> subscribe to Collections? Imagine that I have a stock-quote server that
> aggregates nodes for each "Fortune 500" stock in a collection that can be
> subscribed to. What do I send when someone subscribes to the collection?
> Would I send just the last item published to any of the nodes in the
> collection or would I publish the potentially large set of items that are
> the last items sent to any member of the collection? If I should only
> publish one item -- the last item published to the collection -- then how is
> that useful to a subscriber? What good does it do you to have one item out
> of potentially hundreds?
> 	In the case of content-based subscriptions, one can have a single
> node that carries data about many different resources. For instance, at
> PubSub.com, our JEP-0060 server offers nodes that carry all updates to over
> 20 million blogs. We publish millions of items every day. It is unlikely
> that the last item published to the "weblogs" node will match a newly
> created subscription. In order to model the apparently desired behavior
> (i.e. always deliver a single result when a subscription is created) we
> would have to maintain a retrospective search engine that did a
> retrospective search of all items published in history in order to find the
> last item that would have matched... Certainly, there are reasons why this
> might be desirable -- however, it is unreasonable to require it.
> 	In the case of a system which issues fire alarms, it may be that two
> years ago the system issued such an alarm but has not published any alarm
> data since that time. Should a new subscriber to the fire-alarm node be
> presented with the fire-alarm from two years ago? (Yes, they should notice
> that it was a long time ago... One might even suggest that a fire-alarm
> should eventually be followed by an "all clear." However, why would this
> complexity be necessary?)
> 	I believe there are cases where "last message" is useful. However,
> there are many, many cases where it is not. It appears to me that most of
> the folk who are commenting so far are thinking only of those cases where it
> does make sense. Given that there are many cases where it doesn't make
> sense, it would be reasonable to avoid general "should" statements in the
> protocol specification and instead rely on application protocols to define
> the "SHOULD" cases on a use-case by use-case basis. Thus, a definition of
> the use of PubSub for presence applications or in some gaming applications
> might say that you "SHOULD" cache the last message -- but the base protocol
