[standards-jig] JEP-0024: Publish/Subscribe - Topic based messaging

Iain Shigeoka iainshigeoka at yahoo.com
Sun Apr 28 17:25:27 UTC 2002


--- Dave Turner <jabber at figroll.com> wrote:
> On Wed, Apr 24, 2002 at 02:40:33PM -0700, Iain Shigeoka wrote:
> 
> I think the issue here is defining at what level the structure needs to be
> defined.  We need to consider the publisher, subscriber, and brokers'
> interpretation of the topic.  The broker is especially important because
> it needs to route messages in a smart way.

Agreed.

> One reason I was interested in finding/developing a robust pub/sub messaging
> system was to play with some ideas I had for efficient routing of messages
> over a federated network of brokers.
[snip]
> If there is some minimal structure that the broker can rely on then this
> set should be calculable.  Sorry, I've not had enough coffee yet to think
> up an interesting example.

Does this have to be done via the topic or even topic metadata?  Without
putting much thought into it, my gut reaction would be that the brokers could
get all this information during the binding of subscribers to queues.  I may be
missing something though.

One major efficiency boost can be had by filtering messages at the publisher
rather than at the subscriber endpoint of the queue.  This means pushing the
filter information from the subscriber to the publisher end of the broker
chain.  The problem is security.  If the 2 brokers don't necessarily trust
each other, it is risky to run code (the filter) that you obtain from your
untrusted fellow broker.  Even sandboxed, the filter can be used to mount
denial-of-service attacks, which probably makes these types of optimizations
unrealistic.

This would be equivalent to moving your email filters to the sender's email
server rather than on your receiving server or email client.  It would save a
ton of bandwidth and storage but is probably impractical...
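
A rough sketch of the bandwidth side of this trade-off, in Python.  Everything
here is made up for illustration (the message dicts, the function names, the
"symbol" field); it is not part of the JEP.  Note that `wanted` is an arbitrary
callable supplied by the subscriber side, which is exactly the thing an
untrusting publisher-side broker would not want to run.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Filter = Callable[[Message], bool]


def filter_at_subscriber(messages: List[Message], wanted: Filter) -> Tuple[int, List[Message]]:
    """Every message crosses the broker link; unwanted ones are dropped on arrival."""
    crossed = list(messages)                      # all traffic travels between brokers
    delivered = [m for m in crossed if wanted(m)]
    return len(crossed), delivered


def filter_at_publisher(messages: List[Message], wanted: Filter) -> Tuple[int, List[Message]]:
    """The subscriber's filter runs at the publisher end; only matches cross the link."""
    crossed = [m for m in messages if wanted(m)]  # hypothetical: runs the peer's code
    return len(crossed), crossed


# Hypothetical traffic: four messages, of which the subscriber wants two.
messages = [{"topic": "stocks", "symbol": s} for s in ("IBM", "SUNW", "IBM", "MSFT")]
wanted = lambda m: m["symbol"] == "IBM"

sub_crossed, sub_delivered = filter_at_subscriber(messages, wanted)
pub_crossed, pub_delivered = filter_at_publisher(messages, wanted)
```

Both placements deliver the same messages to the subscriber; only the number
of messages crossing the link between the brokers differs.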

> > Another way to define would be an arbitrary text name, and then
> > meta-information that we can attach for browse/search/filtering purposes.
> 
> Hmm... that could be interesting.  The Dublin Core Metadata Initiative [1]
> might have some interesting material on this.  I have been working with
> some of the DCMI elements.  There might be scope there.

Interesting.  I'll check it out.

> > the server though.  You'd never be sure when a message is done and can be
> > discarded from a queue because someone can always come along and request
> > the last x messages.
> 
> I was only thinking of keeping the last 1 message at the broker.  The premise
> of event based pub/sub messaging is that each message supersedes the previous
> one on a given topic.  This is of course from the topic based pub/sub point
> of view, so maybe multiple queued messages is the correct generalization.

Right.  In general messaging, we really like to have "store and forward"
capabilities so that the queue acts as a message mailbox when clients are
detached.  If there can be more than one subscriber to a queue, the server may
be storing different numbers of messages for each user in each queue.

> I'm interested in the point you raise about determining when a message is
> 'done'.  Obviously, the broker isn't the place to decide this.  The publisher
> however, having best understanding of the data, can set a TTL on the message
> and the broker can cull messages when they become too old.  That would be
> a nice option to have.  If a TTL isn't defined the message can persist
> forever, making this transparent to the clients if they don't support it.

Timeouts on messages are a good idea.  My scenario is a bit different, though.
Imagine we have a queue with 3 subscribers online.  A message is published.
The server/broker pushes the message out to all three.  Now one subscriber goes
offline but remains subscribed.  A message is published.  2 subscribers are
online, so the message is pushed out to those two and stored for the third.
One of the 2 online subscribers disconnects but remains subscribed.  Another
message is published.  It is delivered directly to the remaining online
subscriber, but is stored for delivery to the 2 offline subscribers.  The first
stored message in the queue is waiting for one subscriber to take it.  The
second stored message in the queue is waiting for 2 subscribers to take it.

The server must be able to track and manage these "reference counts" on
messages waiting to be delivered.  In most large servers, I would imagine you
wouldn't make copies, but instead make links between subscribers and their
pending messages to avoid redundant data.
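
To make the bookkeeping concrete, here is a minimal sketch in Python of the
scenario above.  The class and method names (`SharedQueue`, `publish`,
`refcount`, and so on) are my own invention for illustration, not anything
from the JEP.  Each stored message body is kept once, with a set of the
offline subscribers still owed a copy; the size of that set is the
"reference count".

```python
class SharedQueue:
    """Hypothetical broker queue that stores one copy per undelivered message."""

    def __init__(self, subscribers):
        self.subscribers = set(subscribers)
        self.online = set(subscribers)
        self.bodies = {}    # message id -> body, stored once
        self.pending = {}   # message id -> offline subscribers still owed it
        self.next_id = 0

    def disconnect(self, who):
        self.online.discard(who)          # still subscribed, just offline

    def publish(self, body):
        msg_id = self.next_id
        self.next_id += 1
        offline = self.subscribers - self.online
        if offline:                       # online subscribers get it pushed directly
            self.bodies[msg_id] = body
            self.pending[msg_id] = set(offline)
        return msg_id

    def refcount(self, msg_id):
        return len(self.pending.get(msg_id, ()))

    def reconnect(self, who):
        """Deliver everything owed to `who`; drop messages nobody is waiting for."""
        self.online.add(who)
        delivered = []
        for msg_id in sorted(self.pending):
            if who in self.pending[msg_id]:
                delivered.append(self.bodies[msg_id])
                self.pending[msg_id].discard(who)
                if not self.pending[msg_id]:      # count reached zero
                    del self.pending[msg_id]
                    del self.bodies[msg_id]
        return delivered


q = SharedQueue(["a", "b", "c"])
first = q.publish("first")      # all three online, nothing stored
q.disconnect("a")
second = q.publish("second")    # stored for a
q.disconnect("b")
third = q.publish("third")      # stored for a and b
counts = [q.refcount(first), q.refcount(second), q.refcount(third)]

got_a = q.reconnect("a")
stored_after_a = sorted(q.bodies)
got_b = q.reconnect("b")
stored_after_b = sorted(q.bodies)
```

Once the last waiting subscriber has picked up a message, its count reaches
zero and the broker can discard it.  That is the property a "last x messages"
request would break.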

The problem becomes complicated if we allow you to subscribe and bind and
request the last x messages published to the queue.  If you request the last 3
messages in the above example, the server is in a bind.  It should normally
throw out the first message published to the queue as it was delivered to the 3
subscribers and had no more delivery requirements.  However, now this new
subscriber is asking for it.  The server won't know when it can safely get rid
of a message and will have to save all messages indefinitely.

We can limit the number of messages you can request but a lot of the power in
requesting past published messages would be in getting a log of the
conversation up to the current point in time (really useful for conferencing
for example).
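
As a sketch of what such a limit might look like (again in Python, with
invented names; `limit` is a hypothetical per-queue setting), the broker could
keep a fixed-size window of recent messages separately from the per-subscriber
pending store:

```python
from collections import deque


class History:
    """Hypothetical bounded history: the broker keeps only the last `limit` messages."""

    def __init__(self, limit):
        self.recent = deque(maxlen=limit)   # oldest entries fall off automatically

    def publish(self, body):
        self.recent.append(body)

    def last(self, x):
        """Return up to x of the most recent messages, oldest first."""
        if x <= 0:
            return []
        return list(self.recent)[-x:]


h = History(limit=2)
for body in ("one", "two", "three"):
    h.publish(body)

asked_for_three = h.last(3)   # only two are still held
asked_for_one = h.last(1)
```

Storage stays bounded, but a client asking for more than `limit` messages gets
a truncated log, which is the loss of power described above.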

This is one of those really nice features that could come with significant
performance costs for the server.  I think it is a good idea, so some balance
may be the best solution...

-iain
