[standards-jig] JEP-0024: Publish/Subscribe - Topic based messaging
jabber at figroll.com
Wed Apr 24 10:13:26 UTC 2002
I have a few comments/ideas to add regarding the pub/sub proposal.
I have experience with IBM's MQ and MQe middleware products, specifically
in a SCADA context, and I draw my ideas mainly from this. Forgive me
if I make some point that is specific to SCADA; I have tried to generalise
the points to those that I think a good pub/sub system should provide, but
if you think that something is too specific, let me know.
I pretty much agree with the handling of the publish and subscribe events
as they are defined in the JEP; however, I think that there is room for
improvement in a few areas:
1. I realised that subscribing to all messages from all publishers
can generate a lot of traffic, but I don't think that it is the place of the
protocol to restrict the ability to do this. There may be valid
reasons to want to subscribe to everything; a logging application, for example.
It should be down to the programmer of the application to take care not
to do this if the system isn't going to be able to cope.
2. I think that the notion of using namespaces for published messages puts
restrictions on the protocol. Maybe it's just my misunderstanding of how
it would work. I'll describe my ideas for topics and then let's see
if we can clear the picture up afterwards. I use the term topic to
refer to the same part of the <publish> tag but with different semantics
which I will now describe.
I think that topics should be hierarchical, in the same way as directory
names. So I could publish messages on topics such as:
I think that these topics are pretty self-explanatory in that they are
for an imaginary environmental system. The publishing event is
fairly straightforward to understand. By publishing on a particular
topic we describe what the data represents in some structured way. An
example message would be (following the JEP):
<iq type='set' to='pubsub.localhost' from='publisher.localhost' id='s1'>
As a side note, in this sort of situation it is almost certainly beneficial
to actually publish XML data rather than raw data, such as:
so as not to lose meaning in the data. But that's not a pub/sub thing so
I'll ignore that from now on.
With structured topics the subscription process is then more interesting.
Firstly, I think that with topic based pub/sub the need to subscribe to
a particular publisher is diminished but the JEP allows for this. The
biggest advantage of topics is wildcard subscriptions.
a. My subscription client may be interested only in the temperature in the
London office and would subscribe to the /london/office/temperature
topic. This is the boring case.
b. The client may be interested in all data from London and, using wildcard
subscriptions, could subscribe to the topic /london/*. It would then
receive messages published on any topic that starts with /london/.
c. The client may be interested in temperature data from London only and
subscribe to /london/*/temperature. (This is probably the hardest
case to implement efficiently. Finite-state automata (FSAs) are your friend.)
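To make the wildcard semantics in a-c concrete, here is a minimal Python sketch of a topic matcher. It assumes '*' stands for one or more whole topic levels (so '/london/*' covers '/london/office/temperature'); the function names are hypothetical, and it uses simple backtracking rather than the FSA suggested above. A broker holding many subscriptions would probably want to compile patterns into a trie or automaton instead.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` matches `pattern`.

    Assumption: '*' stands for one or more whole topic levels, so
    '/london/*' covers '/london/office/temperature'.
    """
    return _match(pattern.strip("/").split("/"), topic.strip("/").split("/"))


def _match(pat: list[str], levels: list[str]) -> bool:
    if not pat:
        # Pattern used up: the topic must be used up too.
        return not levels
    head, rest = pat[0], pat[1:]
    if head == "*":
        # Let the wildcard swallow 1..n levels, then try to match the rest.
        return any(_match(rest, levels[i:]) for i in range(1, len(levels) + 1))
    # Literal level: must match exactly, then continue with the remainder.
    return bool(levels) and levels[0] == head and _match(rest, levels[1:])
```

With these semantics, '/london/*/temperature' matches '/london/office/temperature' but not '/london/office/humidity', and '/london/*' does not match the bare topic '/london'.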
OK, I think I've made the point that I wanted to about wildcards. Now, this
may well be possible using namespaces and I just didn't see how.
Feel free to shoot me down... just don't use real flames.
3. On subscription to a topic, it would be convenient to be able to elect to
receive the last message that was published on that topic, if one is
available. For example, a request such as
<iq type='set' from='subscriber' to='pubsub' id='s1'>
would subscribe to the two topics but request that the last message
from /london/office/temperature be sent back.
This is important when considering the premise of pub/sub that messages
are only sent when something interesting has happened or the data needs
updating. When subscribing, one presumes that the last message is still
relevant and interesting, so it would be good to send it to the subscriber
and get them up to date as quickly as possible rather than waiting around
for the next message.
This should also be complemented by an attribute in the publish message
to tell the broker NOT to honour requests for the last message.
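A minimal sketch of how a broker might keep the last message per topic and honour (or refuse) last-message requests. Everything here is hypothetical: the class, the `allow_last` flag standing in for the proposed publish attribute, the `want_last` argument standing in for the subscription request, and the humidity topic used below. To keep it short, it uses exact topic matching only.

```python
from dataclasses import dataclass, field

Message = tuple[str, str]  # (topic, payload)


@dataclass
class RetainingBroker:
    # topic -> last payload the publisher allowed the broker to replay
    last: dict[str, str] = field(default_factory=dict)
    # topic -> subscriber inboxes (plain lists stand in for connections)
    subs: dict[str, list[list[Message]]] = field(default_factory=dict)

    def publish(self, topic: str, payload: str, allow_last: bool = True) -> None:
        for inbox in self.subs.get(topic, []):
            inbox.append((topic, payload))
        if allow_last:
            self.last[topic] = payload
        else:
            # Assumption: a 'do not honour' publish also discards any older
            # retained message, so stale data is never replayed.
            self.last.pop(topic, None)

    def subscribe(self, inbox: list[Message], topics: list[str],
                  want_last: tuple[str, ...] = ()) -> None:
        for topic in topics:
            self.subs.setdefault(topic, []).append(inbox)
        # Send the last message immediately, only for topics that asked for it.
        for topic in want_last:
            if topic in self.last:
                inbox.append((topic, self.last[topic]))
```

For example, after `publish("/london/office/temperature", "21.5")`, subscribing to that topic and `/london/office/humidity` with `want_last=("/london/office/temperature",)` puts the 21.5 reading in the new subscriber's inbox straight away. Dropping the older retained message on a 'do not honour' publish is a design choice of this sketch, on the assumption that replaying stale data is worse than sending nothing.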
4. My last idea, and this might take some waffling to make my point.
Simply put, it is handy to be able to set a 'Will' message that the broker
publishes on a publisher's behalf (so that it seems to have come from
them) if they have stopped responding. Let me explain why with a SCADA
example.
The publisher is usually a very lightweight device, running in a remote
location with a possibly intermittent connection to the broker. If that
publisher is monitoring some critical metric in a system it is usually
important to be able to define the failsafe situation.
A somewhat contrived example would be the temperature monitor in
the London office. I'll skip the actual message content until I've
convinced people that this is really useful... ;o)
The publisher connects to the broker and gives it a message to publish
in the event of the publisher's untimely death: a will message.
Along with this message it must also specify a timeout; if the broker
hasn't heard from the publisher within that time, it publishes the
will message.
Now, the publisher can operate as normal, publishing messages when
something interesting happens. However, it must take care to send
messages to the broker at intervals shorter than the timeout
period. If there is no useful data to publish, this could be satisfied
with a ping message just to inform the broker that it is still alive.
So the publisher connects and sets a will message with a timeout of
60 seconds, to publish the content 'n/a' on the topic /london/office/temperature.
It then happily publishes temperature readings every time the temperature
changes by more than 0.5 degrees. If the reading doesn't change
within 60 seconds, it sends a ping message (or just resends the
current reading, although if the message content is large this wastes
bandwidth).
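The publisher's side of this example can be sketched as a small decision function: send a reading when it has moved by more than 0.5 degrees since the last one sent, otherwise send a ping once 60 seconds have passed without any message. The class and method names are hypothetical, and the caller passes in times as plain seconds so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemperaturePublisher:
    threshold: float = 0.5   # degrees: publish when the reading moves by more than this
    keepalive: float = 60.0  # seconds: must not exceed the broker's will timeout (+ margin)
    last_value: Optional[float] = None  # last reading actually published
    last_sent: float = 0.0              # time of the last publish or ping

    def step(self, now: float, reading: float) -> Optional[tuple[str, Optional[float]]]:
        """Decide what, if anything, to send to the broker at time `now`."""
        if self.last_value is None or abs(reading - self.last_value) > self.threshold:
            self.last_value = reading
            self.last_sent = now
            return ("publish", reading)
        if now - self.last_sent >= self.keepalive:
            # Nothing interesting happened: just tell the broker we are alive.
            self.last_sent = now
            return ("ping", None)
        return None
```

Sending a ping instead of resending the reading keeps the keep-alive traffic small when the message content is large.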
If someone disconnects this publisher, the broker will timeout for
that publisher and publish 'n/a' as though it came from the publisher.
All subscribers will show sensible data. Of course, if this were in
a nuclear reactor and the temperature reading was no longer available
the control rod subscriber could then shut down the reactor as a failsafe.
(told you it was a little contrived)
The broker should probably wait for the timeout plus some margin x, to
account for network lag and the like. I haven't really thought that bit through.
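On the broker side, here is a rough sketch of will handling. It assumes the broker records when it last heard from each publisher and periodically runs a check. The `grace` value plays the role of the margin x for network lag. The class names, the 5-second default and the `published` list (standing in for delivery to subscribers, with the publisher as the apparent sender) are all hypothetical, not part of the JEP.

```python
from dataclasses import dataclass, field


@dataclass
class Will:
    topic: str
    payload: str
    timeout: float  # seconds the publisher asked the broker to wait


@dataclass
class WillBroker:
    grace: float = 5.0  # hypothetical margin "x" for network lag
    wills: dict[str, Will] = field(default_factory=dict)
    last_heard: dict[str, float] = field(default_factory=dict)
    # Stand-in for delivery to subscribers: (apparent sender, topic, payload).
    published: list[tuple[str, str, str]] = field(default_factory=list)

    def set_will(self, publisher: str, will: Will, now: float) -> None:
        self.wills[publisher] = will
        self.last_heard[publisher] = now

    def heard_from(self, publisher: str, now: float) -> None:
        # Any publish or ping from the publisher resets its timer.
        self.last_heard[publisher] = now

    def check(self, now: float) -> None:
        # Fire the will of every publisher silent for longer than timeout + grace.
        for publisher, will in list(self.wills.items()):
            if now - self.last_heard[publisher] > will.timeout + self.grace:
                self.published.append((publisher, will.topic, will.payload))
                del self.wills[publisher]  # publish each will only once
```

With a 60-second timeout and 5 seconds of grace, a publisher last heard from at t=30 s still counts as alive at t=90 s. Its 'n/a' will is published on /london/office/temperature once the check runs after t=95 s.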
I have a few more points but they generally depend on understanding of
these first. So let me know if they make sense or are just pure fantasy.