[standards-jig] JEP-0024: Publish/Subscribe - Topic based messaging

Dave Turner jabber at figroll.com
Wed Apr 24 10:13:26 UTC 2002


Hi,

I have a few comments/ideas to add regarding the pub/sub proposal.
I have experience with IBM's MQ and MQe middleware products, specifically
in a SCADA [1] context, and I draw my ideas mainly from this.  Forgive me
if I make some point that is specific to SCADA; I have tried to generalise
the points to those that I think a good pub/sub system should provide but
if you think that something is too specific let me know.

I pretty much agree with the handling of the publish and subscribe events
as they are defined in the JEP, however I think that there is room for
extra leverage.



1.  I realise that subscribing to all messages from all publishers
    can generate a lot of traffic, but I don't think that it is the place of the
    protocol to restrict the ability to do this.  There may be some valid
    reason to want to subscribe to everything, a logging application for example.
    It should be down to the programmer of the application to take care not
    to do this if the system isn't going to be able to cope.

2.  I think that the notion of using namespaces for published messages puts
    restrictions on the protocol.  Maybe it's just my misunderstanding of how
    it would work.  I'll describe my ideas for topics and then let's see
    if we can clear the picture up afterwards.  I use the term topic to
    refer to the same part of the <publish> tag but with different semantics
    which I will now describe.

    I think that topics should be hierarchical in the same way as a directory
    name, for example.  So I could publish messages on topics such as:

        /london/office/temperature
        /london/apartment/temperature
        /boston/office/temperature
        /boston/office/humidity

    I think that these topics are pretty self-explanatory in that they are
    for an imaginary environmental system.  The publishing event is
    fairly straightforward to understand.  By publishing on a particular
    topic we describe what the data represents in some structured way.  An
    example message would be (following the JEP):

        <iq type='set' to='pubsub.localhost' from='publisher.localhost' id='s1'>
          <query xmlns='jabber:iq:pubsub'>
            <publish topic='/london/office/temperature'>14</publish>
          </query>
        </iq>

    As a side note, in this sort of situation it is almost certainly beneficial
    to actually publish XML data rather than raw data, such as:

        <temperature scale="centigrade">14</temperature>

    so as not to lose meaning in the data.  But that's not a pub/sub thing so
    I'll ignore that from now on.


    With structured topics the subscription process is then more interesting.
    Firstly, I think that with topic based pub/sub the need to subscribe to
    a particular publisher is diminished but the JEP allows for this.  The
    biggest advantage of topics is wildcard subscriptions.

    a.  My subscription client may be interested in only the temperature in the
        London office and would subscribe to the /london/office/temperature
        topic.  This is the boring case.

    b.  The client may be interested in all data from London and, using wildcard
        subscriptions, could subscribe to the topic /london/*.  It would then
        receive messages published on any topic that starts with /london/.

    c.  The client may be interested in temperature data from London only and
        subscribe to /london/*/temperature.  (This is probably the hardest
        case to implement efficiently.  FSAs are your friend.)
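
    To make the three cases concrete, here is a minimal Python sketch of
    the matching rule.  The function name topic_matches is hypothetical, and
    the wildcard semantics are an assumption on my part: a trailing * matches
    one or more remaining levels (case b), while a * in the middle matches
    exactly one level (case c).  It checks one pattern at a time, so it says
    nothing about doing this efficiently for many subscriptions.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` is covered by the subscription `pattern`.

    Assumed semantics (not defined by the JEP):
      - a trailing '*' matches one or more remaining levels
      - a '*' elsewhere matches exactly one level
    """
    p = pattern.strip("/").split("/")
    t = topic.strip("/").split("/")
    for i, segment in enumerate(p):
        if segment == "*" and i == len(p) - 1:
            # Trailing wildcard: the topic needs at least one more level.
            return len(t) > i
        if i >= len(t):
            # Pattern is longer than the topic.
            return False
        if segment != "*" and segment != t[i]:
            # Literal level does not match.
            return False
    # No trailing wildcard: both must have the same number of levels.
    return len(p) == len(t)


# Cases a, b and c from the list above.
print(topic_matches("/london/office/temperature", "/london/office/temperature"))
print(topic_matches("/london/*", "/london/apartment/temperature"))
print(topic_matches("/london/*/temperature", "/boston/office/temperature"))
```

    A broker with many subscribers would more likely compile the patterns
    into a trie or automaton than loop over them like this.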


    Ok, I think I've made the point that I wanted to about wildcards.  Now, this
    may well be possible using the namespaces and I just didn't see it right.
    Feel free to shoot me down... just don't use real flames..


3.  On subscription to a topic it would be convenient to elect to receive
    the last message that was published on that topic, if one is available.

    e.g.

        <iq type='set' from='subscriber' to='pubsub' id='s1'>
          <query xmlns='jabber:iq:pubsub'>
            <subscribe>
              <topic last='1'>/london/office/temperature</topic>
              <topic>/boston/office/temperature</topic>
            </subscribe>
          </query>
        </iq>

        would subscribe to the two topics but request that the last message
        from /london/office/temperature be sent back.

    This is important when considering the premise of pub/sub that messages
    are only sent when something interesting has happened or the data needs
    updating.  When subscribing one presumes that the last message is still
    relevant and interesting so it would be good to send it to the subscriber
    and get them up-to-date as quickly as possible rather than waiting around
    for the next message.

    This should also be complemented by an attribute in the publish message
    to tell the broker NOT to honour requests for the last message.
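
    As a rough illustration of the broker-side bookkeeping, here is a small
    Python sketch.  The Broker class, its method names and the replayable
    flag are all hypothetical; they stand in for the last='1' attribute on
    <topic> and for the opt-out attribute on the publish message, neither of
    which is defined by the JEP.  Exact topic names only, no wildcards.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker that remembers the last message per topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> delivery callbacks
        self.last = {}                        # topic -> (payload, replayable)

    def publish(self, topic, payload, replayable=True):
        # Remember the message, together with the publisher's wish about
        # whether it may be replayed to late subscribers.
        self.last[topic] = (payload, replayable)
        for deliver in self.subscribers[topic]:
            deliver(topic, payload)

    def subscribe(self, topic, deliver, last=False):
        self.subscribers[topic].append(deliver)
        # Honour a request for the last message only if one exists and
        # the publisher did not opt out.
        if last and topic in self.last:
            payload, replayable = self.last[topic]
            if replayable:
                deliver(topic, payload)


broker = Broker()
broker.publish("/london/office/temperature", "14")

inbox = []
broker.subscribe("/london/office/temperature",
                 lambda topic, payload: inbox.append((topic, payload)),
                 last=True)
print(inbox)  # the subscriber is brought up to date immediately
```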


4.  My last idea, and this might take some waffling to make my point.

    Simply put it is handy to be able to set a 'Will' message that a publisher
    can have published on their behalf (so that it seems to have come from
    them) if they have stopped responding.  Let me explain why with a SCADA
    example.

    The publisher is usually a very lightweight device, running in a remote
    location with a possibly intermittent connection to the broker.  If that
    publisher is monitoring some critical metric in a system it is usually
    important to be able to define the failsafe situation.

    A somewhat contrived example would be for the temperature monitor in
    the London office.  I'll skip the actual message content until I've
    convinced people that this is really useful... ;o)

    Publisher connects to the broker and gives it a message to publish
    in the event of the publisher's untimely death -- a will message.
    Along with this message it also specifies a timeout: if the broker
    hasn't heard from the publisher within that time, it publishes the
    will message.

    Now, the publisher can operate as normal, publishing messages when
    something interesting happens.  However it must be careful that it
    sends messages to the broker with an interval shorter than the timeout
    period.  If there is no useful data to publish this could be satisfied
    with a ping message just to inform the broker that it is still alive.

    So the publisher connects and sets a will message with a timeout of
    60 seconds to publish the content 'n/a' on the topic /london/office/temperature.

    It then happily publishes temperature readings every time the temperature
    changes by more than 0.5 degrees.  If the reading doesn't change
    within 60 seconds then it sends a ping message (or just resends the
    current reading.. although if the message content is large this wastes
    bandwidth).

    If someone disconnects this publisher, the broker will timeout for
    that publisher and publish 'n/a' as though it came from the publisher.
    All subscribers will show sensible data.  Of course, if this were in
    a nuclear reactor and the temperature reading was no longer available
    the control-rod subscriber could then shut down the reactor as a failsafe.
    (told you it was a little contrived)

    The timeout probably wants to be time + x for the broker to account
    for network lag and the like.  Not really thought about that bit.
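
    Since I skipped the message content, here is only the broker-side timing
    logic, as a Python sketch.  WillRegistry, its method names and the grace
    parameter (the "x" above) are hypothetical, and the 5 second grace value
    is an arbitrary choice for the example.  Times are passed in explicitly
    so the behaviour is easy to follow; a real broker would use its own clock.

```python
import time


class WillRegistry:
    """Tracks will messages and reports those whose publisher went quiet."""

    def __init__(self, grace=5.0):
        self.grace = grace  # extra allowance for network lag, in seconds
        self.wills = {}     # publisher -> (topic, payload, timeout, last_seen)

    def set_will(self, publisher, topic, payload, timeout, now=None):
        now = time.monotonic() if now is None else now
        self.wills[publisher] = (topic, payload, timeout, now)

    def heard_from(self, publisher, now=None):
        # Call this for every publish or ping received from the publisher.
        now = time.monotonic() if now is None else now
        if publisher in self.wills:
            topic, payload, timeout, _ = self.wills[publisher]
            self.wills[publisher] = (topic, payload, timeout, now)

    def expired(self, now=None):
        # Return (publisher, topic, payload) for each will that is due,
        # removing it so that it is only published once.
        now = time.monotonic() if now is None else now
        due = []
        for publisher, (topic, payload, timeout, last_seen) in list(self.wills.items()):
            if now - last_seen > timeout + self.grace:
                due.append((publisher, topic, payload))
                del self.wills[publisher]
        return due


registry = WillRegistry(grace=5.0)
registry.set_will("publisher.localhost", "/london/office/temperature",
                  "n/a", timeout=60, now=0.0)
registry.heard_from("publisher.localhost", now=50.0)  # a reading or a ping
print(registry.expired(now=100.0))  # still within 60 + 5 seconds of last contact
print(registry.expired(now=116.0))  # publisher has been silent for too long
```

    The broker would then publish each returned payload on its topic as
    though it had come from the publisher in question.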



I have a few more points but they generally depend on an understanding of
these first.  So let me know if they make sense or are just pure fantasy.

Cheers.


[1] http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?SCADA

-- 
Dave Turner
http://figroll.com/


