[Standards] XEP-0060, offline modifications and efficient synchronization

Matthieu Rakotojaona matthieu.rakotojaona at gmail.com
Sat Sep 5 14:22:16 UTC 2015


Hey everyone,

I'm currently toying with XEP-0060 and have found what I think is a
shortcoming with the spec (or possibly with my knowledge).

Consider this use case:

* Patrick is a publisher to a given node, Sandra is a subscriber to the
  same node
* Sandra is online
* Patrick publishes something, the notification goes to Sandra (either
  with or without payload, Sandra is able to get the item)
* Sandra disconnects
* Patrick modifies the item and publishes 7 others (that's an arbitrary
  number)
* Because of how the pubsub service is configured, no notification is
  sent to Sandra
* Sandra connects and gets the last published item

At that point Sandra sees that there is at least 1 new item (the last
one). However, she doesn't know which other items have changed. She can
request the latest items through a max_items query, but she doesn't know
how many to ask for; the only way to be sure she's up to date is to
query ALL items.

She could use XEP-0313 (MAM), but:
- Not all pubsub implementations support MAM queries on their nodes
  (Prosody is one that does not)
- MAM only allows something like "give me everything since this moment",
  however using dates is always problematic (clocks are not
  synchronized, time can drift, ...)

Basically the problem arises when you have a persistent node where
items can be modified. You need some "checkpoints" and a way to query
the pubsub service from a given checkpoint to get everything that
happened since.

Has anyone had this issue? If so, how did you solve it?

I think there are some little changes to XEP-0060 that could make it
possible to solve this problem:

- On top of ItemIDs, we define SeqIDs
- SeqIDs are NEVER set by clients but ALWAYS by the pubsub service
- A SeqID is specific to a node
- When an item is published, whether it existed or not, it is assigned a
  SeqID
- A SeqID must NEVER be reused. A modification or a deletion of an item is
  always assigned a new SeqID
- SeqIDs have no semantic meaning, but they have an ordering that is
  remembered ONLY by the pubsub service, i.e. it may be an
  autoincrementing integer, but in a cluster implementation it may be
  something closer to vector clocks
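
To make the rules above concrete, here is a minimal Python sketch of how
a service might assign SeqIDs. The Node class and its method names are
hypothetical (nothing here comes from XEP-0060), and it assumes the
simplest case: a single-process service using an autoincrementing
integer.

```python
import itertools


class Node:
    """Hypothetical in-memory pubsub node (illustrative names only)."""

    def __init__(self):
        # The counter is owned by the service; clients never set SeqIDs.
        self._next_seqid = itertools.count(1)
        self.items = {}   # item id -> payload, or None once retracted
        self.seqids = {}  # item id -> SeqID of its latest change

    def _touch(self, item_id, payload):
        seqid = next(self._next_seqid)  # fresh value for every change
        self.items[item_id] = payload
        self.seqids[item_id] = seqid
        return seqid

    def publish(self, item_id, payload):
        """A new item and a modification both get a new SeqID."""
        return self._touch(item_id, payload)

    def retract(self, item_id):
        """A deletion is a change too, so it also gets a new SeqID."""
        return self._touch(item_id, None)


node = Node()
first = node.publish("item-a", "Just won 1 Million, how awesome !")
second = node.publish("item-a", "Just won 1 Billion, how even more awesome !")
third = node.retract("item-a")
```

Each of the three changes to the same item gets its own SeqID. A
clustered service would presumably need something other than a single
counter here.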

Here's what it would look like in stanzas:

Patrick publishes an item as usual

<iq type='set'
    from='patrick@example.com'
    to='pubsub.example.com'
    id='publish1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='princely_musings'>
      <item>Just won 1 Million, how awesome !</item>
    </publish>
  </pubsub>
</iq>

Service confirms, with a seq id...

<iq type='result'
    from='pubsub.example.com'
    to='patrick@example.com'
    id='publish1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='princely_musings'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901' seqid='afb7b61b71b0756a690033eb92b192bf3e972499'/>
    </publish>
  </pubsub>
</iq>

... and notifies Sandra

<message from='pubsub.example.com' to='sandra@example.com' id='foo'>
  <event xmlns='http://jabber.org/protocol/pubsub#event'>
    <items node='princely_musings'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901' seqid='afb7b61b71b0756a690033eb92b192bf3e972499'/>
    </items>
  </event>
</message>

Sandra can then save this seqid as the last change she has seen on
this node; if the node's latest seqid ever differs from it, she knows
she has missed some items.
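
On the client side this bookkeeping could be as small as the following
Python sketch. The Checkpoints class is hypothetical and only compares
seqids for equality, since under this proposal the ordering is known
only to the service.

```python
class Checkpoints:
    """Hypothetical client-side store: node name -> last seqid seen."""

    def __init__(self):
        self._last_seen = {}

    def record(self, node, seqid):
        # Called for every notification or query result that is processed.
        self._last_seen[node] = seqid

    def needs_sync(self, node, latest_seqid):
        # Seqids are opaque to the client, so only equality is meaningful.
        return self._last_seen.get(node) != latest_seqid


sandra = Checkpoints()
sandra.record("princely_musings", "afb7b61b71b0756a690033eb92b192bf3e972499")

# After reconnecting, the last published item carries a different seqid.
stale = sandra.needs_sync(
    "princely_musings", "83f9088962bf5044455b219a9601ce25bbeaf241"
)
```

Here stale is True, which tells Sandra she has to ask the service for
what she missed.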

At this point Sandra goes offline. Patrick realizes he made a mistake,
modifies his item...

<iq type='set'
    from='patrick@example.com'
    to='pubsub.example.com'
    id='publish2'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='princely_musings'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901'>Just won 1 Billion, how even more awesome !</item>
    </publish>
  </pubsub>
</iq>

And is confirmed with a new seqid

<iq type='result'
    from='pubsub.example.com'
    to='patrick@example.com'
    id='publish2'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='princely_musings'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901' seqid='83f9088962bf5044455b219a9601ce25bbeaf241'/>
    </publish>
  </pubsub>
</iq>

When Sandra comes back online, whether she receives the last published
item or not, she can query the node for all items changed since the
seqid she saved:

<iq type='get'
    from='sandra@example.com'
    to='pubsub.example.com'
    id='items2'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <items node='princely_musings' since='afb7b61b71b0756a690033eb92b192bf3e972499'/>
  </pubsub>
</iq>

and the service can answer with the latest modification:

<iq type='result'
    from='pubsub.example.com'
    to='sandra@example.com'
    id='items2'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <items node='princely_musings'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901' seqid='83f9088962bf5044455b219a9601ce25bbeaf241'>Just won 1 Billion, how even more awesome !</item>
    </items>
  </pubsub>
</iq>
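
For what it's worth, here is a rough Python sketch of how a service
could answer such a "since" query. NodeLog and items_since are
hypothetical names, the seqids are opaque random strings, and the
ordering lives only in the service's own log, as proposed above. It
returns each changed item once, in its latest state.

```python
import uuid


class NodeLog:
    """Hypothetical per-node change log kept by the pubsub service."""

    def __init__(self):
        self._log = []    # (seqid, item id), in the order changes happened
        self._items = {}  # item id -> (latest seqid, latest payload)

    def publish(self, item_id, payload):
        seqid = uuid.uuid4().hex  # opaque to clients, never reused
        self._log.append((seqid, item_id))
        self._items[item_id] = (seqid, payload)
        return seqid

    def items_since(self, since):
        """Latest state of every item changed after the given seqid."""
        positions = [i for i, (s, _) in enumerate(self._log) if s == since]
        if not positions:
            raise KeyError(f"unknown seqid: {since}")
        changed = {}
        for _, item_id in self._log[positions[0] + 1:]:
            changed[item_id] = self._items[item_id]
        return [(item_id, seqid, payload)
                for item_id, (seqid, payload) in changed.items()]


log = NodeLog()
seen = log.publish("ae89", "Just won 1 Million, how awesome !")
log.publish("ae89", "Just won 1 Billion, how even more awesome !")
log.publish("other", "hello")
result = log.items_since(seen)
```

With the three publishes above, result holds two entries: the modified
item "ae89" with its newest payload, then "other".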

Alternatively, the seqid can be used as the paging id for RSM. Sandra
would then query ALL items, with RSM stating that she wants everything
AFTER this id:

<iq type='get'
    from='sandra@example.com'
    to='pubsub.example.com'
    id='items1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <items node='princely_musings'/>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <max>10</max>
      <after>afb7b61b71b0756a690033eb92b192bf3e972499</after>
    </set>
  </pubsub>
</iq>

(I like this better because it reuses what already exists instead
of introducing a new attribute; on the other hand it introduces some coupling
between pubsub and its use of RSM)
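
To illustrate the RSM variant, here is a small Python sketch of paging
through a node's change log with a page size and an "after" seqid. Both
function names are hypothetical, the log is assumed to be a list of
(seqid, item id) pairs in service order, and an unknown seqid simply
raises ValueError here rather than producing a proper stanza error.

```python
def page_after(log, after, max_items):
    """Return one page of changes after `after`, plus the last seqid in it."""
    start = 0
    if after is not None:
        seqids = [seqid for seqid, _ in log]
        start = seqids.index(after) + 1  # ValueError if the seqid is unknown
    page = log[start:start + max_items]
    last = page[-1][0] if page else None
    return page, last


def fetch_all_after(log, after, max_items=10):
    """Client-side loop: keep asking for the next page until one is empty."""
    collected = []
    while True:
        page, last = page_after(log, after, max_items)
        if not page:
            return collected
        collected.extend(page)
        after = last  # the last seqid of this page is the next <after/>


log = [("s1", "a"), ("s2", "b"), ("s3", "a"), ("s4", "c")]
page, last = page_after(log, "s1", 2)
everything = fetch_all_after(log, "s1", max_items=2)
```

The first page contains the two changes following "s1", and the loop
ends up with all three changes that came after it.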

What do you think? Did I miss something glaringly obvious?