[Standards] Proposed XMPP Extension: Entity Versioning

Dave Cridland dave at cridland.net
Tue Sep 1 22:35:41 UTC 2015

On 1 September 2015 at 22:07, Sam Whited <sam at samwhited.com> wrote:

> On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland <dave at cridland.net> wrote:
> > So I think that as far as rosters go, this is duplicating XEP-0237 in a
> > considerably less efficient form.
> The main thing to keep in mind is that it can be used to diff
> arbitrary lists (rosters and MUC disco#items are specified, but you
> could equally use it for caching entity caps or feature lists, or just
> about any other arbitrary list your server felt like versioning).
> > XEP-0237 §4 gives a fairly intense trip down implementation options, and none
> > of them require multiple versions of the roster as claimed by this ProtoXEP.
> > I have a personal preference for §4.3, though it gets complex with shared
> > groups and so on. Nevertheless, it is possible to get perfect efficiency if
> > you're willing to store tombstones.
> In §4.2 Exact-match Conformance you must store multiple versions of
> the roster (or at least the current version of the roster and any
> pending pushes; maybe I should rephrase that statement in the XEP)
> unless you want to re-sync the entire roster every time there's a
> change and the user isn't online to receive a push. E.g. if the user
> signs in and fetches the roster (with a digest version), then signs
> out and a new user is added to his roster, then when the user signs
> back in and sends up the digest, the server must have cached that new
> user to send a roster push back down. If your new user is added to many
> people's rosters (but you can't guarantee that it's added to a whole
> group's roster) you now have to store that roster push for every single
> person whose roster it needs to be pushed to (as opposed to a single
> version token in the user's database table or somewhere that can be
> diffed against).
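A minimal Python sketch of the storage cost described for exact-match (digest) versioning follows. It assumes an in-memory, per-user store; `ExactMatchRoster`, `roster_digest` and the push tuples are hypothetical illustrations, not XEP-0237 syntax. Because a digest cannot be diffed against, the server can only answer with pushes if it has kept every push made since each digest it handed out; otherwise it falls back to the full roster.

```python
import hashlib
import json

def roster_digest(items):
    """Digest version string: a hash over the roster in canonical (sorted) order."""
    canonical = json.dumps(sorted(items.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class ExactMatchRoster:
    """Hypothetical per-user store for digest (exact-match) roster versioning."""

    def __init__(self):
        self.items = {}    # jid -> display name
        self.pending = {}  # digest handed out -> pushes made since that digest

    def fetch(self, client_ver=None):
        current = roster_digest(self.items)
        # Remember this digest so later changes can be replayed against it.
        self.pending.setdefault(current, [])
        if client_ver == current:
            return current, []
        if client_ver in self.pending:
            return current, list(self.pending[client_ver])
        # Unknown (or absent) digest: a digest cannot be diffed, so send everything.
        return current, [("set", jid, name) for jid, name in sorted(self.items.items())]

    def set_item(self, jid, name):
        self._record(("set", jid, name))
        self.items[jid] = name

    def remove_item(self, jid):
        self._record(("remove", jid, None))
        self.items.pop(jid, None)

    def _record(self, push):
        # Every outstanding digest needs its own copy of the push, and every
        # affected user has their own ExactMatchRoster.
        for pushes in self.pending.values():
            pushes.append(push)
```

With one such store per user, a contact added to a large shared group is recorded once per member, which is the duplication described above.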
> In §4.3 Add-only Conformance the assumption is that deletions are rare
> (since this will trigger an entire roster invalidation). This is not
> an assumption that can be made in many environments (e.g. large
> organizations where shared rosters may constantly have people being
> deleted as people leave the company, contractors rotate in and out,
> etc.). The combined approach that's also described in this section is
> somewhat better, but still requires that we store new additions in
> many places (e.g. once for every user that should get the push, or for
> every group that should get the push, or both. This starts to
> complicate the data model.)
> There are further workarounds for most of the issues I've just
> described, but mostly they just lead to more rabbit holes and more
> problems, and result in a very complicated solution. Entity
> versioning just does this in a simpler way that works better with our
> data model and distributed architecture (and potentially with other
> architectures as well). We can also then re-use the exact same
> semantics for other lists as previously discussed (instead of
> maintaining two different synchronization and diffing mechanisms).
I think most (or all) of the above only applies if you have rosters that
are computed on demand, rather than managed by users via clients.

Otherwise all you need on a simple roster (no shared groups) is a counter
for the version, the value of the latest tombstone *not* retained (i.e., the
last delete if there are no tombstones), and per item, the value of the
last change, and whether it's deleted (i.e., whether it's a tombstone). No
multiple versions of anything. Tombstones are optional, but without them
the scheme is only efficient for adds.
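A minimal Python sketch of the scheme just described, assuming an in-memory roster keyed by JID; the class, field, and method names are hypothetical. `version` is the counter, `purged` is the version of the latest tombstone not retained, and each item carries the version of its last change plus a tombstone flag. A client whose version predates `purged` may hold an item whose deletion can no longer be reported, so it gets a full roster; otherwise it gets exactly the items changed since its version.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    changed: int           # roster version at which this item last changed
    deleted: bool = False  # True if this entry is a tombstone

@dataclass
class VersionedRoster:
    version: int = 0       # counter bumped on every change
    purged: int = 0        # version of the latest tombstone *not* retained
    keep_tombstones: bool = True
    items: dict = field(default_factory=dict)

    def set_item(self, jid, name):
        self.version += 1
        self.items[jid] = Item(name, self.version)

    def remove_item(self, jid):
        if jid not in self.items or self.items[jid].deleted:
            return
        self.version += 1
        if self.keep_tombstones:
            self.items[jid] = Item(self.items[jid].name, self.version, deleted=True)
        else:
            del self.items[jid]
            self.purged = self.version  # no tombstones: the last delete

    def expire_tombstones(self, older_than):
        # Drop tombstones changed at or before `older_than`, remembering the newest dropped.
        for jid, item in list(self.items.items()):
            if item.deleted and item.changed <= older_than:
                self.purged = max(self.purged, item.changed)
                del self.items[jid]

    def sync(self, client_version):
        """Return (new_version, changes, full_resync)."""
        if client_version is None or not self.purged <= client_version <= self.version:
            live = [("set", j, i.name) for j, i in sorted(self.items.items()) if not i.deleted]
            return self.version, live, True
        changes = [("remove", j, None) if i.deleted else ("set", j, i.name)
                   for j, i in sorted(self.items.items(), key=lambda kv: kv[1].changed)
                   if i.changed > client_version]
        return self.version, changes, False
```

With `keep_tombstones=False` every delete advances `purged`, so any client older than the last delete falls back to a full fetch, which is why the scheme is then only efficient for adds.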

> There is actually a part 2 to this XEP which I hadn't submitted yet
> (because we haven't implemented it yet and I didn't want to submit
> until we at least had an implementation on our roadmap) where small
> chunks of an entity list can be diffed (e.g. so that you can say "give
> me all changes to this subsection of the list") and you can then use a
> "search" feature to get more list items later. This lets you receive a
> subset of your roster (e.g. if your roster has 10,000 users, you can
> receive the 1000 users that your server thinks you need at first, and
> then use the search endpoints, e.g. if you go to start a chat and want
> to list more users later via an "auto complete" mechanism). This would
> make it so that you can slowly ramp up to full roster consistency
> (note that I say roster a lot, but again, this is for any list). Maybe
> I should go ahead and start working on that and submit it, because
> with this second phase the benefits become more apparent.
I agree.

> > Rosters are a particularly simple case for synchronization, because there is
> > a single view; disco#items has potentially one view per user, and as such is
> > more complex.
> >
> > In particular, assuming a room has configuration A, and then changes to
> > configuration A' - while we can tell if A' is visible to a user U -- let's
> > call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
> > and given we don't always know which older configuration needs to be stored
> > to make that comparison, things can get complex fast.
> >
> > As such, a '237 style approach would probably be limited in practise to
> > having a §4.2 approach of hashing the entire list.
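To illustrate the multiple-views problem and the §4.2-style fallback, here is a minimal Python sketch. The visibility rule (a room is visible if it is public or the user is a member) and the config dictionaries are hypothetical stand-ins for real MUC configuration. A room changing from members-only to public leaves alice's view unchanged but alters bob's, and nothing in A' alone says which; hashing each user's whole visible list sidesteps the question, at the cost of sending the full list whenever the hashes differ.

```python
import hashlib

def visible(rooms, user):
    """V(config, user): sorted room JIDs this user may see (hypothetical rule)."""
    return sorted(jid for jid, cfg in rooms.items()
                  if cfg["public"] or user in cfg["members"])

def view_digest(rooms, user):
    """§4.2-style version: a hash over the entire list as this user sees it."""
    return hashlib.sha256("\n".join(visible(rooms, user)).encode()).hexdigest()

# Configuration A: members-only; configuration A': the same room made public.
config_a = {"room1@muc.example.com": {"public": False, "members": {"alice"}}}
config_a2 = {"room1@muc.example.com": {"public": True, "members": {"alice"}}}

for user in ("alice", "bob"):
    unchanged = view_digest(config_a, user) == view_digest(config_a2, user)
    print(user, "view unchanged" if unchanged else "view changed")
```

In practice the server would compare against the digest the client sends, which stands in for V(A,U), so A itself need not be kept.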
> >
> > This ProtoXEP tackles this problem by having the client upload its view for
> > comparison, although it also includes an exact-match mechanism.
> >
> > However, it's not clear from the specification how a server can signal
> > removal (or lack of visibility)
> That was an oversight on my part; I appear to have dropped our
> mechanism for that somehow when compiling this from our internals
> system into an XEP. I'll update soon. Thanks.
> > nor what advantages a client has in
> > exchanging the download of a large amount of data with the upload of a large
> > amount of data.
> In addition to the issues I mentioned before, the upload (in our case)
> is considerably smaller than the download because we send a lot of
> metadata with rooms and rosters (via a custom namespaced metadata
> element on the disco item or roster element). E.g. for a MUC, metadata
> might include id, topic, ACL info, owner, number of participants, guest
> URL for unauthenticated web access, last active time, etc. Leaving
> this metadata off is not an option, because we'd just have to query
> for it to display it anyway, and we don't want to make another round
> trip to do so. The combination of this with aggregate token checking
> ensures that we don't have to upload anything if nothing has changed,
> and don't have to download much if only a few things have changed
> (rarely do we actually trigger an entire roster download, and the
> uploads don't send any metadata, so they're still relatively small).
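A minimal Python sketch of the two-step exchange described above, under stated assumptions: the aggregate token is taken to be a hash over sorted (id, token) pairs, the client uploads only ids and per-item tokens, and the server replies with changed items (with metadata) plus removed ids. None of these names or encodings come from the ProtoXEP, and the removal list in particular is an assumption, since the draft's removal mechanism was missing.

```python
import hashlib

def aggregate_token(tokens):
    """Hypothetical aggregate token: a hash over sorted (id, token) pairs."""
    h = hashlib.sha256()
    for item_id, tok in sorted(tokens.items()):
        h.update(f"{item_id}={tok}\n".encode())
    return h.hexdigest()

def server_diff(server_items, client_tokens):
    """server_items: id -> (token, metadata); client_tokens: id -> token, no metadata."""
    changed = {i: meta for i, (tok, meta) in server_items.items()
               if client_tokens.get(i) != tok}
    removed = sorted(set(client_tokens) - set(server_items))
    return changed, removed

def sync(server_items, client_tokens):
    server_tokens = {i: tok for i, (tok, _) in server_items.items()}
    # Step 1: compare aggregates; if they match, only one token was exchanged.
    if aggregate_token(server_tokens) == aggregate_token(client_tokens):
        return {}, []
    # Step 2: the client uploads its id -> token map; the server returns differences.
    return server_diff(server_items, client_tokens)

server = {"room1": ("v2", {"topic": "Deploys"}), "room3": ("v1", {"topic": "Lunch"})}
changed, removed = sync(server, {"room1": "v1", "room2": "v1"})
```

The aggregate comparison keeps the no-change case to a single token; the upload grows with the list size but carries no metadata, while the download grows only with the number of changes.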
So you're saying you added a bunch of stuff for efficiency, and then had to
add an efficient synch mechanism due to the inefficiency it caused? ;-)

Amazingly, the simple mechanism I detailed above still works for items
containing metadata, incidentally. As I said before, the difficulty is in
dealing with multiple views; I think MUC room listing has those, and I
don't have a solution - at least, not without a changelog.

In the meantime, I'll reserve judgement until I've seen a bit more on this.

> —Sam
> --
> Sam Whited
> pub 4096R/54083AE104EA7AD3
> https://blog.samwhited.com