[Standards] Proposed XMPP Extension: Entity Versioning

Sam Whited sam at samwhited.com
Tue Sep 1 21:07:54 UTC 2015

On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland <dave at cridland.net> wrote:
> So I think that as far as rosters go, this is duplicating XEP-0237 in a
> considerably less efficient form.

The main thing to keep in mind is that it can be used to diff
arbitrary lists (rosters and MUC disco#items are specified, but you
could equally use it for caching entity caps or feature lists, or just
about any other arbitrary list your server felt like versioning).

> XEP-0237§4 gives a fairly intense trip down implementation options, and none
> of them require multiple versions of the roster as claimed by this ProtoXEP.
> I have a personal preference for §4.3, though it gets complex with shared
> groups and so on. Nevertheless, it is possible to get perfect efficiency if
> you're willing to store tombstones.

In §4.2 Exact-match Conformance you must store multiple versions of
the roster (or at least the current version of the roster and any
pending pushes; maybe I should rephrase that statement in the XEP)
unless you want to re-sync the entire roster every time there's a
change and the user isn't online to receive a push. E.g. if the user
signs in and fetches the roster (with a digest version), then signs
out and a new user is added to his roster, then when the user signs
back in and sends up the digest, the server must have cached that new
user in order to send a roster push back down. If your new user is
added to many people's rosters (but you can't guarantee that it's
added to a whole group's roster) you now have to store that roster
push for every single person whose roster it needs to be pushed to
(as opposed to a single version token in the users database table or
somewhere that can be diffed against).
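
As a rough illustration of the storage difference described above
(a sketch only, not taken from XEP-0237 or the ProtoXEP; the class
names, the integer counter used as a version token, and the 500
recipients are all hypothetical):

```python
from collections import defaultdict


class PendingPushStore:
    """Queue one pending roster push per affected (offline) user."""

    def __init__(self):
        self.pending = defaultdict(list)  # user -> queued pushes

    def add_contact(self, contact, recipients):
        # The same addition is stored once for every recipient.
        for user in recipients:
            self.pending[user].append(contact)

    def stored_records(self):
        return sum(len(queue) for queue in self.pending.values())


class VersionTokenStore:
    """Keep a single version token on the contact record itself."""

    def __init__(self):
        self.counter = 0
        self.versions = {}  # contact -> version token (assumed: a counter)

    def add_contact(self, contact):
        self.counter += 1
        self.versions[contact] = self.counter

    def diff(self, client_view):
        # Contacts whose token differs from what the client reports.
        return sorted(c for c, v in self.versions.items()
                      if client_view.get(c) != v)

    def stored_records(self):
        return len(self.versions)


recipients = [f"user{n}" for n in range(500)]
pushes = PendingPushStore()
pushes.add_contact("newhire", recipients)
tokens = VersionTokenStore()
tokens.add_contact("newhire")
print(pushes.stored_records(), tokens.stored_records())  # 500 1
```

Here the queued-push model stores the addition once per recipient,
while the token model stores it once and computes the difference when
a client presents its view.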

In §4.3 Add-only Conformance the assumption is that deletions are rare
(since a deletion will trigger an entire roster invalidation). This is
not an assumption that can be made in many environments (e.g. large
organizations where shared rosters may constantly have people being
deleted as people leave the company, contractors rotate in and out,
etc.). The combined approach that's also described in this section is
somewhat better, but still requires that we store new additions in
many places (e.g. once for every user that should get the push, or for
every group that should get the push, or both). This starts to
complicate the data model.

There are further workarounds for most of the issues I've just
described, but mostly they just lead to more rabbit holes and more
problems, and end up resulting in a very complicated solution. Entity
versioning just does this in a simpler way that works better with our
data model and distributed architecture (and potentially with other
architectures as well). We can also then re-use the exact same
semantics for other lists as previously discussed (instead of
maintaining two different synchronization and diffing mechanisms).

There is actually a part 2 to this XEP which I haven't submitted yet
(because we haven't implemented it yet and I didn't want to submit
until we at least had an implementation on our roadmap) where small
chunks of an entity list can be diffed (e.g. so that you can say "give
me all changes to this subsection of the list") and a "search" feature
can then be used to get more list items later. This lets you receive a
subset of your roster (e.g. if your roster has 10,000 users, you can
receive the 1,000 users that your server thinks you need at first, and
then use the search endpoints later, say when you go to start a chat
and want to list more users via an "auto complete" mechanism). This
would make it so that you can slowly ramp up to full roster
consistency (note that I say roster a lot, but again, this is for any
list). Maybe I should go ahead and start working on that and submit
it, because with this second phase the benefits become more apparent.
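
To make the "subset first, search later" idea concrete, here is a
minimal sketch. It is purely illustrative: part 2 is unpublished, so
the function names, the entry fields, and the choice to rank by last
activity are all assumptions rather than anything the XEP specifies.

```python
def initial_subset(roster, limit):
    """Pick the entries the server guesses the client needs first.

    Assumption: "most recently active" stands in for whatever
    heuristic a real server would use.
    """
    ranked = sorted(roster, key=lambda e: e["last_active"], reverse=True)
    return ranked[:limit]


def search(roster, prefix, known_ids):
    """Return entries matching an auto-complete prefix that the client
    does not already hold."""
    wanted = prefix.lower()
    return [e for e in roster
            if e["name"].lower().startswith(wanted)
            and e["id"] not in known_ids]


roster = [
    {"id": "u1", "name": "Alice", "last_active": 30},
    {"id": "u2", "name": "Albert", "last_active": 10},
    {"id": "u3", "name": "Bob", "last_active": 20},
]

first = initial_subset(roster, 2)
known = {e["id"] for e in first}
more = search(roster, "al", known)
print([e["id"] for e in first], [e["id"] for e in more])
# ['u1', 'u3'] ['u2']
```

The client starts with a partial list and fills it in as searches
return entries it has not seen, which is one way to "slowly ramp up"
towards the full list.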

> Rosters are a particularly simple case for synchronization, because there is
> a single view; disco#items has potentially one view per user, and as such is
> more complex.
> In particular, assuming a room has configuration A, and then changes to
> configuration A' - while we can tell if A' is visible to a user U -- let's
> call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
> and given we don't always know which older configuration needs to be stored
> to make that comparison, things can get complex fast.
> As such, a '237 style approach would probably be limited in practise to
> having a §4.2 approach of hashing the entire list.
> This ProtoXEP tackles this problem by having the client upload its view for
> comparison, although it also includes an exact-match mechanism.
> However, it's not clear from the specification how a server can signal
> removal (or lack of visibility)

That was an oversight on my part; I appear to have dropped our
mechanism for that somehow when compiling this from our internal
system into an XEP. I'll update soon. Thanks.

> nor what advantages a client has in
> exchanging the download of a large amount of data with the upload of a large
> amount of data.

In addition to the issues I mentioned before, the upload (in our case)
is considerably smaller than the download because we send a lot of
metadata with rooms and rosters (via a custom namespaced metadata
element on the disco item or roster element). E.g. for a MUC, metadata
might include id, topic, ACL info, owner, number of participants, guest
URL for unauthenticated web access, last active time, etc. Leaving
this metadata off is not an option, because we'd just have to query
for it to display it anyway and we don't want to make another round
trip to do so. The combination of this with aggregate token checking
ensures that we don't have to upload anything if nothing has changed,
and don't have to download much if only a few things have changed
(rarely do we actually trigger an entire roster download, and the
uploads don't send any metadata, so they're still relatively small).
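
The two-step flow described above could be sketched roughly as
follows. This is an illustration under assumptions, not the ProtoXEP's
wire format: the SHA-256 aggregate over sorted (id, version) pairs,
the function names, and the way removals are reported are all
hypothetical (the removal mechanism in particular is not yet in the
published text).

```python
import hashlib


def item_versions(server_items):
    """Strip metadata, leaving only id -> version."""
    return {i: item["version"] for i, item in server_items.items()}


def aggregate_token(versions):
    """One token for the whole list (assumed: hash of sorted pairs)."""
    digest = hashlib.sha256()
    for item_id in sorted(versions):
        digest.update(f"{item_id}={versions[item_id]};".encode())
    return digest.hexdigest()


def needs_sync(server_items, client_token):
    """Step 1: the client sends only its aggregate token."""
    return aggregate_token(item_versions(server_items)) != client_token


def diff_view(server_items, client_versions):
    """Step 2: the client uploads id -> version (no metadata); the
    server answers with full items that changed, plus removed ids."""
    changed = {i: item for i, item in server_items.items()
               if client_versions.get(i) != item["version"]}
    removed = sorted(i for i in client_versions if i not in server_items)
    return changed, removed


server = {
    "room1": {"version": "3", "meta": {"topic": "standup"}},
    "room2": {"version": "1", "meta": {"topic": "lunch"}},
}
stale_view = {"room1": "2", "room3": "9"}

print(needs_sync(server, aggregate_token(item_versions(server))))  # False
changed, removed = diff_view(server, stale_view)
print(sorted(changed), removed)  # ['room1', 'room2'] ['room3']
```

When the tokens match, nothing is uploaded at all; otherwise the
upload carries only ids and versions, and the download carries
metadata only for the items that actually differ.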


Sam Whited
pub 4096R/54083AE104EA7AD3
