[Standards] Entity Capabilities 2.0

Jonas Wielicki jonas at wielicki.name
Wed Feb 14 20:44:00 UTC 2018

On Monday, 12 February 2018 09:10:47 CET Jonas Wielicki wrote:
> On Monday, 12 February 2018 00:41:54 CET Christian Schudt wrote:
> > - Generally I am unsure if using the "xml:lang" and "name" from the
> > identities is a good idea at all, because these two attributes should not
> > change the capabilities of an entity. Name and language are just for
> > humans.
> > I.e. if a server sends German identities for one user and English
> > identities for the next user (depending on their client settings/stream
> > header), the server still has the same identities, which should result in
> > the same verification string, shouldn’t it?
> First of all, I think previously, an entity answering a disco#info request
> always sent all translated identities, so that would not have been an issue.
> You’re touching on a more general thing though which I’d like to discuss. We
> could separate the hash into three hashes, one for identities, one for
> features and one for forms (or maybe two: identities and forms+features).
> This has the upside that human readable identifiers don’t interfere with
> protocol data (features/forms) in many cases (I think the identities are
> more rarely used in protocols, but I might be wrong). The obvious downside
> is that we need to transfer more data in the presence (twice or thrice the
> amount for ecaps2).
> I’d like to know what you people think of it. Since this is still
> Experimental, I’d be fine with bumping the namespace and getting this done.
> But I’m afraid that the bandwidth costs will outweigh the advantages. We
> have ~100 bytes for a 256-bit hash sum (including wrapper XML). We would end
> up with more than half a kilobyte (~0.6 kB) for ecaps2 if we split the
> hashes three ways and assume that each entity uses two hash functions with
> 256 bits each (which I think is a reasonable assumption). If we have caps
> optimization, the impact would probably be negligible, but I’m not sure if
> we can assume that.
> I’d like to get input from you folks on that.
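
The bandwidth estimate in the quoted paragraph can be reproduced with a few lines of arithmetic. This is only a rough sketch: the ~100 bytes per hash element and the two hash functions per entity are the figures from the mail, while the function name and parameters are made up for illustration.

```python
# Back-of-the-envelope presence overhead, using the figures quoted above.
# ~100 bytes per 256-bit hash including wrapper XML (estimate from the mail).
BYTES_PER_HASH_ELEMENT = 100


def presence_overhead(hash_functions: int, split_hashes: int) -> int:
    """Approximate number of bytes the caps hashes add to one presence stanza."""
    return hash_functions * split_hashes * BYTES_PER_HASH_ELEMENT


# Assumption from the mail: each entity uses two hash functions.
print(presence_overhead(hash_functions=2, split_hashes=1))  # unsplit: 200
print(presence_overhead(hash_functions=2, split_hashes=2))  # two-way split: 400
print(presence_overhead(hash_functions=2, split_hashes=3))  # three-way split: 600
```

Under these assumptions the three-way split gives the ~0.6 kB mentioned above, and a two-way split doubles the unsplit cost.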

I had some off-list input on this. First, Evgeny pointed out that the work
in progress on MUC bare-presence [1] has uncovered that caps don’t really
work well for the MUC case. A MUC’s disco#info contains, for example, the
number of occupants currently in the room, which may fluctuate a lot (thus
causing lots of <presence/> traffic if caps cover the complete disco#info) [2].

Second, Florian Schmaus questioned my approach of splitting the hashes and
asked for use cases where this makes sense. I can come up with two use cases
off the top of my head, both with varying relevance depending on which
metric you want to optimize.

- The MUC use case from above. Granted, this isn’t in any spec yet, but it 
  would be great to have. Daniel noted that having the disco#info form of
  MUCs is useful to detect (a part of) the configuration which is relevant
  to (IMO reasonable) UX choices in Conversations.

  However, if the occupant count is in there, it obviously defeats the
  purpose of a caps hash.

- Clients sometimes include XEP-0232 (Software Information) and other forms
  in their disco#info. This might be high-cardinality information which
  may thrash (overload or fill up) entity caches.

  I ran the numbers on the (somewhat dated) capsdb [3]:

  Total items in capsdb: 1602
  Distinct hashes: 1558 (i.e. XEP-0115/XEP-0390 as-is)
  Distinct identity+features: 1140
  Distinct forms: 450

  This is less of a saving than I expected; however, the capsdb is rather 
  dated. I wonder whether the saving is larger nowadays if there are more 
  clients which implement XEP-0232 or other similar things.
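
A count like the one above could be obtained roughly as follows. This is a sketch under loud assumptions: the record layout, the toy entries and the `digest` helper are invented for illustration, and the digest is not the hash input algorithm of XEP-0115 or XEP-0390; it only needs to tell distinct inputs apart.

```python
import hashlib


def digest(*sections):
    """Toy digest over one or more lists of strings (NOT the XEP-0390 algorithm)."""
    h = hashlib.sha256()
    for section in sections:
        for item in sorted(section):
            h.update(item.encode("utf-8") + b"\x1f")  # item separator
        h.update(b"\x1e")  # section separator
    return h.hexdigest()


def count_distinct(records):
    """Return (distinct full hashes, distinct identity+features, distinct forms)."""
    whole = {digest(r["identities"], r["features"], r["forms"]) for r in records}
    id_feat = {digest(r["identities"], r["features"]) for r in records}
    forms = {digest(r["forms"]) for r in records}
    return len(whole), len(id_feat), len(forms)


# Made-up records: two versions of one hypothetical client plus a second client.
records = [
    {"identities": ["client/pc//ExampleClient"],
     "features": ["http://jabber.org/protocol/disco#info", "urn:xmpp:ping"],
     "forms": ["software_version=1.0"]},
    {"identities": ["client/pc//ExampleClient"],
     "features": ["http://jabber.org/protocol/disco#info", "urn:xmpp:ping"],
     "forms": ["software_version=1.1"]},
    {"identities": ["client/phone//OtherClient"],
     "features": ["http://jabber.org/protocol/disco#info"],
     "forms": []},
]

print(count_distinct(records))  # (3, 2, 3)
```

In the toy data the version bump creates a new full hash but leaves the identity+features hash untouched, which is the effect the split is meant to exploit.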

Splitting the hashes could also allow entities to explicitly opt out of one of
the two hashes: an entity with a disco#info form which changes in real time
could opt out of sending the form hash altogether (instead of sending a hash
equivalent to "no form"), thus signalling to peers that if disco#info form
data is desired, it needs to be queried freshly.

All in all, I’m not sure whether those two use cases warrant increasing the
bandwidth use of caps2 by a factor of approximately two.

I’m still hoping for more feedback on this, thanks!

kind regards,

   [1]: The idea is to let MUCs emit a presence from the bare JID after the
        client joined to send them caps and avatar info etc.
   [2]: They work around that currently by not including the form in the caps
        and omitting the form data from disco#info queries against caps 
        disco#info nodes.
   [3]: https://github.com/xnyhps/capsdb/