[Standards] What is the size limit of node and item ids in XEP-0060: Publish-Subscribe?

Timothée Jaussoin edhelas at movim.eu
Tue Mar 27 16:15:15 UTC 2018


Hi everyone, 

Just bouncing this discussion back up. I'd like to see what we can decide at this point based on the information we have here.
I'll add a couple of pieces of information; these are simple technical limitations that should guide our decisions on this problem.

ejabberd restricts the size of JIDs, node IDs and many other fields to varchar(191) for MySQL. I'm doing something similar in
Movim because of the key size limit in MySQL (it's less of a problem for the other SQL databases).
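
For context, here is a rough sketch of where a figure like 191 can come from. It assumes InnoDB's 767-byte index key prefix limit and the utf8mb4 charset (up to 4 bytes per character); the constant and function names are only illustrative and are not taken from ejabberd or Movim.

```python
# Assumptions (not from the ejabberd or Movim sources): InnoDB's 767-byte
# index key prefix limit and utf8mb4, which needs up to 4 bytes per character.
INNODB_KEY_PREFIX_BYTES = 767
UTF8MB4_MAX_BYTES_PER_CHAR = 4


def max_indexable_chars(key_bytes: int, bytes_per_char: int) -> int:
    """Largest VARCHAR length whose worst-case encoding still fits in the key."""
    return key_bytes // bytes_per_char


limit = max_indexable_chars(INNODB_KEY_PREFIX_BYTES, UTF8MB4_MAX_BYTES_PER_CHAR)
print(limit)  # 191
```

With a 3-byte charset the same arithmetic gives 255, which is why the limit only bites once full 4-byte UTF-8 is enabled.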

So we already have in the wild some servers that will not accept those long JIDs and IDs.

Some web apps that use XMPP as a backend, like Movim or SàT (afaik), map Pubsub resources to URLs. Here's an example:
https://nl.movim.eu/?node/pubsub.movim.eu/Movim/a-new-release-is-coming-help-up-with-the-translations-WM4Yrf
On my side I'm slugifying things to make those node and item ids easier to read, but I expect to run into escaping problems in some cases.
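
To illustrate the kind of slugifying I mean and where it can go wrong, here is a minimal sketch. The `slugify` helper is hypothetical and is not Movim's actual code; it simply reduces a title to lowercase ASCII words joined by hyphens.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical helper: lowercase ASCII words joined by '-'."""
    # Decompose accented letters, then drop everything that is not ASCII.
    ascii_only = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(slugify("A new release is coming, help us!"))  # a-new-release-is-coming-help-us
print(slugify("Café Münch"))  # cafe-munch
print(repr(slugify("日本語")))  # '' (nothing survives the ASCII filter)
```

The last case shows the problem: a title written entirely in a non-Latin script produces an empty slug, and distinct titles can collapse to the same slug, so a slug on its own is not a safe unique id.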

Related to that, Bookmarks 2 is currently under discussion: https://xmpp.org/extensions/inbox/bookmarks2.html. This XEP states that
"Each item SHALL have, as item id, the Room JID of the chatroom". Does this mean that Pubsub item ids have the same definition as JIDs?

On my side I'd propose restricting JIDs to something shorter (like 128 UTF-8 characters) to be sure that they can be stored and
indexed properly in databases, and defining that all Pubsub/PEP IDs have the same definition as JIDs.
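
One detail worth pinning down is whether such a limit counts characters or encoded bytes. A minimal sketch, assuming the limit is counted in Unicode code points (`MAX_ID_CHARS` and `fits_limit` are illustrative names, not part of any XEP):

```python
MAX_ID_CHARS = 128  # the limit proposed above; not part of any XEP


def fits_limit(identifier: str, max_chars: int = MAX_ID_CHARS) -> bool:
    """Check the length in Unicode code points, not in encoded bytes."""
    return len(identifier) <= max_chars


room = "\u00e9" * 128  # 128 times LATIN SMALL LETTER E WITH ACUTE
print(fits_limit(room))  # True
print(len(room.encode("utf-8")))  # 256
```

The same 128 characters take 256 bytes in UTF-8 here (and could take up to 512 with 4-byte characters), so the storage column has to be sized accordingly.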

Regards,

Timothée Jaussoin

On Wednesday, 7 March 2018 at 09:20 -0700, Peter Saint-Andre wrote:
> On 3/6/18 1:02 AM, Jonas Wielicki wrote:
> > Hi Peter,
> > 
> > Thank you very much for the clarification, comments inline.
> > 
> > On Tuesday, 6 March 2018 02:59:04 CET Peter Saint-Andre wrote:
> > > On 3/5/18 12:17 AM, Jonas Wielicki wrote:
> > > > On Sunday, 4 March 2018 19:42:39 CET Peter Saint-Andre wrote:
> > > > > On 3/4/18 10:54 AM, Jonas Wielicki wrote:
> > > > > > On Sunday, 4 March 2018 17:02:07 CET Peter Saint-Andre wrote:
> > > > > > > If we want to specify this, I would recommend the UsernameCaseMapped
> > > > > > > profile defined in RFC 8265.
> > > > > > > 
> > > > > > > However, there's a twist: if a node ID can be a full JID, then do we
> > > > > > > want to apply the normal rules of RFC 7622 to all the JID parts,
> > > > > > > instead of one uniform profile such as UsernameCaseMapped to the
> > > > > > > entire node ID?
> > > > > > > For instance, the resourcepart of a JID is allowed to contain a much
> > > > > > > wider range of Unicode characters than is allowed by the
> > > > > > > UsernameCaseMapped profile of the PRECIS IdentifierClass (which we use
> > > > > > > for the localpart).
> > > > > > > 
> > > > > > > Given that a node ID can be used for authorization decisions, I think
> > > > > > > it's better to be conservative in what we accept (specifically, not
> > > > > > > allow the wider range of characters in a resourcepart because
> > > > > > > developers, and attackers, could get too "creative").
> > > > > > 
> > > > > > I would argue that adding those restrictions / any kind of string
> > > > > > prepping to XEP-0060 or XEP-0030 nodes is (a) too late and (b)
> > > > > > ambiguous at least, as you mentioned (depending on the data).
> > > > > 
> > > > > I would argue that not specifying normalization rules is a security hole
> > > > > (e.g., allowing an attacker to gain unauthorized access to a node). Just
> > > > > because we should've done this years ago doesn't mean we can't fix it now.
> > > > 
> > > > Hm, okay, I don’t seem to understand the attack vector. Could you spell it
> > > > out more clearly to me?
> > > 
> > > Here's a true, non-XMPP example: I have the account stpeter at gmail.com.
> > > However, Google ignores "." in the localpart. Therefore I receive some
> > > email messages intended for st.peter at gmail.com. I could probably reset
> > > passwords (via email-based authentication) and take over other accounts
> > > associated with st.peter at gmail.com.
> > > 
> > > Similarly, let's say you create a node "foo2" at pubsub.example.com. If
> > > I know that this service decomposes superscript characters to their
> > > compatibility equivalents, I could create a node "foo²" (the last
> > > character is U+00B2 = SUPERSCRIPT TWO) and the service would consider it
> > > to be the same as "foo2". Now I can publish notifications to your node
> > > without ever trying to take over your account - I just use my "foo²" node.
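
The "foo²" collision described above can be reproduced with Python's standard `unicodedata` module. This is only a sketch: it assumes the hypothetical service applies NFKC, since the thread does not say which normalization form such a service would use.

```python
import unicodedata

owner_node = "foo2"
attacker_node = "foo\u00b2"  # last character is U+00B2 SUPERSCRIPT TWO

# The two node names are distinct strings before any processing...
print(owner_node == attacker_node)  # False

# ...but become identical once a compatibility normalization is applied.
print(unicodedata.normalize("NFKC", attacker_node) == owner_node)  # True

# Canonical normalization (NFC) leaves the superscript untouched.
print(unicodedata.normalize("NFC", attacker_node) == owner_node)  # False
```

Whether the two names collide therefore depends entirely on which normalization form the service applies, and on whether it applies it consistently at creation time and at lookup time.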
> > 
> > Okay, that all makes sense, but it seems to me that this is due to the 
> > *presence* of a normalization, not the absence. 
> 
> Actually, incomplete or incorrect normalization.
> 
> > That’s where my confusion came 
> > from. I think the absence of a normalization (or specifying that absence) is 
> > not going to do us harm. 
> 
> Never assume that harm can't happen when computers are involved. :-)
> Especially when internationalized characters are used. If we said that a
> node could only use characters from the ASCII range then we'd be safe,
> but that's not the case - people want to use JIDs as nodes, which means
> we're inheriting everything from internationalized domain names (please
> read RFC 5890), internationalized usernames (please read RFC 7613), and
> internationalized "free-form" strings (please read RFC 7613 again), and
> their combination in XMPP (please read RFC 7622). Handling all of those
> strings correctly requires normalization of some kind, end of story.
> 
> > That is what I was trying to say when I said that 
> > "I’d also argue that nodes aren’t shown or typed into a field by users 
> > normally, so I would not worry about that kind of normalization here.": Since 
> > users aren’t confronted with them, lookalikes etc. should not be an issue and 
> > do not need to be normalized.
> 
> This is not just about user-facing "confusable characters", but
> machine-generated and machine-processed characters as well. And in any
> case do you think that a pubsub application will *never* show the node
> name to an end user? These things inevitably leak out to userland (e.g.,
> for a user to manage subscriptions, for a node owner to manage users, etc.).
> 
> > If we’re going to specify that "node names etc. need to be taken as-is and 
> > compared codepoint-by-codepoint [I can’t look up the name of that collation 
> > right now] and must not be normalized in any way by the service", that makes 
> > sense to me; 
> 
> There's your problem: you think this internationalization stuff makes
> sense. :-) Abandon hope, all ye who enter here! If I had more time, I'd
> write a book entitled "Internationalization: A Guide for the Perplexed".
> 
> Comparing two strings for an octet-for-octet match is the last step, but
> if you don't properly enforce various rules before then (including
> normalization), bad things will happen. Especially if we're allowing
> things like JIDs to be nodes - or, even worse, any Unicode code point
> (how do we handle combining characters, zero-width spaces, and all the
> other madness?). Authorization decisions will be wrong, etc.
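
A small sketch of why an octet-for-octet comparison only works as the last step. The node names below are made up for illustration, and NFC is an assumed choice of normalization form rather than something the thread settles on.

```python
import unicodedata

precomposed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"      # 'e' followed by COMBINING ACUTE ACCENT
with_zwsp = "caf\u00e9\u200b"  # same name plus a ZERO WIDTH SPACE


def octets(node: str) -> bytes:
    """UTF-8 bytes of a node name, as they would be compared on the wire."""
    return node.encode("utf-8")


# Visually identical names differ octet-for-octet without normalization.
print(octets(precomposed) == octets(decomposed))  # False

# After NFC the combining sequence matches the precomposed form.
nfc = unicodedata.normalize("NFC", decomposed)
print(octets(nfc) == octets(precomposed))  # True

# Normalization does not strip the zero-width space.
print(unicodedata.normalize("NFC", with_zwsp) == precomposed)  # False
```

The last line shows that normalization alone is not enough: invisible characters such as the zero-width space have to be rejected or handled by a separate rule before the final comparison.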
> 
> > I think most services, if not all, already operate this way.
> 
> What *exactly* are services doing?
> 
> > Otherwise, I think we’ll have to think hard about the implications of 
> > introducing a normalization/preparation method this far into deployment and 
> > how to handle unnormalized input [1]. XEP-0030 is Final and used ~everywhere, 
> > XEP-0060 is Draft and a key dependency to a few modern features (via PEP). 
> > Having the ecosystem move from "no preparation" to "some preparation" feels 
> > like it’s bound to introduce exactly the type of bugs you were talking about.
> 
> Correct handling of internationalized characters feels a lot safer to me
> than incorrect handling.
> 
> > Add to that the trickiness if we want to use JIDs as node names, I’d argue 
> > that a "don’t touch this" directive to the server makes sense. If a protocol 
> > has specific requirements for node names specifically in PubSub, I think it 
> > could still specify that.
> > 
> > Does this make sense?
> 
> See above on making sense.
> 
> Peter
> 
> 
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: Standards-unsubscribe at xmpp.org
> _______________________________________________

