[Standards] What is the size limit of node and item ids in XEP-0060: Publish-Subscribe?

Jonas Wielicki jonas at wielicki.name
Tue Mar 6 08:02:48 UTC 2018

Hi Peter,

Thank you very much for the clarification, comments inline.

On Tuesday, 6 March 2018 02:59:04 CET Peter Saint-Andre wrote:
> On 3/5/18 12:17 AM, Jonas Wielicki wrote:
> > On Sunday, 4 March 2018 19:42:39 CET Peter Saint-Andre wrote:
> >> On 3/4/18 10:54 AM, Jonas Wielicki wrote:
> >>> On Sunday, 4 March 2018 17:02:07 CET Peter Saint-Andre wrote:
> >>>> If we want to specify this, I would recommend the UsernameCaseMapped
> >>>> profile defined in RFC 8265.
> >>>> 
> >>>> However, there's a twist: if a node ID can be a full JID, then do we
> >>>> want to apply the normal rules of RFC 7622 to all the JID parts,
> >>>> instead of one uniform profile such as UsernameCaseMapped to the
> >>>> entire node ID?
> >>>> For instance, the resourcepart of a JID is allowed to contain a much
> >>>> wider range of Unicode characters than is allowed by the
> >>>> UsernameCaseMapped profile of the PRECIS IdentifierClass (which we use
> >>>> for the localpart).
> >>>> 
> >>>> Given that a node ID can be used for authorization decisions, I think
> >>>> it's better to be conservative in what we accept (specifically, not
> >>>> allow the wider range of characters in a resourcepart because
> >>>> developers, and attackers, could get too "creative").
> >>> 
> >>> I would argue that adding those restrictions / any kind of string
> >>> prepping to XEP-0060 or XEP-0030 nodes is (a) too late and (b)
> >>> ambiguous at least, as you mentioned (depending on the data).
> >> 
> >> I would argue that not specifying normalization rules is a security hole
> >> (e.g., allowing an attacker to gain unauthorized access to a node). Just
> >> because we should've done this years ago doesn't mean we can't fix it now.
> > 
> > Hm, okay, I don’t seem to understand the attack vector. Could you spell it
> > out more clearly to me?
>
> Here's a true, non-XMPP example: I have the account stpeter at gmail.com.
> However, Google ignores "." in the localpart. Therefore I receive some
> email messages intended for st.peter at gmail.com. I could probably reset
> passwords (via email-based authentication) and take over other accounts
> associated with st.peter at gmail.com.
>
> Similarly, let's say you create a node "foo2" at pubsub.example.com. If
> I know that this service decomposes superscript characters to their
> compatibility equivalents, I could create a node "foo²" (the last
> character is U+00B2 = SUPERSCRIPT TWO) and the service would consider it
> to be the same as "foo2". Now I can publish notifications to your node
> without ever trying to take over your account - I just use my "foo²" node.
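To make the "foo²" example concrete, here is a minimal Python sketch. It assumes a hypothetical service that folds node IDs with Unicode NFKC (compatibility decomposition followed by canonical composition); NFKC is my stand-in for "decomposes superscript characters to their compatibility equivalents", not a description of any particular server. Note that canonical normalization (NFC) alone would keep the two IDs apart, because U+00B2 only has a compatibility decomposition.

```python
import unicodedata


def compat_fold(node_id: str) -> str:
    """Hypothetical service-side folding of node IDs (assumption: NFKC)."""
    return unicodedata.normalize("NFKC", node_id)


victim = "foo2"
attacker = "foo\u00b2"  # "foo²", U+00B2 SUPERSCRIPT TWO

# Compared as received, the two IDs are different code point sequences.
print(victim == attacker)
# After compatibility folding, both map to "foo2": the same node.
print(compat_fold(victim) == compat_fold(attacker))
# Canonical normalization (NFC) does not touch U+00B2, so no collision there.
print(unicodedata.normalize("NFC", attacker) == victim)
```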

Okay, that all makes sense, but it seems to me that this is due to the 
*presence* of a normalization, not the absence. That’s where my confusion came 
from. I think the absence of a normalization (or specifying that absence) is 
not going to do us harm. That is what I was trying to say when I said that 
"I’d also argue that nodes aren’t shown or typed into a field by users 
normally, so I would not worry about that kind of normalization here.": Since 
users aren’t confronted with them, lookalikes etc. should not be an issue and 
do not need to be normalized.

If we’re going to specify that "node names etc. need to be taken as-is and 
compared codepoint-by-codepoint [I can’t look up the name of that collation 
right now] and must not be normalized in any way by the service", that makes 
sense to me; I think most services, if not all, already operate this way.
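A sketch of the "take node names as-is" rule, using a hypothetical in-memory node table (`NodeRegistry` is illustrative only, not any server's API). Python string equality compares code point by code point, which for UTF-8 data is equivalent to comparing the octets, so under this rule "foo2" and "foo²" simply remain two separate nodes with separate owners.

```python
from typing import Dict, Optional


class NodeRegistry:
    """Hypothetical pubsub node table that stores node IDs exactly as received.

    No case folding, width mapping or Unicode normalization is applied, so two
    node IDs name the same node only if their code point sequences are equal
    (equivalently, if their UTF-8 encodings are byte-for-byte equal).
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def create(self, node_id: str, owner: str) -> bool:
        """Create a node; refuse (return False) if this exact ID already exists."""
        if node_id in self._owners:
            return False
        self._owners[node_id] = owner
        return True

    def owner_of(self, node_id: str) -> Optional[str]:
        """Return the owner of the node with exactly this ID, or None."""
        return self._owners.get(node_id)


registry = NodeRegistry()
registry.create("foo2", "jonas@example.com")
# "foo²" (U+00B2) is a different code point sequence, hence a different node.
registry.create("foo\u00b2", "mallory@example.net")
print(registry.owner_of("foo2"))
```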

Otherwise, I think we’ll have to think hard about the implications of 
introducing a normalization/preparation method this far into deployment and 
how to handle unnormalized input [1]. XEP-0030 is Final and used ~everywhere, 
XEP-0060 is Draft and a key dependency of a few modern features (via PEP).
Having the ecosystem move from "no preparation" to "some preparation" feels 
like it’s bound to introduce exactly the type of bugs you were talking about.
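The migration risk can be made concrete as well. The sketch below groups already-deployed node IDs by the form a newly introduced preparation step would give them; every group with more than one member is a set of nodes that used to be distinct and would silently merge. `nfkc_casefold` is a hypothetical stand-in for such a step (NFKC plus case folding), not the PRECIS UsernameCaseMapped profile, and the node list is made up for illustration.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, List


def find_collisions(node_ids: List[str],
                    prep: Callable[[str], str]) -> Dict[str, List[str]]:
    """Return {prepared form: original IDs} for every prepared form that
    two or more previously distinct node IDs would collapse into."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node_id in node_ids:
        groups[prep(node_id)].append(node_id)
    return {prepared: ids for prepared, ids in groups.items() if len(ids) > 1}


def nfkc_casefold(s: str) -> str:
    # Hypothetical preparation step for illustration only; NOT PRECIS.
    return unicodedata.normalize("NFKC", s).casefold()


existing = ["foo2", "foo\u00b2", "Blog", "blog", "princely_musings"]
for prepared, ids in find_collisions(existing, nfkc_casefold).items():
    print(prepared, "<-", ids)
```

With "no preparation" (the identity function) the same list yields no collisions at all; only the switch to "some preparation" creates them.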

Add to that the trickiness if we want to use JIDs as node names, I’d argue 
that a "don’t touch this" directive to the server makes sense. If a protocol 
built on PubSub has specific requirements for its node names, I think it
could still specify them.

Does this make sense?

kind regards,

   [1]: Given the lack of even resourceprep validation in current servers,
        I’d also not put my money on "servers will validate and reject any
        invalid node names".