[Standards] What is the size limit of node and item ids in XEP-0060: Publish-Subscribe?

Peter Saint-Andre stpeter at stpeter.im
Wed Mar 7 16:20:37 UTC 2018

On 3/6/18 1:02 AM, Jonas Wielicki wrote:
> Hi Peter,
> Thank you very much for the clarification, comments inline.
> On Tuesday, 6 March 2018 02:59:04 CET Peter Saint-Andre wrote:
>> On 3/5/18 12:17 AM, Jonas Wielicki wrote:
>>> On Sunday, 4 March 2018 19:42:39 CET Peter Saint-Andre wrote:
>>>> On 3/4/18 10:54 AM, Jonas Wielicki wrote:
>>>>> On Sunday, 4 March 2018 17:02:07 CET Peter Saint-Andre wrote:
>>>>>> If we want to specify this, I would recommend the UsernameCaseMapped
>>>>>> profile defined in RFC 8265.
>>>>>> However, there's a twist: if a node ID can be a full JID, then do we
>>>>>> want to apply the normal rules of RFC 7622 to all the JID parts,
>>>>>> instead of one uniform profile such as UsernameCaseMapped to the
>>>>>> entire node ID?
>>>>>> For instance, the resourcepart of a JID is allowed to contain a much
>>>>>> wider range of Unicode characters than is allowed by the
>>>>>> UsernameCaseMapped profile of the PRECIS IdentifierClass (which we use
>>>>>> for the localpart).
>>>>>> Given that a node ID can be used for authorization decisions, I think
>>>>>> it's better to be conservative in what we accept (specifically, not
>>>>>> allow the wider range of characters in a resourcepart because
>>>>>> developers, and attackers, could get too "creative").
>>>>> I would argue that adding those restrictions / any kind of string
>>>>> prepping to XEP-0060 or XEP-0030 nodes is (a) too late and (b)
>>>>> ambiguous at least, as you mentioned (depending on the data).
>>>> I would argue that not specifying normalization rules is a security hole
>>>> (e.g., allowing an attacker to gain unauthorized access to a node). Just
>>>> because we should've done this years ago doesn't mean we can't fix it now.
>>> Hm, okay, I don’t seem to understand the attack vector. Could you spell it
>>> out more clearly to me?
>> Here's a true, non-XMPP example: I have the account stpeter at gmail.com.
>> However, Google ignores "." in the localpart. Therefore I receive some
>> email messages intended for st.peter at gmail.com. I could probably reset
>> passwords (via email-based authentication) and take over other accounts
>> associated with st.peter at gmail.com.
>> Similarly, let's say you create a node "foo2" at pubsub.example.com. If
>> I know that this service decomposes superscript characters to their
>> compatibility equivalents, I could create a node "foo²" (the last
>> character is U+00B2 = SUPERSCRIPT TWO) and the service would consider it
>> to be the same as "foo2". Now I can publish notifications to your node
>> without ever trying to take over your account - I just use my "foo²" node.
> Okay, that all makes sense, but it seems to me that this is due to the 
> *presence* of a normalization, not the absence. 

Actually, incomplete or incorrect normalization.
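
To make the "foo2" / "foo²" case concrete, here is a minimal Python
sketch. It assumes a hypothetical service that checks for duplicates on
the raw string when a node is created, but applies NFKC when it makes
authorization decisions. The SketchService class, its method names, and
the owner names are invented for illustration; they do not come from
XEP-0060 or from any real implementation.

```python
import unicodedata


def nfkc(node):
    """Compatibility-normalize a node ID (maps U+00B2 to the digit 2)."""
    return unicodedata.normalize("NFKC", node)


class SketchService:
    """Hypothetical pubsub service with inconsistent normalization."""

    def __init__(self):
        self.owners = {}  # raw node ID -> owner

    def create(self, node, owner):
        # Duplicate check on the raw string: "foo2" and "foo²" both pass.
        if node in self.owners:
            raise ValueError("node already exists")
        self.owners[node] = owner

    def may_publish(self, node, user):
        # Authorization check on the NFKC form: the two nodes now collide.
        key = nfkc(node)
        return any(nfkc(n) == key and o == user
                   for n, o in self.owners.items())


service = SketchService()
service.create("foo2", "victim")
service.create("foo\u00b2", "attacker")  # "foo²" is accepted as a new node

print("foo2" == "foo\u00b2")                     # False
print(service.may_publish("foo2", "attacker"))   # True
```

The flaw in this sketch is not that normalization exists, but that it
is applied in one code path and not in the other.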

> That’s where my confusion came 
> from. I think the absence of a normalization (or specifying that absence) is 
> not going to do us harm. 

Never assume that harm can't happen when computers are involved. :-)
Especially when internationalized characters are used. If we said that a
node could only use characters from the ASCII range then we'd be safe,
but that's not the case - people want to use JIDs as nodes, which means
we're inheriting everything from internationalized domain names (please
read RFC 5890), internationalized usernames (please read RFC 7613), and
internationalized "free-form" strings (please read RFC 7613 again), and
their combination in XMPP (please read RFC 7622). Handling all of those
strings correctly requires normalization of some kind, end of story.

> That is what I was trying to say when I said that 
> "I’d also argue that nodes aren’t shown or typed into a field by users 
> normally, so I would not worry about that kind of normalization here.": Since 
> users aren’t confronted with them, lookalikes etc. should not be an issue and 
> do not need to be normalized.

This is not just about user-facing "confusable characters", but
machine-generated and machine-processed characters as well. And in any
case do you think that a pubsub application will *never* show the node
name to an end user? These things inevitably leak out to userland (e.g.,
for a user to manage subscriptions, for a node owner to manage users, etc.).

> If we’re going to specify that "node names etc. need to be taken as-is and 
> compared codepoint-by-codepoint [I can’t look up the name of that collation 
> right now] and must not be normalized in any way by the service", that makes 
> sense to me; 

There's your problem: you think this internationalization stuff makes
sense. :-) Abandon hope, all ye who enter here! If I had more time, I'd
write a book entitled "Internationalization: A Guide for the Perplexed".

Comparing two strings for an octet-for-octet match is the last step, but
if you don't properly enforce various rules before then (including
normalization), bad things will happen. Especially if we're allowing
things like JIDs to be nodes - or, even worse, any Unicode code point
(how do we handle combining characters, zero-width spaces, and all the
other madness?). Authorization decisions will be wrong, etc.
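
As a rough illustration of "enforce the rules first, compare octets
last", the following Python sketch uses NFC plus a rejection of control
and format characters as a stand-in for a real preparation profile.
That stand-in is an assumption made for the example; it is not the
UsernameCaseMapped profile of RFC 8265, and the prepare() helper is
hypothetical.

```python
import unicodedata


def prepare(node):
    """Return the octets to compare, or raise ValueError if disallowed.

    Assumption: NFC plus rejecting categories Cc/Cf approximates a
    preparation step; a real service would apply a full PRECIS profile.
    """
    normalized = unicodedata.normalize("NFC", node)
    for ch in normalized:
        if unicodedata.category(ch) in ("Cc", "Cf"):
            raise ValueError("disallowed code point U+%04X" % ord(ch))
    return normalized.encode("utf-8")


composed = "caf\u00e9"      # "é" as one code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Raw octets differ, prepared octets match.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False
print(prepare(composed) == prepare(decomposed))                # True

# A zero-width space (U+200B, category Cf) is rejected outright.
try:
    prepare("ne\u200bws")
except ValueError as err:
    print(err)  # disallowed code point U+200B
```

Without the preparation step, the two spellings of "café" would be
distinct nodes, and "ne<U+200B>ws" would be a distinct node that renders
identically to "news".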

> I think most services, if not all, already operate this way.

What *exactly* are services doing?

> Otherwise, I think we’ll have to think hard about the implications of 
> introducing a normalization/preparation method this far into deployment and 
> how to handle unnormalized input [1]. XEP-0030 is Final and used ~everywhere, 
> XEP-0060 is Draft and a key dependency to a few modern features (via PEP). 
> Having the ecosystem move from "no preparation" to "some preparation" feels 
> like it’s bound to introduce exactly the type of bugs you were talking about.

Correct handling of internationalized characters feels a lot safer to me
than incorrect handling.

> Add to that the trickiness if we want to use JIDs as node names, I’d argue 
> that a "don’t touch this" directive to the server makes sense. If a protocol 
> has specific requirements for node names specifically in PubSub, I think it 
> could still specify that.
> Does this make sense?

See above on making sense.

