[Standards] JID Escaping

Mridul Muralidharan mridul at sun.com
Tue Jul 24 21:52:04 UTC 2007

Hi Robin,

You should consider who will actually need to encode or decode nodes.

1) Gateways.
For the purpose of mapping foreign uids to xmpp nodes.
In this case, ONLY the gateway will need to encode/decode.
All other entities will treat it as a JID, without any transformation 
of the node/domain/resource.
The gateway encodes a foreign id into a node so that it becomes a valid xmpp JID.
It decodes the node of an xmpp 'to' address to obtain the destination in 
the foreign network.
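A minimal Python sketch of the gateway case, assuming the XEP-0106 escape table. Two simplifications are assumptions of this sketch, not the xep's text: every backslash is escaped as \5c (matching the table Robin gives further down; the xep's exact rule for backslashes may be narrower), and a foreign id with a leading or trailing space is simply rejected. The helper names and the gateway domain `msn.example.org` are hypothetical.

```python
# Escape table from XEP-0106 (character -> escape sequence).
# Assumption: every backslash is escaped, as in the examples in this thread.
ESCAPES = {
    " ": r"\20", '"': r"\22", "&": r"\26", "'": r"\27", "/": r"\2f",
    ":": r"\3a", "<": r"\3c", ">": r"\3e", "@": r"\40", "\\": r"\5c",
}
UNESCAPES = {seq: ch for ch, seq in ESCAPES.items()}


def escape_node(foreign_id: str) -> str:
    """Map a foreign user id onto a node that is legal in a JID."""
    if foreign_id.startswith(" ") or foreign_id.endswith(" "):
        # \20 must not be the first or last sequence of an escaped node.
        raise ValueError("leading or trailing space cannot be escaped")
    return "".join(ESCAPES.get(ch, ch) for ch in foreign_id)


def unescape_node(node: str) -> str:
    """Recover the foreign id from an escaped node."""
    out, i = [], 0
    while i < len(node):
        seq = node[i:i + 3]
        if seq in UNESCAPES:
            out.append(UNESCAPES[seq])
            i += 3
        else:
            out.append(node[i])
            i += 1
    return "".join(out)


def to_jid(foreign_id: str, gateway_domain: str) -> str:
    """Gateway side: foreign id -> JID that the rest of the network routes on."""
    return f"{escape_node(foreign_id)}@{gateway_domain}"


def from_jid(to: str) -> str:
    """Gateway side: node of an incoming 'to' address -> foreign destination."""
    bare = to.split("/", 1)[0]  # drop the resource, if any
    node = bare.split("@", 1)[0] if "@" in bare else ""
    return unescape_node(node)


if __name__ == "__main__":
    jid = to_jid("bob@hotmail.com", "msn.example.org")
    print(jid)            # bob\40hotmail.com@msn.example.org
    print(from_jid(jid))  # bob@hotmail.com
```

Every other entity just routes `bob\40hotmail.com@msn.example.org` as an ordinary JID; only `to_jid`/`from_jid` on the gateway ever look inside the node.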

2) Clients and servers.
When the node is the same as the uid, and the uid of the user contains 
prohibited characters. In this case, when the client authenticates to the server, 
it encodes the uid to obtain the node, and then forms the JID to be 
used (only the client encodes it, as part of auth; once authentication 
is over, the encoding of the JID is opaque to the client).

At the server, it would decode the node to obtain the actual uid and use 
that (backend store updates, etc.). Note that for the purpose of routing, the 
server would treat the jid as-is, and only decode it to identify the backend 
entry (database, ldap, file, etc.) which it has to persist data to or query.
(This is one way in which you could model things; servers can of course 
come up with alternative ways to do this.)
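The client/server model above can be sketched in Python as follows. It assumes the same XEP-0106 escape table (with every backslash escaped) and a backend keyed by the user's real uid, here a mail address; `client_jid`, `Server` and `bind` are hypothetical names. The client escapes once when forming its JID; the server unescapes only to find the backend entry, while its sessions stay keyed by the JID exactly as received.

```python
# XEP-0106 escape table (assumption: every backslash is escaped).
ESCAPES = {
    " ": r"\20", '"': r"\22", "&": r"\26", "'": r"\27", "/": r"\2f",
    ":": r"\3a", "<": r"\3c", ">": r"\3e", "@": r"\40", "\\": r"\5c",
}
UNESCAPES = {seq: ch for ch, seq in ESCAPES.items()}


def escape_node(uid: str) -> str:
    return "".join(ESCAPES.get(ch, ch) for ch in uid)


def unescape_node(node: str) -> str:
    out, i = [], 0
    while i < len(node):
        seq = node[i:i + 3]
        if seq in UNESCAPES:
            out.append(UNESCAPES[seq])
            i += 3
        else:
            out.append(node[i])
            i += 1
    return "".join(out)


def client_jid(uid: str, server: str) -> str:
    """Client: escape the uid once, when forming the JID used to log in."""
    return f"{escape_node(uid)}@{server}"


class Server:
    def __init__(self, backend: dict):
        self.backend = backend  # keyed by the real uid, e.g. a mail address
        self.sessions = {}      # keyed by the JID exactly as received

    def bind(self, jid: str) -> dict:
        node = jid.split("/", 1)[0].split("@", 1)[0]
        uid = unescape_node(node)  # decode only to find the backend entry
        if uid not in self.backend:
            raise KeyError(f"no backend entry for {uid!r}")
        self.sessions[jid] = uid   # routing keeps using the JID as-is
        return self.backend[uid]


if __name__ == "__main__":
    server = Server({"alice@example.com": {"roster": ["bob@example.net"]}})
    jid = client_jid("alice@example.com", "im.example.com")
    print(jid)               # alice\40example.com@im.example.com
    print(server.bind(jid))  # {'roster': ['bob@example.net']}
```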

Other than these two, I can't really envision any other important 
use case which requires any entity (client, gateway, component or server) 
to encode or decode; the above are directly related to, and required for, 
xmpp routing.

Hopefully, from this point of view, the xep might make more sense.

More inline ...

Robin Redeker wrote:
> On Tue, Jul 24, 2007 at 10:10:45PM +0530, Mridul Muralidharan wrote:
>> Robin Redeker wrote:
>>> On Sat, Jul 21, 2007 at 08:17:19PM -0600, Peter Saint-Andre wrote:
>>>> Robin Redeker wrote:
>>>>> On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
>>>>>>> I think the whole XEP should be renamed to something like:
>>>>>>>  XEP-0106 - JID Mapping for Gateways
>>>>>> This would be better. But it breaks the generic usage of JIDs for both users
>>>>>> and gateways. It will create a lot of trouble.
>>>>>> and gateways. It will create a lot of trouble.
>>>>> The XEP seems to already create a lot of trouble. Just remind me to
>>>>> register '\20stpeter at jabber.org' when every client unescapes JIDs ;-)
>>>> No problem. The spec says:
>>>> "The character sequence \20 MUST NOT be the first or last character of
>>>> an escaped node identifier."
>>>> But of course you can violate the spec if desired. ;-)
>>> I don't violate the RFC here. I violate some optional extension.
>> You can't use the characters mentioned in the xep in the node without 
>> escaping them, so doing otherwise is a violation of the rfc.
>> Of course you can come up with a custom encoding (as we used to have), 
>> but this does not allow arbitrary client/server combinations to 
>> interoperate.
> Of course I can? With an old client? And it's even allowed and sane and
> valid? Or am I missing something here?

You cannot use prohibited characters, whether with old clients or 
bad clients.
The first conforming xmpp recipient (typically the server) would terminate 
your stream with a stream error.
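As a rough illustration of the check a conforming recipient performs, the Python function below tests a node against the ASCII characters that Nodeprep (RFC 3920, Appendix A) prohibits. This is only a partial sketch: full Nodeprep also applies stringprep mapping, normalization and further prohibition tables, and `node_is_acceptable` is a hypothetical name.

```python
# ASCII characters that Nodeprep (RFC 3920, Appendix A) prohibits in a node:
# space plus " & ' / : < > @.  Only the ASCII subset is checked here; full
# Nodeprep also maps, normalizes and applies further stringprep tables.
PROHIBITED_ASCII = frozenset(" \"&'/:<>@")


def node_is_acceptable(node: str) -> bool:
    """Rough check of a node that is present in a JID (hypothetical helper)."""
    if not node or len(node.encode("utf-8")) > 1023:
        return False  # a present node must be 1..1023 bytes
    return not any(ch in PROHIBITED_ASCII for ch in node)


if __name__ == "__main__":
    for node in ["alice@example.com", r"alice\40example.com", "Tank 1"]:
        print(node, node_is_acceptable(node))
```

The backslash is not prohibited, which is why an escaped node such as `alice\40example.com` passes while the raw mail address does not.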

>>> The XEP-0106 has to exclude the JIDs which start or end with '\20' in the
>>> nodepart from the escaping AND unescaping transformations.
>> This is already present.
> Great, so JIDs with '\20' at the beginning and end have been deprecated?
> Shouldn't the RFC be changed then?

No, this is not related to the rfc.
The rfc specifies a list of prohibited characters, and they MUST NOT be 
in the node.
The xep allows a way of encoding these characters into the JID because 
of requirements like those I mentioned above.

Note that the xep just standardizes one way to encode; you can 
always come up with any other form of encoding which makes the resulting 
jid conformant with the rfc (urlencoding, etc.).
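For comparison, a Python sketch of the urlencoding alternative, using the standard library's `urllib.parse.quote`/`unquote`; the wrapper names are hypothetical. With `safe=""` every character other than ASCII letters, digits and `_.-~` is percent-encoded, so none of the Nodeprep-prohibited ASCII characters survive into the node. Note that Nodeprep case-folds the node, so neither this mapping nor XEP-0106 preserves letter case from the original id, and both ends must agree on the scheme, which is the interoperability point the xep addresses.

```python
from urllib.parse import quote, unquote


def url_escape_node(uid: str) -> str:
    """Percent-encode everything except ASCII letters, digits and _.-~"""
    return quote(uid, safe="")


def url_unescape_node(node: str) -> str:
    """Reverse the percent-encoding (accepts upper- or lower-case hex)."""
    return unquote(node)


if __name__ == "__main__":
    node = url_escape_node("alice@example.com")
    print(node)                     # alice%40example.com
    print(url_unescape_node(node))  # alice@example.com
```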

>>> At the moment the paragraph says that it MUST NOT be first or last
>>> in the node part, but it doesn't say WHAT to do when this perfectly
>>> fine JID arrives from the line. Should the JID not be unescaped at all?
>>> Should only the parts after and before '\20' be unescaped?
>>> Should the client close the connection?
>> It depends on who is doing what.
>> If the recipient is expected to 'parse' the node, then it would return 
>> an error; otherwise it would pass it on (directed packets through the 
>> server, for example).
> To whom will it return an error? Will it throw a pop-up at the user
> "Someone with an invalid JID sent you something!"?
> Or back to the sender? Why should it do that for a perfectly fine JID?

Most entities in the xmpp network would route/process stanzas 
irrespective of whether the node is encoded or not; they are 
agnostic to it.
Only those entities which need to be aware of it in order to decode it 
(the relevant gateways and servers) would typically check the semantics 
of the decoded node, and potentially return an error. I would assume it 
would be a stanza error and not a stream error, though the authors can 
clarify this (I don't have access to xmpp.org right now).
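A Python sketch of the behaviour described here, with hypothetical names: an entity that does not decode nodes routes the stanza untouched, while one that does (a gateway, or a server mapping nodes to backend entries) checks the escaped node and answers a bad one with a stanza error, leaving the stream open. Which error condition to use is an assumption of this sketch; `jid-malformed` is one of the stanza error conditions defined in RFC 3920.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    action: str                      # "route" or "error"
    condition: Optional[str] = None  # stanza error condition, if any


def node_of(jid: str) -> str:
    bare = jid.split("/", 1)[0]
    return bare.split("@", 1)[0] if "@" in bare else ""


def handle_stanza(to: str, decodes_nodes: bool) -> Outcome:
    """Decide what to do with a stanza addressed to `to` (hypothetical)."""
    if not decodes_nodes:
        # Most entities: the JID is an opaque routing address.
        return Outcome("route")
    node = node_of(to)
    if node.startswith(r"\20") or node.endswith(r"\20"):
        # Not a valid escaped node: reject this stanza, keep the stream open.
        return Outcome("error", "jid-malformed")
    # Valid: decode and deliver (decoding omitted here).
    return Outcome("route")
```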

>>> Do I miss something in the XEP? (If I do so please ignore the rest of
>>> the mail.)
>>> Please also note the nice, but maybe not so important collision that
>>> here happens when the client just doesn't unescape:
>>>   unescape ("\5c20foobar\5c20") => "\20foobar\20"
>>>   unescape ("\20foobar\20")     => "\20foobar\20"
>>> This is of course not really an important JID, and who cares about a few
>>> optical collisions in clients which confuse the user. And these only happen
>>> once someone else decides to put '\20' at the beginning or end
>>> of his name, and why would someone do that?
>>> Hey, we could add security notes to all clients which tell the user:
>>>   "Never attach '\20' to the beginning or end of your name, it is unsafe!"
>>> The U.S. Army will love this! (One might think of a case where they actually
>>> name their units by enumerating them with a \ at the end:
>>>   Unescaped:             Escaped:             Unescaped:
>>>   "Tank\1"               "Tank\5c1"           "Tank\1"
>>>   "Tank\20"              "Tank\20"            "Tank\20"
>>>   "Tank\22"              "Tank\5c22"          "Tank\22"
>>>                          "Tank\5c20"          "Tank\20" ... oooops
>>> Ah... never... why would they do that... :-)
>> These sorts of problems are common to any form of encoding, for example
>> urlencoding.
>> It is obviously expected that the client/gateway would do the right thing.
> Yeah, usually everyone unescapes and compares the unescaped strings;
> then no collisions happen (and urlencoding does not have this problem
> because it doesn't compare escaped URLs, afaik).

I am not sure why anyone would unescape.
Most entities should treat the JID as a routing construct and not try to 
guess information out of it.
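The collision Robin describes, and why comparing JIDs as-is avoids it, can be reproduced with the Python sketch below. It assumes an unescape function that leaves a node beginning or ending with \20 unchanged (one possible reading of the rule discussed above); `unescape_node` is a hypothetical name.

```python
UNESCAPES = {
    r"\20": " ", r"\22": '"', r"\26": "&", r"\27": "'", r"\2f": "/",
    r"\3a": ":", r"\3c": "<", r"\3e": ">", r"\40": "@", r"\5c": "\\",
}


def unescape_node(node: str) -> str:
    # Assumed rule: a node starting or ending with \20 is not a valid escaped
    # node, so it is returned (and displayed) unchanged.
    if node.startswith(r"\20") or node.endswith(r"\20"):
        return node
    out, i = [], 0
    while i < len(node):
        seq = node[i:i + 3]
        if seq in UNESCAPES:
            out.append(UNESCAPES[seq])
            i += 3
        else:
            out.append(node[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    pairs = [(r"\5c20foobar\5c20", r"\20foobar\20"), (r"Tank\5c20", r"Tank\20")]
    for a, b in pairs:
        # Distinct nodes on the wire, identical once unescaped for display.
        print(a != b, unescape_node(a) == unescape_node(b))  # True True
```

The raw nodes, which are what the server routes on, stay distinct; the clash only appears if an entity unescapes and then compares or displays the result.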

> The problem here is that the server treats "Tank\20" and "Tank\5c20" as
> different nodes, and that's of course completely the right thing to do,
> because they are different.
> If you meant before that "Tank\20" should be rejected by the client,
> then I'm still wondering why something should reject a perfectly fine
> JID?
> Should I send a disco to the sending client and look whether it knows
> JID escaping and _after_ I know that perform the JID unescaping?
> And if that client sends me a JID with '\20' at the beginning or end,
> should I send him an error back or blame my user for having bad people
> speaking to him?
> Does that mean that people with a JID like "Tank\20 at nasa.gov" won't be
> able to send messages from their JID escaping enabled clients?
> Do we want to deprecate JIDs from the RFC which start and end with '\20'?

Please see above ... hopefully I have addressed this.

>>> I propose to rename the XEP to make clear that this escaping/unescaping should
>>> only happen in very rare cases (only at gateways or heavily specialized client
>>> frontends), and that the terms 'escaping' and 'unescaping' be replaced by
>>> 'mapping' and 'unmapping', because that's what is happening here.
>> Not really very specialized clients; even common clients will need it.
>> For example, if the uids used in the server are mail ids.
>> Then the client will need to escape the node for the bind.
> Then we have to solve these issues IMO.

I think I have addressed this above.


> Robin
