[Standards] JID Escaping

Mridul Mridul.Muralidharan at Sun.COM
Wed Jul 25 11:43:54 UTC 2007

Robin Redeker wrote:
> On Wed, Jul 25, 2007 at 03:22:04AM +0530, Mridul Muralidharan wrote:
>> Hi Robin,
>> You should analyze who will actually need to encode or decode nodes.
>> 1) Gateways.
> I know that JID escaping is for gateways, and it makes sense to define a
> mapping. But not for ordinary users who say 'I want a JID with an @ in
> my name!'.

Unfortunately, there are a lot of deployments where '@' appears in
the uid of the user. That was the initial motivation for adding
support for this XEP to our API, client and server.


>> This is for the purpose of mapping foreign uids to XMPP nodes.
>> In this case, ONLY the gateway will need to encode/decode.
>> All other entities will treat it as a JID, without any transformation
>> of node/domain/resource.
>> The gateway encodes a foreign id into a node so that it becomes a valid
>> XMPP JID. It decodes the node from an XMPP 'to' address to obtain the
>> destination in the foreign network.
> Makes complete sense, and seems to me to be the only sane application.
>> 2) Clients and servers.
>> When the node is the same as the uid, and the uid of the user contains
>> prohibited characters: in this case, when the client authenticates to
>> the server, it encodes the uid to obtain the node and then forms the
>> JID to be used. (Only the client encodes it, as part of authentication;
>> once authentication is over, the encoded JID is opaque to the client.)
>> At the server, it decodes the node to obtain the actual uid and uses
>> that (backend store update, etc.). Note that for the purpose of routing,
>> the server treats the JID as-is, and only decodes it to identify the
>> backend entry (database, LDAP, file, etc.) to which it has to persist
>> data or from which it queries data.
>> (This is one way in which you could model things; servers can of course
>> come up with alternative ways to do this.)
>> Other than these two, I can't really envision any other important
>> use case which requires any entity (client, gateway, component or server)
>> to encode or decode; the above are directly related to and required for
>> XMPP routing.
>> Hopefully, from this point of view, the XEP makes more sense.
> I completely agree.
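The two use cases above can be sketched in Python. This is a minimal, hypothetical sketch: it assumes the XEP-0106 table (space, ", &, ', /, :, <, >, @ and \ mapped to \20, \22, \26, \27, \2f, \3a, \3c, \3e, \40 and \5c), escapes a backslash only when it would otherwise start one of those sequences, and skips Nodeprep normalization (case folding and so on). `escape_node` is what a gateway, or a client before login, would apply to a uid; `backend_entry` shows a server decoding the node only to find the user's record, while `routing_key` leaves the JID untouched. `BACKEND` stands in for a database or LDAP directory, and all names are illustrative.

```python
# Sketch only: function and variable names are hypothetical.
# XEP-0106 escape sequences for the characters Nodeprep disallows,
# plus the backslash itself.
ESCAPES = {
    " ": "\\20", '"': "\\22", "&": "\\26", "'": "\\27", "/": "\\2f",
    ":": "\\3a", "<": "\\3c", ">": "\\3e", "@": "\\40", "\\": "\\5c",
}
UNESCAPES = {seq: ch for ch, seq in ESCAPES.items()}

# Hypothetical backend store keyed by the real (foreign) user id.
BACKEND = {"jdoe@corp.example": {"roster": ["alice@xmpp.example"]}}


def escape_node(uid: str) -> str:
    """Encode a uid into a node (gateway, or client before login)."""
    if uid.startswith(" ") or uid.endswith(" "):
        # Would produce a node starting or ending with \20, which is forbidden.
        raise ValueError("leading or trailing space cannot be escaped")
    out = []
    for i, ch in enumerate(uid):
        if ch == "\\":
            # Assumption: escape a backslash only if it would otherwise
            # begin a valid escape sequence, keeping decoding unambiguous.
            out.append("\\5c" if uid[i:i + 3] in UNESCAPES else ch)
        else:
            out.append(ESCAPES.get(ch, ch))
    return "".join(out)


def unescape_node(node: str) -> str:
    """Decode a node back into the original uid."""
    out, i = [], 0
    while i < len(node):
        seq = node[i:i + 3]
        if seq in UNESCAPES:
            out.append(UNESCAPES[seq])
            i += 3
        else:
            out.append(node[i])
            i += 1
    return "".join(out)


def routing_key(jid: str) -> str:
    """Routing treats the JID as opaque: drop the resource, never unescape."""
    return jid.partition("/")[0]


def backend_entry(jid: str):
    """Server side: decode the node only to locate the backend record."""
    node = routing_key(jid).partition("@")[0]  # assumes the JID has a node
    return BACKEND.get(unescape_node(node))
```

For instance, `escape_node("jdoe@corp.example")` yields the node `jdoe\40corp.example`; a client would log in as `jdoe\40corp.example@xmpp.example` (a made-up domain), and `backend_entry` on that JID finds the record stored under `jdoe@corp.example`.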
>> Robin Redeker wrote:
>>> On Tue, Jul 24, 2007 at 10:10:45PM +0530, Mridul Muralidharan wrote:
>>>> Robin Redeker wrote:
>>>>> On Sat, Jul 21, 2007 at 08:17:19PM -0600, Peter Saint-Andre wrote:
>>>>>> Robin Redeker wrote:
>>>>>>> On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
>>>>>>>>> I think the whole XEP should be renamed to something like:
>>>>>>>>> XEP-0106 - JID Mapping for Gateways
>>>>>>>> This would be better. But it breaks the generic usage of JIDs for
>>>>>>>> both users and gateways. It will create a lot of trouble.
>>>>>>> The XEP seems to already create a lot of trouble. Just remind me to
>>>>>>> register '\20stpeter@jabber.org' when every client unescapes JIDs ;-)
>>>>>> No problem. The spec says:
>>>>>> "The character sequence \20 MUST NOT be the first or last character of
>>>>>> an escaped node identifier."
>>>>>> But of course you can violate the spec if desired. ;-)
>>>>> I don't violate the RFC here. I violate some optional extension.
>>>> You can't specify the characters mentioned in the XEP in the node without
>>>> escaping them, so trying to do so otherwise is a violation of the RFC.
>>>> Of course you can come up with a custom encoding (as we used to have),
>>>> but this does not allow arbitrary client/server combinations to
>>>> interoperate.
>>> Of course I can? With an old client? And it's even allowed and sane and
>>> valid? Or am I missing something here?
>> You cannot use prohibited characters, whether with old clients or
>> bad clients.
>> The first conforming XMPP recipient (typically the server) would terminate
>> your stream with a stream error.
> But am I right that '\20foobar@jabber.org' is a completely valid JID,
> and that every server should accept it?
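To make the distinction concrete: in the RFC 3920 Nodeprep profile, the ASCII characters prohibited in a node are the space and the ASCII control characters (stringprep tables C.1.1 and C.2.1) plus " & ' / : < > @. The backslash is not among them, which is why a node such as `\20stpeter` passes. The Python sketch below checks only this ASCII subset; a real implementation must run the full Nodeprep profile (non-ASCII tables, NFKC, case mapping). The function name is hypothetical.

```python
# Sketch only: has_prohibited_ascii is a hypothetical helper.
# ASCII subset of what Nodeprep (RFC 3920, Appendix A) prohibits in a node:
# the space (stringprep C.1.1), ASCII controls (C.2.1) and eight extra chars.
_EXTRA_PROHIBITED = set('"&\'/:<>@')


def has_prohibited_ascii(node: str) -> bool:
    """True if the node contains a prohibited ASCII character.

    Non-ASCII prohibitions, NFKC and case mapping are deliberately ignored.
    """
    for ch in node:
        code = ord(ch)
        if ch == " " or code < 0x20 or code == 0x7F or ch in _EXTRA_PROHIBITED:
            return True
    return False
```

Here `has_prohibited_ascii("aunt'tillie")` is true, while the node `\20stpeter` (the Python string `"\\20stpeter"`) contains nothing this check rejects.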
>>>>> XEP-0106 has to exclude JIDs which start or end with '\20' in the
>>>>> node part from the escaping AND unescaping transformations.
>>>> This is already present.
>>> Great, so JIDs with '\20' at the beginning or end have been deprecated?
>>> Shouldn't the RFC be changed then?
>> No, this is not related to the RFC.
>> The RFC specifies a list of prohibited characters, and they MUST NOT
>> appear in the node.
>> The XEP provides a way of encoding these characters into the JID because
>> of requirements like the ones I mentioned above.
> Those two special cases you described are not really pointed
> out in the XEP. The XEP makes it sound as if escaping is for the
> usual IM user who wants a ' in the node part of his JID.
>    XEP-0106: JID Escaping
>    This document specifies a mechanism that enables the display of Jabber
>    Identifiers (JIDs) with characters disallowed by the Nodeprep profile of
>    stringprep.
> The XEP should be renamed so that it does not sound like we want to
> enable our regular IM users to expand the allowed character set of
> their names in the node part of the JID.
>>>>> At the moment the paragraph says that it MUST NOT be first or last
>>>>> in the node part, but it doesn't say WHAT to do when this perfectly
>>>>> fine JID arrives over the wire. Should the JID not be unescaped at all?
>>>>> Should only the parts after and before '\20' be unescaped?
>>>>> Should the client close the connection?
>>>> It depends on who is doing what.
>>>> If the recipient is expected to 'parse' the node, then it would return 
>>>> an error, else it would pass it on (directed packets through server for 
>>>> example).
>>> To whom will it return an error? Will it throw a pop-up at the user
>>> "Someone with an invalid JID sent you something!"?
>>> Or back to the sender? Why should it do that for a perfectly fine JID?
>> Most entities in the XMPP network route/process stanzas irrespective
>> of whether the node is encoded or not, and they are agnostic to it.
>> Only those entities which need to be aware of it in order to decode it
>> (the relevant gateways and servers) would typically check the semantics
>> of the decoded node, and potentially return an error. I would assume it
>> would be a stanza error and not a stream error, though the authors can
>> clarify it (I don't have access to xmpp.org right now).
> Yes, that sounds sane to me in the two usecases you specified in the
> beginning of the mail.
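The behaviour described above can be sketched in Python: an entity that only routes passes the node through untouched, while an entity that must decode it checks the XEP-0106 rule about \20 at either end and otherwise unescapes. Returning the `jid-malformed` stanza error condition is an assumption (the thread leaves the exact error to the XEP authors), and `handle_node` and `Outcome` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Sketch only: names are hypothetical.
# XEP-0106 escape sequences and the characters they stand for.
_UNESCAPES = {"\\20": " ", "\\22": '"', "\\26": "&", "\\27": "'",
              "\\2f": "/", "\\3a": ":", "\\3c": "<", "\\3e": ">",
              "\\40": "@", "\\5c": "\\"}
_SEQ = re.compile(r"\\(?:20|22|26|27|2f|3a|3c|3e|40|5c)")


@dataclass
class Outcome:
    action: str                      # "route", "deliver" or "error"
    uid: Optional[str] = None        # decoded uid when delivering
    condition: Optional[str] = None  # stanza error condition when rejecting


def handle_node(node: str, must_decode: bool) -> Outcome:
    """Decide what an entity does with the node of an incoming 'to' JID."""
    if not must_decode:
        # Most entities are agnostic to escaping and simply route the stanza.
        return Outcome("route")
    if node.startswith("\\20") or node.endswith("\\20"):
        # Violates the XEP's \20 rule; answer with a stanza error (assumed
        # condition), leaving the stream itself intact.
        return Outcome("error", condition="jid-malformed")
    return Outcome("deliver", uid=_SEQ.sub(lambda m: _UNESCAPES[m.group(0)], node))
```

With this, a plain router given `\20stpeter` just routes it, whereas a gateway that must decode it answers with an error; `jdoe\40corp.example` decodes to `jdoe@corp.example`.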
> [.snip.]
>>>> These sorts of problems are common to any form of encoding, for
>>>> example URL encoding.
>>>> It is obviously expected that the client/gateway would do the right thing.
>>> Yeah, usually everyone unescapes and compares the unescaped strings,
>>> and then no collisions happen (and URL encoding does not have this
>>> problem because it doesn't compare escaped URLs, AFAIK).
>> I am not sure why anyone would unescape.
>> Most entities should treat the JID as a routing construct and not try to 
>> guess information out of it.
> The XEP makes it sound like every client should implement the XEP,
> because it's so 'generally useful'. E.g. if Psi or Gaim.. err, Pidgin
> or Tkabber now decide to support the XEP in the context of ordinary
> IM usage, they will want to unescape it and display it to the user.
> 'JID Escaping' sounds like it's generally useful. Web browsers implement
> URL escaping, so why wouldn't your XMPP client want to implement
> JID escaping? If the XEP were named 'JID Mapping for Special Occasions',
> developers would never conclude that they need to implement it.
> Maybe I'm alone in this confusion...
>    1. Introduction
>    ...
>    The escaped JID is unescaped only for presentation to a human user
>    (typically by an XMPP client) or ...
> I agree that the XEP doesn't say that everyone should implement it,
> but it also doesn't say that only clients that HAVE TO KNOW about
> escaping (e.g. when they want to map uids like you said or want to be
> some special 'email via XMPP' client) should implement it.
> But for ordinary IM usage clients should not use it... err.. MUST NOT!
> By ordinary IM usage I mean something like this:
>    Aunt Tillie fires up Pidgin, enters her name "Aunt'Tille_is_only\20"
>    and server "jabber.org", then clicks register and wants to chat
>    with 'stpeter@jabber.org' about his lost socks in the washing
>    machine.
> In those cases, escaping MUST NOT be performed, and our dear
> aunt has to remove the ' from her name.
> [.snip.]
>> I think I address this above.
> Yep, you did I think.
> Robin
