[Standards] JID Escaping

Robin Redeker elmex at x-paste.de
Wed Jul 25 09:31:12 UTC 2007

On Wed, Jul 25, 2007 at 03:22:04AM +0530, Mridul Muralidharan wrote:
> Hi Robin,
> You should analyze as to who will actually need to encode or decode nodes.

> 1) Gateways.

I know that JID escaping is for gateways, and it makes sense to define a
mapping. But not for ordinary 'I want a JID with an @ in my

> For the purpose of mapping foreign uids to xmpp nodes.
> In this case, ONLY the gateway will need to encode/decode.
> All other entities will treat it as a JID - without any transformation 
> to node/domain/resource.
> It will encode a foreign id to node so that it becomes a valid xmpp JID.
> It will decode node from an xmpp 'to' for obtaining the destination in 
> the foreign network.

Makes complete sense, and it seems to me to be the only sane application.
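
For concreteness, here is a minimal Python sketch of what the gateway-side mapping could look like. It assumes the ten-character escape table from XEP-0106 (space " & ' / : < > @ and the backslash itself), and it assumes a backslash is escaped only when it would otherwise start an escape sequence. The function names are hypothetical; no other entity on the path would call them.

```python
import re

# Assumed XEP-0106 table: each character maps to "\" plus its two-digit
# lowercase hex code, e.g. "@" -> "\40", "'" -> "\27", "\" -> "\5c".
ESCAPED_CHARS = " \"&'/:<>@\\"
ESCAPES = {ch: "\\%02x" % ord(ch) for ch in ESCAPED_CHARS}
ESCAPE_SEQ = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def escape_node(foreign_id: str) -> str:
    """Gateway side (hypothetical helper): foreign user id -> JID node."""
    if foreign_id.startswith(" ") or foreign_id.endswith(" "):
        # An escaped node must not begin or end with \20.
        raise ValueError("foreign id has a leading or trailing space")
    out = []
    for i, ch in enumerate(foreign_id):
        if ch == "\\" and not ESCAPE_SEQ.match(foreign_id, i):
            # Assumption: a backslash that does not start an escape
            # sequence is left alone; unescaping still restores it exactly.
            out.append(ch)
        else:
            out.append(ESCAPES.get(ch, ch))
    return "".join(out)


def unescape_node(node: str) -> str:
    """Gateway side (hypothetical helper): node of a 'to' JID -> foreign id."""
    # Each \hh sequence stands for the character with hex code hh.
    return ESCAPE_SEQ.sub(lambda m: chr(int(m.group(1), 16)), node)


print(escape_node("d'artagnan@aol"))  # prints d\27artagnan\40aol
```

Everything outside the gateway keeps seeing the escaped node and routes it unchanged.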

> 2) Clients and servers.
> When the node is the same as the uid, and the uid of the user contains 
> prohibited characters. In this case, when client authorizes to server, 
> it would encode the uid to obtain the node - and then form the JID to be 
> used (Only the client encodes it as part of auth, and once authorization 
> is over, encoding of the JID is opaque to the client).
> At the server, it would decode the node to obtain the actual uid and use 
> that (backend store update, etc) - note that for purpose of routing, the 
> server would treat the jid as-is, only decode it to identify the backend 
> entry (database, ldap, file, etc) to which it has to persist/query data.
> (This is one way in which you would model things - servers can of course 
> come up with other alternate ways to do this).
> Other than for these two, I can't really envision any other important 
> use case which requires any entity (client, gateway, component or server) 
> to encode or decode - the above are directly related to and required for 
> xmpp routing.
> Hopefully, from this point of view, the xep might make more sense.

I completely agree.
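
The server-side model from case 2 could be sketched roughly like this in Python. The class, its methods, and the plain dict standing in for the database/LDAP backend are all hypothetical; the only point is that the node is decoded for the backend lookup and never for routing.

```python
from __future__ import annotations

import re

ESCAPE_SEQ = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def decode_node(node: str) -> str:
    # Each XEP-0106 sequence \hh stands for the character with hex code hh.
    return ESCAPE_SEQ.sub(lambda m: chr(int(m.group(1), 16)), node)


class Server:
    """Hypothetical server model: routing is opaque, storage is keyed by uid."""

    def __init__(self, backend: dict[str, dict]):
        self.backend = backend                 # real uid -> user record
        self.sessions: dict[str, str] = {}     # full JID -> connection id

    def bind(self, jid: str, conn: str) -> dict:
        node = jid.split("@", 1)[0]
        record = self.backend[decode_node(node)]  # decode only for the lookup
        self.sessions[jid] = conn                 # but remember the raw JID
        return record

    def route(self, to: str) -> str | None:
        return self.sessions.get(to)              # no decoding when routing


srv = Server({"john@example.com": {"name": "John"}})
srv.bind(r"john\40example.com@im.example.org/home", "conn-1")
print(srv.route(r"john\40example.com@im.example.org/home"))  # prints conn-1
```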

> Robin Redeker wrote:
> >On Tue, Jul 24, 2007 at 10:10:45PM +0530, Mridul Muralidharan wrote:
> >>Robin Redeker wrote:
> >>>On Sat, Jul 21, 2007 at 08:17:19PM -0600, Peter Saint-Andre wrote:
> >>>>Robin Redeker wrote:
> >>>>>On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
> >>>>>>>I think the whole XEP should be renamed to something like:
> >>>>>>>
> >>>>>>> XEP-0106 - JID Mapping for Gateways
> >>>>>>This would be better. But it breaks the generic usage of JIDs for 
> >>>>>>both users
> >>>>>>and gateways. It will create a lot of trouble.
> >>>>>>
> >>>>>The XEP seems to already create a lot of trouble. Just remind me to
> >>>>>register '\20stpeter at jabber.org' when every client unescapes JIDs ;-)
> >>>>No problem. The spec says:
> >>>>
> >>>>"The character sequence \20 MUST NOT be the first or last character of
> >>>>an escaped node identifier."
> >>>>
> >>>>But of course you can violate the spec if desired. ;-)
> >>>I don't violate the RFC here. I violate some optional extension.
> >>You can't specify the characters mentioned in the xep in the node without 
> >>escaping them - so trying to do so otherwise is a violation of the rfc.
> >>Of course you can come up with a custom encoding (as we used to have), 
> >>but this does not allow any arbitrary client/server combination to 
> >>interoperate.
> >
> >Of course I can? With an old client? And it's even allowed and sane and
> >valid? Or am I missing something here?
> You cannot use prohibited characters - whether they are old clients or 
> bad clients.
> The first conforming xmpp recipient (typically the server) would kick 
> your stream out with a stream error.

But am I right that '\20fooobar at jabber.org' is a completely valid JID,
and that every server should accept it?

> >
> >>>The XEP-0106 has to exclude the JIDs which start or end with '\20' in the
> >>>nodepart from the escaping AND unescaping transformations.
> >>This is already present.
> >
> >Great, JIDs with '\20' in the beginning and end have been deprecated then?
> >Shouldn't the RFC be changed then?
> No, this is not related to the rfc.
> The rfc specifies list of prohibited characters - and they MUST NOT be 
> in the node.
> The xep allows a way of encoding these characters into the JID because 
> of requirements like what I mentioned above.

The two special cases you described are not really pointed out in the
XEP. The XEP makes it sound as if the escaping is for the ordinary IM
user who wants a ' in the node part of his JID.

   XEP-0106: JID Escaping

   This document specifies a mechanism that enables the display of Jabber
   Identifiers (JIDs) with characters disallowed by the Nodeprep profile of

The XEP should be renamed so that it doesn't sound like we want to
enable our regular IM users to expand the allowed character set of
their names in the node part of the JID.

> >>>At the moment the paragraph says that it MUST NOT be first or last
> >>>in the node part, but it doesn't say WHAT to do when this perfectly
> >>>fine JID arrives from the line. Should the JID not be unescaped at all?
> >>>Should only the parts after and before '\20' be unescaped?
> >>>Should the client close the connection?
> >>It depends on who is doing what.
> >>If the recipient is expected to 'parse' the node, then it would return 
> >>an error, else it would pass it on (directed packets through server for 
> >>example).
> >
> >To whom will it return an error? Will it throw a pop-up at the user
> >"Someone with an invalid JID sent you something!"?
> >Or back to the sender? Why should it do that for a perfectly fine JID?
> Most entities in the xmpp network would route/process stanzas 
> irrespective of whether the node is encoded or not - and they are 
> agnostic to it.
> Only those entities which need to be aware of it to decode it (relevant 
> gateways and servers) would typically check the semantics of the 
> decoded node - and potentially return an error back. I would assume it 
> would be a stanza error and not a stream error - though the authors can 
> clarify it (I don't have access to xmpp.org right now).

Yes, that sounds sane to me for the two use cases you specified at the
beginning of the mail.
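
The behaviour described above, sketched in Python: only an entity that actually has to parse the node checks it, everyone else passes it on. The function is hypothetical, and replying with the RFC 3920 stanza error condition jid-malformed is an assumption here; the thread leaves the exact error to the XEP authors.

```python
import re

ESCAPE_SEQ = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def handle_inbound(node: str, must_decode: bool) -> tuple[str, str]:
    """Decide what to do with the node of an incoming stanza's 'to' JID."""
    if not must_decode:
        # Routers, ordinary servers and clients: the node is opaque.
        return ("route", node)
    if node.startswith(r"\20") or node.endswith(r"\20"):
        # Not a valid escaped node, so this gateway cannot map it.
        # Assumed reply: stanza error <jid-malformed/>, not a stream error.
        return ("error", "jid-malformed")
    # Valid escaped node: recover the foreign id for delivery.
    return ("deliver", ESCAPE_SEQ.sub(lambda m: chr(int(m.group(1), 16)), node))


print(handle_inbound(r"\20stpeter", must_decode=False))
print(handle_inbound(r"\20stpeter", must_decode=True))
```

So the same '\20stpeter' node is perfectly routable for a server, and only a gateway that wants to map it rejects it.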

> >>These sorts of problems are common to any form of encoding - for example 
> >>urlencoding.
> >>It is obviously expected that the client/gateway would do the right thing.
> >
> >Yeah, usually everyone unescapes and compares the unescaped strings,
> >so no collisions happen (and URL encoding does not have this problem
> >because it doesn't compare escaped URLs, AFAIK).
> I am not sure why anyone would unescape.
> Most entities should treat the JID as a routing construct and not try to 
> guess information out of it.

The XEP makes it sound like every client should implement the XEP,
because it's so 'generally useful'. E.g. if Psi or Gaim... err, Pidgin,
or Tkabber now decide to support the XEP in the context of ordinary
IM usage, they will want to unescape JIDs and display them to the user.
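
To make the display worry concrete, here is a Python sketch of a client that naively unescapes the node for display. The display helper is hypothetical and assumes the JID has a node part. Two different JIDs end up showing the same text, and the '\20stpeter' trick shows up as stpeter with a leading space.

```python
import re

ESCAPE_SEQ = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def display(jid: str) -> str:
    """Unescape only the node part of node@domain/resource for display."""
    node, sep, rest = jid.partition("@")
    return ESCAPE_SEQ.sub(lambda m: chr(int(m.group(1), 16)), node) + sep + rest


a = r"a\5cb@example.com"  # escaped node for the name a\b
b = r"a\b@example.com"    # plain node with a literal backslash, valid per Nodeprep
print(display(a) == display(b), a == b)  # prints: True False
print(repr(display(r"\20stpeter@jabber.org")))
```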

'JID Escaping' sounds like it's generally useful. Web browsers implement
URL escaping, so why shouldn't your XMPP client implement JID escaping?
If the XEP were named 'JID Mapping for Special Occasions', developers
would never conclude that they need to implement it.

Maybe I'm alone with this confusion...

   1. Introduction

   The escaped JID is unescaped only for presentation to a human user
   (typically by an XMPP client) or ...

I agree that the XEP doesn't say that everyone should implement it,
but it also doesn't say that only clients that HAVE TO KNOW about
escaping (e.g. when they want to map uids like you said or want to be
some special 'email via XMPP' client) should implement it.

But for ordinary IM usage, clients should not use it... err... MUST NOT!

By ordinary IM usage I mean something like this:

   Aunt Tillie fires up Pidgin, enters her name "Aunt'Tille_is_only\20"
   and server "jabber.org" and then clicks register and wants to write
   with 'stpeter at jabber.org' about his lost socks in the washing

In those cases, escaping MUST NOT be performed, and our dear
aunt has to remove the ' from her name.
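
For the Aunt Tillie case, the client would simply refuse the name instead of escaping it. A rough Python sketch, assuming only the ASCII characters that Nodeprep prohibits and that XEP-0106 covers (space " & ' / : < > @). A real client would run the full Nodeprep profile (case folding, normalization and the other prohibited ranges), and check_username is a hypothetical helper.

```python
# Assumed subset of Nodeprep: only the ASCII characters XEP-0106 escapes,
# minus the backslash, which Nodeprep allows.
PROHIBITED = set(" \"&'/:<>@")


def check_username(name: str) -> list[str]:
    """Return the offending characters; an empty list means usable as-is."""
    return sorted({ch for ch in name if ch in PROHIBITED})


bad = check_username("Aunt'Tille_is_only\\20")
if bad:
    # No escaping: ask the user to pick a different name instead.
    print("Please remove these characters from your name:", " ".join(bad))
```

Note that the literal "\20" at the end passes the check, since it is an ordinary, valid node.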

> I think I address this above.

Yep, you did I think.

