[Standards] JID Escaping

Robin Redeker elmex at x-paste.de
Fri Jul 20 10:42:01 UTC 2007


I've been reading through the JID escaping XEP once again.  The escaping
collisions are gone (now that the \5c exception is gone) compared to the last
time I read it.

I think it should be clarified that JID escaping should _only_ be used in
gateways which want to map outside strings to a JID node. I come to this
conclusion because it's not practical for regular IM clients to unescape a JID.

It should only be used in an e-mail client that sends and retrieves mail via
XMPP. All other clients will run into a name-spoofing problem (see below).

Section '2. Requirements' says:

   It MUST NOT be possible for clients to use this escaping mechanism to avoid
   the goal of stringprep; namely, that JIDs that look alike should have same
   character representation after being processed by stringprep.

The whole purpose of the mechanism described in XEP-0106 is to _avoid_ the goal
of stringprep (to be more exact, it avoids the goal of the nodeprep profile of

I understand that paragraph as follows: JID escaping should not be used to
allow users to enter node parts for the JID which contain invalid characters
(like &, @, ...). It's only meant as a mapping between e.g. e-mail addresses
and JIDs for gateways.

If that is not true, and it actually is meant for using invalid characters in
the node part of a JID, and clients unescape _every_ JID they see, we will run
into problems (see below).

For escaping, the following rule has to be followed:

   * Note: The character sequence \20 MUST NOT be the first or last character
      of an escaped node identifier.

This can only be seen as advice to the client authors not to allow spaces
at the beginning and ending of names.

But please observe that JIDs like this are still valid:

   \20elmex@jabber.org

If a client now unescapes that JID, it will collide (visually) with
'elmex@jabber.org'.  There is no rule that forbids unescaping such JIDs, and
such a rule would make no sense, as it would break the display of perfectly
fine JIDs.

This is the main reason why escaping and unescaping MUST only be done for gateway
applications, and there by the gateway and by the client that talks to it.
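
To make the visual collision concrete, here is a minimal Python sketch. It
assumes the escape sequences listed in XEP-0106 (\20 \22 \26 \27 \2f \3a \3c
\3e \40 \5c) in lowercase only; the name unescape_node is made up for this
illustration and is not taken from the XEP or from any library.

```python
import re

# Escape sequences as listed in XEP-0106 (lowercase hex assumed here).
_ESCAPED = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def unescape_node(node: str) -> str:
    """Replace each known escape sequence with its character, in one pass."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), node)


# Two different, valid JIDs ...
shown_a = unescape_node(r"\20elmex") + "@jabber.org"
shown_b = unescape_node("elmex") + "@jabber.org"

# ... that differ only by a leading space once unescaped for display.
print(repr(shown_a))  # ' elmex@jabber.org'
print(repr(shown_b))  # 'elmex@jabber.org'
```

In a roster or chat window the leading space is practically invisible, so the
two entries look the same to the user.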

Unescaping should NOT be applied to regular XMPP messaging in any form. When receiving
a message from a gateway, the client could of course check "is this an e-mail gateway?"
and then unescape the node part and display the source e-mail address
the message came from.

Regular clients, which don't implement any gateway special cases, should NOT handle
escaping at all.
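
A rough Python sketch of that client-side rule follows. Everything in it is an
assumption for illustration: the function name display_jid, the idea that the
client already holds a set of domains it knows to be e-mail gateways (how it
learns that is left open), and the domain smtp.example.com.

```python
import re

# Escape sequences as listed in XEP-0106 (lowercase hex assumed here).
_ESCAPED = re.compile(r"\\(20|22|26|27|2f|3a|3c|3e|40|5c)")


def display_jid(bare_jid, gateway_domains):
    """Return the string a client would show for a bare JID.

    gateway_domains is a hypothetical set of domains the client already
    knows to be e-mail gateways.
    """
    node, sep, domain = bare_jid.rpartition("@")
    if not sep or domain not in gateway_domains:
        # Regular JID: show it exactly as received, never unescape.
        return bare_jid
    # Gateway JID: the unescaped node part is the source e-mail address.
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), node)


gateways = {"smtp.example.com"}
print(display_jid(r"user\40example.org@smtp.example.com", gateways))
print(display_jid(r"\20elmex@jabber.org", gateways))
```

The first call shows the mapped e-mail address; the second leaves the JID
untouched, so it cannot be mistaken for 'elmex@jabber.org'.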

Please also note that the example '5.1 Jabber Identifiers' in the XEP
is misleading:

#  User Input                     Escaped JID                             Client Display
1  space cadet@example.com        space\20cadet@example.com               space cadet@example.com
2  call me "ishmael"@example.com  call\20me\20\22ishmael\22@example.com   call me "ishmael"@example.com

A user CAN'T input a broken JID; the client can't parse broken JIDs like that,
or should at least scream out loud if a user tries to enter such a JID.
For escaping, the user should get a form like this:

   Node:     space cadet
   Domain:   example.com

The example is misleading because client authors could think that they
should implement a heuristic to parse broken JIDs and escape them if they
are not valid JIDs.
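
The form-based input suggested above could be handled roughly as in the
following Python sketch. The function name build_jid is made up for this
illustration. The character table is the one listed in XEP-0106; backslash
handling is left out to keep the sketch short, so this is not a complete
implementation of the XEP.

```python
# Disallowed characters and their escape sequences, as listed in XEP-0106.
# Backslash handling is intentionally omitted from this sketch.
ESCAPES = {
    " ": r"\20", '"': r"\22", "&": r"\26", "'": r"\27", "/": r"\2f",
    ":": r"\3a", "<": r"\3c", ">": r"\3e", "@": r"\40",
}


def build_jid(node_input: str, domain: str) -> str:
    """Build an escaped bare JID from separate 'Node' and 'Domain' fields."""
    # The escaped node must not start or end with \20, so reject
    # leading and trailing spaces instead of escaping them.
    if node_input != node_input.strip(" "):
        raise ValueError("node must not start or end with a space")
    escaped = "".join(ESCAPES.get(ch, ch) for ch in node_input)
    return f"{escaped}@{domain}"


print(build_jid("space cadet", "example.com"))
# space\20cadet@example.com
print(build_jid('call me "ishmael"', "example.com"))
# call\20me\20\22ishmael\22@example.com
```

Because node and domain arrive as separate fields, the client never has to
guess where the node ends in a string that is not a valid JID.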

I think the whole XEP should be renamed to something like:

   XEP-0106 - JID Mapping for Gateways

Maybe I got everything wrong, but this is the only way I am able to
make sense of that XEP. A true escaping mechanism must also be understood
by the server, and servers would then have to unescape the JIDs to compare them.

