[Standards] JID Escaping
elmex at x-paste.de
Mon Jul 23 02:47:08 UTC 2007
On Sat, Jul 21, 2007 at 08:17:19PM -0600, Peter Saint-Andre wrote:
> Robin Redeker wrote:
> > On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
> >>> I think the whole XEP should be renamed to something like:
> >>> XEP-0106 - JID Mapping for Gateways
> >> This would be better. But it breaks the generic usage of JIDs for both users
> >> and gateways. It will create a lot of trouble.
> > The XEP seems to already create a lot of trouble. Just remind me to
> > register '\20stpeter@jabber.org' when every client unescapes JIDs ;-)
> No problem. The spec says:
> "The character sequence \20 MUST NOT be the first or last character of
> an escaped node identifier."
> But of course you can violate the spec if desired. ;-)
I don't violate the RFC here. I violate some optional extension.
XEP-0106 has to exclude JIDs whose node part starts or ends with '\20'
from both the escaping AND the unescaping transformations.
At the moment the paragraph says that '\20' MUST NOT be first or last
in the node part, but it doesn't say WHAT to do when such a perfectly
fine JID arrives over the wire. Should the JID not be unescaped at all?
Should only the parts after and before '\20' be unescaped?
Should the client close the connection?
Am I missing something in the XEP? (If so, please ignore the rest of
this mail.)
Please also note the nice, but maybe not so important, collision that
happens here when the client just doesn't unescape:
unescape ("\5c20foobar\5c20") => "\20foobar\20"
unescape ("\20foobar\20") => "\20foobar\20"
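A minimal Python sketch of this collision, for illustration only. The escape
table is the one listed in XEP-0106; the function name unescape_node and the
policy of leaving a node untouched when it starts or ends with '\20' are
assumptions (the XEP does not say what a receiver should do in that case):

```python
import re

# Escape sequences listed in XEP-0106 (sequence -> character).
ESCAPES = {
    r"\20": " ", r"\22": '"', r"\26": "&", r"\27": "'", r"\2f": "/",
    r"\3a": ":", r"\3c": "<", r"\3e": ">", r"\40": "@", r"\5c": "\\",
}
_PATTERN = re.compile(r"\\(?:20|22|26|27|2f|3a|3c|3e|40|5c)")


def unescape_node(node: str) -> str:
    """Unescape a node identifier in a single left-to-right pass.

    Assumed policy (hypothetical, not mandated by the XEP): if the node
    starts or ends with \\20, do not unescape it at all.
    """
    if node.startswith(r"\20") or node.endswith(r"\20"):
        return node
    return _PATTERN.sub(lambda m: ESCAPES[m.group(0)], node)


a = unescape_node(r"\5c20foobar\5c20")  # only the \5c sequences are replaced
b = unescape_node(r"\20foobar\20")      # left as-is by the assumed policy
print(a == b)  # True: two different JIDs on the wire look identical
```

Under a different policy (for example, rejecting the stanza) the second call
would not return a string at all, which is exactly the behaviour the XEP
leaves open.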
This is of course not really an important JID, and who cares about a few
optical collisions in clients that confuse the user? And these only happen
once someone else decides to put '\20' at the beginning or end
of his name, and why would someone do that?
Hey, we could add security notes to all clients which tell the user:
"Never attach '\20' to the beginning or end of your name, it is unsafe!"
The U.S. Army will love this! One might think of a case where they actually
name their units by enumerating them with a '\' at the end:
Unescaped:     Escaped:       Unescaped:
"Tank\1"       "Tank\5c1"     "Tank\1"
"Tank\20"      "Tank\20"      "Tank\20"
"Tank\22"      "Tank\5c22"    "Tank\22"
"Tank\5c20"    "Tank\20"      ... oooops
Ah... never... why would they do that... :-)
I propose renaming the XEP to make clear that this escaping/unescaping should
only happen in very rare cases (only at gateways or heavily specialized client
frontends), and replacing the terms 'escaping' and 'unescaping' with
'mapping' and 'unmapping', because that's what is happening here.