[Standards-JIG] LAST CALL: JEP-0106 (JID Escaping)

Matt Tucker matt at jivesoftware.com
Wed May 4 00:08:30 UTC 2005


Hey all,

Peter, thanks for the detailed summary of the debate. I still don't
quite understand why people care about '%' vs. '#'. The '#' character
seems perfectly fine. In any case, for those that need Java code for JID
escaping, here are a couple of methods. They aren't incredibly carefully
tested or optimized, but the code seems to work. The code can be
considered "public domain".

Regards,
Matt

---------------------------

    /**
     * Escapes the node portion of a JID according to "JID Escaping" (JEP-0106).
     * Escaping replaces characters prohibited by nodeprep, plus the escape
     * character '#' itself, with escape sequences, as follows:<p>
     *
     * <table border="1">
     * <tr><td><b>Unescaped Character</b></td><td><b>Encoded Sequence</b></td></tr>
     * <tr><td>&lt;space&gt;</td><td>#20;</td></tr>
     * <tr><td>&quot;</td><td>#22;</td></tr>
     * <tr><td>#</td><td>#23;</td></tr>
     * <tr><td>&amp;</td><td>#26;</td></tr>
     * <tr><td>'</td><td>#27;</td></tr>
     * <tr><td>/</td><td>#2f;</td></tr>
     * <tr><td>:</td><td>#3a;</td></tr>
     * <tr><td>&lt;</td><td>#3c;</td></tr>
     * <tr><td>&gt;</td><td>#3e;</td></tr>
     * <tr><td>@</td><td>#40;</td></tr>
     * </table><p>
     *
     * This process is useful when the node comes from an external source that
     * doesn't conform to nodeprep. For example, a username in LDAP may be
     * "Joe Smith". Because the &lt;space&gt; character isn't a valid part of a
     * node, the username should be escaped to "Joe#20;Smith" before being made
     * into a JID (e.g. "joe#20;smith@example.com" after case-folding, etc. has
     * been applied).<p>
     *
     * @param node the node.
     * @return the escaped version of the node.
     */
    public static String escapeNode(String node) {
        if (node == null) {
            return null;
        }
        StringBuilder buf = new StringBuilder(node.length() + 8);
        for (int i=0, n=node.length(); i<n; i++) {
            char c = node.charAt(i);
            switch (c) {
                case '"': buf.append("#22;"); break;
                case '#': buf.append("#23;"); break;
                case '&': buf.append("#26;"); break;
                case '\'': buf.append("#27;"); break;
                case '/': buf.append("#2f;"); break;
                case ':': buf.append("#3a;"); break;
                case '<': buf.append("#3c;"); break;
                case '>': buf.append("#3e;"); break;
                case '@': buf.append("#40;"); break;
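                default: {
                    // Any whitespace (tab, newline, etc.), not only U+0020, is
                    // mapped to #20;, so unescapeNode() returns a plain space.
                    if (Character.isWhitespace(c)) {
                        buf.append("#20;");
                    }
                    else {
                        buf.append(c);
                    }
                }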
            }
        }
        return buf.toString();
    }

    /**
     * Un-escapes the node portion of a JID according to "JID Escaping" (JEP-0106).
     * Each escape sequence is replaced by the character it encodes, as follows:<p>
     *
     * <table border="1">
     * <tr><td><b>Unescaped Character</b></td><td><b>Encoded Sequence</b></td></tr>
     * <tr><td>&lt;space&gt;</td><td>#20;</td></tr>
     * <tr><td>&quot;</td><td>#22;</td></tr>
     * <tr><td>#</td><td>#23;</td></tr>
     * <tr><td>&amp;</td><td>#26;</td></tr>
     * <tr><td>'</td><td>#27;</td></tr>
     * <tr><td>/</td><td>#2f;</td></tr>
     * <tr><td>:</td><td>#3a;</td></tr>
     * <tr><td>&lt;</td><td>#3c;</td></tr>
     * <tr><td>&gt;</td><td>#3e;</td></tr>
     * <tr><td>@</td><td>#40;</td></tr>
     * </table><p>
     *
     * This reverses {@link #escapeNode(String)}: for example, the escaped LDAP
     * username "Joe#20;Smith" is turned back into "Joe Smith". Only the lowercase
     * sequences shown above are recognized; anything else, such as "#99;" or
     * "#2F;", is copied through unchanged.<p>
     *
     * @param node the escaped version of the node.
     * @return the un-escaped version of the node.
     */
    public static String unescapeNode(String node) {
        if (node == null) {
            return null;
        }
        char[] nodeChars = node.toCharArray();
        StringBuilder buf = new StringBuilder(nodeChars.length);
        for (int i=0, n=nodeChars.length; i<n; i++) {
            compare: {
                char c = nodeChars[i];
                // An escape sequence is exactly four chars: '#', two hex digits, ';'.
                if (c == '#' && i+3<n) {
                    char c2 = nodeChars[i+1];
                    char c3 = nodeChars[i+2];
                    char c4 = nodeChars[i+3];
                    if (c2 == '2' && c4 == ';') {
                        switch (c3) {
                            case '0': buf.append(' '); i+=3; break compare;
                            case '2': buf.append('"'); i+=3; break compare;
                            case '3': buf.append('#'); i+=3; break compare;
                            case '6': buf.append('&'); i+=3; break compare;
                            case '7': buf.append('\''); i+=3; break compare;
                            case 'f': buf.append('/'); i+=3; break compare;
                        }
                    }
                    else if (c2 == '3' && c4 == ';') {
                        switch (c3) {
                            case 'a': buf.append(':'); i+=3; break compare;
                            case 'c': buf.append('<'); i+=3; break compare;
                            case 'e': buf.append('>'); i+=3; break compare;
                        }
                    }
                    else if (c2 == '4' && c4 == ';') {
                        if (c3 == '0') {
                            buf.append('@');
                            i+=3;
                            break compare;
                        }
                    }
                }
                // Not a recognized escape sequence: copy the character as-is.
                buf.append(c);
            }
        }
        return buf.toString();
    }
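The two methods above hard-code the table in switch statements. As a hedged alternative, the same table can drive both directions from a pair of maps. The class and method names below (JidEscapeSketch, escape, unescape) are hypothetical and not part of the code above. One deliberate difference: only U+0020 maps to #20;, so escaping followed by un-escaping returns the original string exactly, at the cost that tabs and other whitespace pass through unescaped and may still fail nodeprep.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven variant of escapeNode/unescapeNode (names are illustrative).
public class JidEscapeSketch {
    // Unescaped character -> JEP-0106 escape sequence.
    private static final Map<Character, String> ESCAPES = new HashMap<>();
    // Reverse lookup: escape sequence -> unescaped character.
    private static final Map<String, Character> UNESCAPES = new HashMap<>();

    static {
        char[] chars = {' ', '"', '#', '&', '\'', '/', ':', '<', '>', '@'};
        for (char c : chars) {
            // Lowercase two-digit hex, matching escapeNode's output (e.g. '/' -> "#2f;").
            String seq = String.format("#%02x;", (int) c);
            ESCAPES.put(c, seq);
            UNESCAPES.put(seq, c);
        }
    }

    public static String escape(String node) {
        StringBuilder buf = new StringBuilder(node.length() + 8);
        for (char c : node.toCharArray()) {
            String seq = ESCAPES.get(c);
            if (seq != null) {
                buf.append(seq);
            } else {
                buf.append(c); // only U+0020 is escaped; other whitespace passes through
            }
        }
        return buf.toString();
    }

    public static String unescape(String node) {
        StringBuilder buf = new StringBuilder(node.length());
        for (int i = 0; i < node.length(); i++) {
            char c = node.charAt(i);
            // Every escape sequence is exactly four characters long, e.g. "#20;".
            if (c == '#' && i + 4 <= node.length()) {
                Character decoded = UNESCAPES.get(node.substring(i, i + 4));
                if (decoded != null) {
                    buf.append(decoded.charValue());
                    i += 3; // the loop's i++ then skips past the final ';'
                    continue;
                }
            }
            buf.append(c); // ordinary character or unrecognized sequence: copy as-is
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Joe Smith"));      // Joe#20;Smith
        System.out.println(unescape("Joe#20;Smith")); // Joe Smith
        System.out.println(unescape("#99;"));         // #99; (unknown sequence, left alone)
    }
}
```

Escaping the escape character itself is what keeps a literal "#20;" in the input from being misread: escape("#20;") yields "#23;20;", which un-escapes back to "#20;".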

> -----Original Message-----
> From: standards-jig-bounces at jabber.org 
> [mailto:standards-jig-bounces at jabber.org] On Behalf Of Peter Saint-Andre
> Sent: Tuesday, May 03, 2005 4:24 PM
> To: Jabber protocol discussion list
> Subject: Re: [Standards-JIG] LAST CALL: JEP-0106 (JID Escaping)
> 
> On Tue, May 03, 2005 at 06:19:58PM -0500, Peter Saint-Andre wrote:
> > On Thu, Apr 21, 2005 at 10:43:11PM +0100, Richard Dobson wrote:
> > > >Well, as pointed out in this morning's Jabber Council meeting, I
> > > >was looking at the transformations only in one direction. It is
> > > >perfectly valid to have domain names that start with 20, 22, 26,
> > > >27, 2f, 3a, 3c, 3e, and 40. Consider the case of an MSN user whose
> > > >email address is up@3am.com ... once transformed by an MSN gateway,
> > > >that person's JID might be:
> > > >
> > > >up%3am.com@msn.example.com
> > > >
> > > >However, the characters %3a are now ambiguous: do they signify "@3a"
> > > >through an MSN gateway or ":" as decoded in JID escaping? Thus an
> > > >application would have no programmatic way of distinguishing
> > > >between the following interpretations of that JID:
> > > >
> > > >1. an entity whose decoded node identifier is "up@3am.com"
> > > >
> > > >2. an entity whose decoded node identifier is "up:m.com"
> > > >
> > > >Ambiguity is bad because it breaks things. And one of our cardinal
> > > >rules is not to break things.
> > > >
> > > >Therefore the Council has decided to retain the #xx; escaping
> > > >mechanism for the 9 code points (and only for the 9 code points)
> > > >that are explicitly disallowed in the Nodeprep profile of
> > > >stringprep. While this prevents conforming applications from
> > > >re-using existing URI-processing libraries for the purpose of JID
> > > >escaping, the Council decided that this is only a slight hardship,
> > > >given that the 9 code points in the node identifier portion of
> > > >JIDs must be special-cased anyway, and to proceed with advancement
> > > >of JEP-0106 as-is (actually, with some slight wording changes that
> > > >I am working on now).
> > > >
> > > >A transcript of the Council discussion is here:
> > > >
> > > >http://jabber.org/muc-logs/council@conference.jabber.org/2005-04-21.html
> > > >
> > > >Feedback is welcome as always.
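To make the ambiguity concrete, here is a small Java sketch that reads the node "up%3am.com" under both conventions: the legacy gateway convention in which '%' stands in for the '@' of the email address, and plain percent-decoding via java.net.URLDecoder (the Charset overload used here needs Java 10 or later). The class and method names are hypothetical; nothing in the node itself tells a program which reading is intended.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the "up%3am.com" ambiguity (names are illustrative).
public class GatewayAmbiguity {
    // Legacy MSN-gateway convention described above: '%' replaced the '@'.
    public static String decodeAsGatewayNode(String node) {
        return node.replace('%', '@');
    }

    // Percent-decoding, as a URI-style JID escaping scheme would apply it.
    public static String decodeAsPercentEscaped(String node) {
        return URLDecoder.decode(node, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String node = "up%3am.com";
        System.out.println(decodeAsGatewayNode(node));    // up@3am.com
        System.out.println(decodeAsPercentEscaped(node)); // up:m.com
    }
}
```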
> > > 
> > > I would have to argue that it's an equally slight hardship to alter
> > > the existing MSN transports so we can just use the internet
> > > standard. Plus, how many MSN users actually have addresses that could
> > > present a problem in any case? Certainly on my MSN contact list,
> > > which is about 92 people, 86 have @hotmail.com or @msn.com addresses,
> > > and of the remaining ones none have domains that could cause a problem.
> > > 
> > > Overall I truly fail to see the problem here from a real-world
> > > point of view; as far as I can see it is an entirely theoretical
> > > problem and thus shouldn't be holding us back from doing things properly.
> 
> Oops, my message did not come through correctly (bad mutt!). I meant to say:
> 
> ******
> 
> The question is: what is the proper thing to do?
> 
> Some feel that percent-encoding is the proper approach. That 
> is the approach used in transforming disallowed characters in 
> URLs/URIs.
> However, JIDs are not URIs, so while it might be nice to use 
> existing URI encoding rules for JIDs, that is by no means 
> necessary, and to assume that percent-encoding is the Right 
> Thing for JIDs is, I think, misguided. Maybe that's the right 
> approach, but we can't assume so.
> 
> The main problem space we care about for JID escaping is the 
> transformation of existing non-XMPP addresses into JIDs. And 
> here the most common problem is re-using existing email 
> addresses as JIDs. Now, RFC 2822 (and before that 822) 
> specifies that the following characters disallowed in JIDs 
> are allowed in email addresses: & ' /
> 
> Note that some characters are allowed in both email addresses and
> JIDs, for example the % character.
> 
> So some interesting email addresses could be, for example:
> 
> etcetera&c@example.com
> d'artagnan@example.com
> slash/.dot@example.com
> cr%zyguy66@example.com
> 
> As mailto: URIs, those would be:
> 
> mailto:etcetera%26c@example.com
> mailto:d%27artagnan@example.com
> mailto:slash%2f.dot@example.com
> mailto:cr%25zyguy66@example.com
> 
> As JIDs converted using percent-encoding, those would be:
> 
> etcetera%26c@example.com
> d%27artagnan@example.com
> slash%2f.dot@example.com
> cr%25zyguy66@example.com
> 
> As JIDs converted using JEP-0106, those would be:
> 
> etcetera#26;c@example.com
> d#27;artagnan@example.com
> slash#2f;.dot@example.com
> cr%zyguy66@example.com
> 
> (No need to transform % since it is allowed in XMPP node identifiers.)
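The four conversions in Peter's comparison can be reproduced with a short Java sketch. It is an illustration under stated assumptions, not code from the JEP: the percent variant encodes only the JID-disallowed characters plus '%' (its own escape character), while the JEP-0106 variant encodes the same set plus '#'. Class and method names are made up for the example.

```java
// Hypothetical comparison of the two escaping schemes (names are illustrative).
public class EscapingComparison {
    // SP " & ' / : < > @ are disallowed in node identifiers.
    private static final String DISALLOWED = " \"&'/:<>@";

    // Percent variant: disallowed characters plus '%', as lowercase %xx.
    public static String percentEscape(String local) {
        StringBuilder buf = new StringBuilder();
        for (char c : local.toCharArray()) {
            if (DISALLOWED.indexOf(c) >= 0 || c == '%') {
                buf.append(String.format("%%%02x", (int) c));
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    // JEP-0106 variant: disallowed characters plus '#', as #xx; ('%' is left alone).
    public static String jep0106Escape(String local) {
        StringBuilder buf = new StringBuilder();
        for (char c : local.toCharArray()) {
            if (DISALLOWED.indexOf(c) >= 0 || c == '#') {
                buf.append(String.format("#%02x;", (int) c));
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String[] locals = {"etcetera&c", "d'artagnan", "slash/.dot", "cr%zyguy66"};
        for (String local : locals) {
            System.out.println(percentEscape(local) + "@example.com  vs  "
                    + jep0106Escape(local) + "@example.com");
        }
    }
}
```

The only structural difference between the two methods is the choice of escape character, which is why "cr%zyguy66" changes under one scheme and not the other.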
> 
> Now, you might say, aha, this proves the point -- let's use 
> percent-encoding! Look, those JIDs are *different*!
> 
> Not so fast. Exactly why is it a good thing for the 
> special-cased transformations for the disallowed characters 
> (as defined in JEP-0106) to use "standard" percent-encoding? 
> The whole issue here is that these characters (code points) 
> are special-cased just for JIDs. To my mind, it's more 
> problematic to use percent-encoding here because we are 
> talking only about the 9 code points that are disallowed in 
> XMPP node identifiers. So you can't generally apply URI 
> escaping logic to an address that you want to transform into 
> a JID -- many of the characters you would transform using URI 
> rules MUST NOT be transformed when creating a JID. So now you 
> have to special-case your URI encoding on a 
> character-by-character basis, no? You can't transform the 
> entire input using URI rules and have a proper JID come out 
> the other side; instead you need to feed the URI encoder one 
> character at a time and have your standard algorithm convert 
> only SP " # & ' / : < > @. At that point it seems to me that 
> the great advantage of using standard URI rules is no longer 
> so wonderful, because you're still doing special-casing. So 
> what is the big deal about special-casing for those 9 code 
> points but converting them using #xx; rather than %xx? I 
> understand the desire for complete consistency and protocol 
> hygiene, I really do. But I don't see how that is going to 
> make JID escaping any easier for implementors in this 
> situation. It's a simple switch statement, for Pete's sake!
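As a sketch of the "whole input vs. one character at a time" point, assume java.net.URLEncoder (Java 10+ Charset overload) stands in for a "standard" encoder; the class name, method names, and the sample local part "o'brien+news" are hypothetical. Run over the whole input, the encoder also rewrites '+', which is legal in a node identifier; restricting it to the special-cased characters is exactly the per-character switch described above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (names and sample input are illustrative).
public class WholeInputEncoding {
    // Characters a JID node may not contain, plus '%' as the escape character.
    private static final String SPECIAL = " \"&'/:<>@%";

    // Whole-input encoding: URLEncoder touches every character outside
    // [A-Za-z0-9.*_-], so legal node characters such as '+' get rewritten too.
    public static String encodeWholeInput(String local) {
        return URLEncoder.encode(local, StandardCharsets.UTF_8);
    }

    // Per-character special-casing: only characters in SPECIAL are percent-encoded.
    public static String encodeSpecialOnly(String local) {
        StringBuilder buf = new StringBuilder();
        for (char c : local.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                buf.append(String.format("%%%02x", (int) c));
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String local = "o'brien+news";
        System.out.println(encodeWholeInput(local));  // o%27brien%2Bnews
        System.out.println(encodeSpecialOnly(local)); // o%27brien+news
    }
}
```

Note also that URLEncoder implements application/x-www-form-urlencoded, so it turns a space into '+' rather than %20; that is one more case a "standard" encoder would need special-cased before it could produce JIDs.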
> 
> About the existing MSN gateways, yes, they would need to be 
> modified to handle percent-encoding. And so would existing 
> rosters! We have a lot of deployed code in production systems 
> that uses % to escape @ in MSN JIDs (which in itself is 
> ambiguous, since % is allowed in email addresses).
> To simply say "tough luck, time to upgrade" is not very 
> friendly. Why break things on the network if we don't have 
> to, all in the pursuit of full consistency with the URI specs 
> even though JIDs are not URIs? I just don't see the logic, 
> and neither did anyone else on the Council when we discussed 
> this in the April 21 meeting.
> 
> Furthermore, adding a service discovery feature for JID 
> escaping (and one is already included in JEP-0106, see 
> Section 4 of the JEP) is not going to solve the problem of 
> deployed gateways and all of their associated roster items 
> (believe me, as a server admin for jabber.org, I can tell you 
> that there are millions of such roster items out there).
> 
> In my judgement and in the judgement of the four Council 
> members who have voted on this JEP so far (Thomas Muldowney 
> has yet to vote), the desire for consistency with the URI 
> specs is simply not compelling enough to break existing 
> deployments or even risk doing so (e.g., by introducing 
> rather serious migration issues).
> 
> As noted, Thomas Muldowney has yet to vote on this JEP but 
> will probably do so on the Council list or in the next 
> Council meeting (May 12). Feel free to try to convince him to 
> vote -1 before then and to make your case further on this 
> list. A truly compelling argument might even convince Council 
> members who have already voted +1 to change their votes to -1. 
> So far, such a compelling argument has not appeared, at least 
> not in the opinion of those who do the voting (i.e., elected 
> Council members).
> 
> Personally I think it's time to move on to more important 
> matters and to accept that we have rough consensus on the JID 
> escaping rules defined in version 0.5 of JEP-0106. But far be 
> it from me to shut down debate, so feel free to keep arguing 
> about the matter on this list.
> 
> Peter
> 
> --
> Peter Saint-Andre
> Jabber Software Foundation
> http://www.jabber.org/people/stpeter.shtml
> 
> _______________________________________________
> Standards-JIG mailing list
> Standards-JIG at jabber.org
> http://mail.jabber.org/mailman/listinfo/standards-jig
> 


