[standards-jig] JIDs (JEP-0029)

Thomas Muldowney temas at box5.net
Tue Apr 30 16:37:08 UTC 2002

That's all I was looking for.  I just wanted a rational explanation of
the lengths as you specified them.  This works for me.  My thoughts on
characters were irrational and ignored encoding, sorry.


On Tue, 2002-04-30 at 11:24, Craig wrote:
> Dave wrote:
> > <snip>
> > 256 or so characters (not bytes - it's silly to
> > penalize international users just so devs don't have to allocate 5x as
> > much storage for resources 
>
> You and Temas have both brought this up so it is clear that some 
> clarification for why I chose bytes is in order.  Once we specify 
> encoding (UTF-8), then we can map characters to bytes.  The XML parser 
> is good at doing this for us.  Let's not duplicate that work in the JID
> parsing routine, which must be fast for any server implementation to be
> remotely scalable.  By specifying a lower bound on _characters_, you
> are forcing all implementers to interpret the encoding, which is silly.
>
> I'll tell you how I came up with 256 bytes.  I started with how many Han
> ideographs (our worst-case encoding) are reasonable for a username and
> resource.  64 characters is more than enough since so much more 
> information can be encoded in those characters than in US ASCII.  Okay, 
> so 64 Han ideographs, at 4 bytes each, translate to 256 bytes in a UTF-8 encoding.
> That's how I arrived at that byte number -- by asking what should be the 
> minimum number of characters allowed and accommodating that number.
> (Incidentally, at least as of 1.2, and probably currently (haven't looked
> for a while), the lower bound on the number of characters allowed in the 
> open source server for a username was 8 since it validates the first 64 
> _bytes_ without regard to UTF-8 translation before truncating.)
>
> So, by saying that we should specify in terms of characters, not bytes, 
> you and Temas are asking for the following additional behavior.  You are 
> requiring all jid parsing to translate UTF-8 encoding into character 
> representation so that you can _punish_ ASCII users -- effectively 
> hurting performance solely to further limit Latin-alphabet users.  Wrong.
> --C
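
To make the byte-versus-character trade-off in the quoted message concrete,
here is a minimal Python sketch. It is only an illustration under stated
assumptions: the names MAX_PART_BYTES, fits_byte_limit and fits_char_limit
are hypothetical and come from neither JEP-0029 nor any Jabber server. It
assumes the 256-byte figure corresponds to 64 characters at 4 bytes each,
the longest UTF-8 sequence for a Unicode code point; most common Han
ideographs actually take 3 bytes.

```python
# Hypothetical limit, taken from the 256-byte figure discussed in the thread.
MAX_PART_BYTES = 256


def fits_byte_limit(raw: bytes, limit: int = MAX_PART_BYTES) -> bool:
    """Byte-based check: a plain length comparison, no decoding required."""
    return len(raw) <= limit


def fits_char_limit(raw: bytes, limit: int) -> bool:
    """Character-based check: the UTF-8 must be decoded to count code points."""
    return len(raw.decode("utf-8")) <= limit


# Three 64-character strings with different UTF-8 widths per character.
ascii_name = ("a" * 64).encode("utf-8")        # 1 byte each  -> 64 bytes
bmp_han = ("\u6f22" * 64).encode("utf-8")      # 3 bytes each -> 192 bytes
ext_han = ("\U00020000" * 64).encode("utf-8")  # 4 bytes each -> 256 bytes

for raw in (ascii_name, bmp_han, ext_han):
    chars = len(raw.decode("utf-8"))
    print(chars, "characters,", len(raw), "bytes, fits:", fits_byte_limit(raw))

# Cutting at a fixed byte offset without regard to UTF-8 can split a
# multi-byte character, leaving bytes that no longer decode cleanly.
truncated = bmp_han[:64]
try:
    truncated.decode("utf-8")
    split_character = False
except UnicodeDecodeError:
    split_character = True
print("64-byte cut splits a character:", split_character)
```

All three strings pass the byte check without being decoded, which is the
performance point made above. The last part shows the hazard of cutting at a
fixed byte offset with no regard to UTF-8, as in the 64-byte validation
mentioned in the parenthetical.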
