[Standards] RTT, take 2

Dave Cridland dave at cridland.net
Fri Jun 24 08:08:08 UTC 2011

On Fri Jun 24 02:54:12 2011, Peter Saint-Andre wrote:
> On 6/23/11 12:41 AM, Mark Rejhon wrote:
> > Opinion?
> On the wire there is no such thing as a code point; there are only
> code points that are encoded using an encoding form like UTF-8 or
> UTF-16. For details, see:
> http://tools.ietf.org/html/draft-ietf-appsawg-rfc3536bis-02
> Given that XMPP is pure UTF-8, I don't see a compelling reason to
> count UTF-16-encoded code points or UTF-32-encoded code points.
I think UTF-16 and UTF-32 encodings would both be a bad idea; XMPP is  
purely UTF-8 as you say.

However, I don't think we should refer to UTF-8 octets here either,
for a couple of reasons:

1) Processing software may already have decoded the UTF-8 into some
internal representation, at which point octet offsets are awkward to
manage.

2) Referring to UTF-8 octets allows silly states where an edit lands
inside the encoding of a single character. It's even possible this
might be used intentionally, in some languages.

So I'd say that we should refer to characters in a string, and deal
with Unicode code points in the abstract. I'd expect that
implementations would convert these positions internally into whatever
makes sense for them.
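
To illustrate, here is a rough Python sketch of that internal conversion.
Nothing in it comes from the RTT draft: the function names and the sample
string are invented for this example, and it assumes Python 3, where a
str is indexed by code point. It maps a code-point position to a UTF-8
octet offset, and shows how an octet offset can land inside a character.

```python
def codepoint_to_utf8_offset(text: str, index: int) -> int:
    """Return the UTF-8 octet offset of code-point position `index`.

    Hypothetical helper: Python 3 strings are indexed by code point,
    so encoding the prefix gives the matching octet offset.
    """
    if not 0 <= index <= len(text):
        raise IndexError("code-point index out of range")
    return len(text[:index].encode("utf-8"))


def insert_at_codepoint(text: str, index: int, fragment: str) -> str:
    """Insert `fragment` before code-point position `index`."""
    if not 0 <= index <= len(text):
        raise IndexError("code-point index out of range")
    return text[:index] + fragment + text[index:]


def splits_inside_character(text: str, octet_offset: int) -> bool:
    """True if `octet_offset` falls inside a multi-octet UTF-8 sequence.

    UTF-8 continuation octets always match the bit pattern 10xxxxxx.
    """
    data = text.encode("utf-8")
    if not 0 < octet_offset < len(data):
        return False
    return (data[octet_offset] & 0xC0) == 0x80


# Sample text (an assumption for this sketch): 8 code points, 11 octets.
message = "na\u00efve \u20ac5"

# Code-point position 3 (just after the i-diaeresis) is octet offset 4.
offset = codepoint_to_utf8_offset(message, 3)

# Octet offset 3 sits between the two octets of the i-diaeresis, which
# is the "edit inside a character" state described in point 2.
broken = splits_inside_character(message, 3)
```

A sender and receiver that both count code points never need to agree on
octet offsets at all; each side converts locally, if it needs to.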

Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
