[Standards] RTT, take 2
dave at cridland.net
Fri Jun 24 08:08:08 UTC 2011
On Fri Jun 24 02:54:12 2011, Peter Saint-Andre wrote:
> On 6/23/11 12:41 AM, Mark Rejhon wrote:
> > Opinion?
> On the wire there is no such thing as a code point, there are only code
> that are encoded using an encoding form like UTF-8 or UTF-16. For
> details, see:
> Given that XMPP is pure UTF-8, I don't see a compelling reason to
> UTF-16-encoded code points or UTF-32-encoded code points.
I think UTF-16 and UTF-32 encodings would both be a bad idea; XMPP is
purely UTF-8 as you say.
However, I don't think that we should refer to UTF-8 octets either,
here, for a number of reasons:
1) Processing software may already have decoded the UTF-8 into
"something" else, making octet offsets awkward to manage.
2) Referring to UTF-8 octets allows silly states in which an edit
lands inside a multi-octet character. It's even possible this may be
used intentionally, in some languages.
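
To illustrate point 2, here is a minimal Python sketch of an octet
offset that falls inside a character. The sample string and the helper
name is_char_boundary are made up for illustration; nothing here comes
from the proposed specification.

```python
# Sample text with one two-octet character: U+00EF encodes as 0xC3 0xAF.
text = "na\u00efve"
octets = text.encode("utf-8")


def is_char_boundary(data: bytes, offset: int) -> bool:
    """Return True if offset does not fall inside a multi-octet sequence."""
    if offset == 0 or offset == len(data):
        return True
    # UTF-8 continuation octets always have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


# One entry per possible octet offset, including the end of the data.
boundaries = [is_char_boundary(octets, i) for i in range(len(octets) + 1)]

# Cutting at offset 3 splits U+00EF in half, so the prefix is not valid UTF-8.
try:
    octets[:3].decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
```

Offset 3 is the only position flagged as a non-boundary here, and an
edit expressed at that offset has no sensible meaning in terms of
characters.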
So I'd say that we should refer to characters in a string, and deal
with Unicode code-points in the abstract. I'd expect that
implementations would convert this internally into whatever made
sense for them.
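
As a rough sketch of that internal conversion, assuming positions are
exchanged as code point indexes, an implementation might map an index
onto whatever offset its own string type needs. The function names
below are hypothetical, and Python is convenient here only because its
str type is already indexed by code point.

```python
def code_point_to_utf8_offset(text: str, index: int) -> int:
    """Octet offset in the UTF-8 form of the code point at index."""
    return len(text[:index].encode("utf-8"))


def code_point_to_utf16_offset(text: str, index: int) -> int:
    """Code unit offset in the UTF-16 form of the code point at index."""
    # Each UTF-16 code unit is two octets; "utf-16-le" emits no BOM.
    return len(text[:index].encode("utf-16-le")) // 2


def insert_at_code_point(text: str, index: int, new_text: str) -> str:
    """Insert new_text before the code point at index."""
    return text[:index] + new_text + text[index:]


# "a", U+00E9 (2 octets in UTF-8), U+1D11E (4 octets, a surrogate pair
# in UTF-16), "b".
sample = "a\u00e9\U0001d11eb"

utf8_offset = code_point_to_utf8_offset(sample, 3)    # position of "b"
utf16_offset = code_point_to_utf16_offset(sample, 3)  # position of "b"
edited = insert_at_code_point(sample, 3, "X")
```

The same code point index (3) maps to octet offset 7 in UTF-8 and code
unit offset 4 in UTF-16, so each implementation can do the translation
privately while the wire format stays encoding-neutral.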
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade