[Standards] Proposed XMPP Extension: Character counting in message bodies

Sam Whited sam at samwhited.com
Tue Dec 8 22:13:08 UTC 2020

The XML library I use does not give me a string or a slice of code points;
it gives me a slice of bytes, because that's the level I'm operating at.
Even at a higher level, if I convert the bytes into a string (a Go
string in this case), that is still just a sequence of UTF-8 bytes. The
conversion does not decode them, check that they're valid, or turn them
into a slice of code points; that is an expensive operation that is
deferred until you need it or explicitly do it yourself.

I don't understand how this is part of the XML data model. Do you mean
that XML only supports Unicode encodings? If so, that's fair and it
removes one of my arguments; I did not know that was the case. However,
I still think the data on the wire should describe the other data on the
wire, not some higher-level "decoded" representation that many XML
libraries may not even use.


On Tue, Dec 8, 2020, at 21:32, Jonas Schäfer wrote:
> But all implementations which want to be XMPP and XML 1.0 compliant
> need to have some way to convert or offer access to code points, as
> that’s the XML data model. Let’s build on that.
> Easy choice.
> Much easier than writing 20 emails on this topic, and that just in
> this thread.
