[Standards] Proposed XMPP Extension: Character counting in message bodies

Sam Whited sam at samwhited.com
Wed Dec 9 16:09:37 UTC 2020


I don't think this is true. XML (in this case) is defined as UTF-8,
which is a sequence of bytes. Those bytes don't have to be split up and
transformed into some higher-level representation of code points. Just
because Python et al. convert things into UTF-32 strings first doesn't
mean everything has to.

Regardless of what language you're using, it's trivial to deal with this
as a UTF-8 byte stream; it is not always trivial to handle it as a
UTF-32 integer stream, as the example shows.
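
A minimal sketch of the byte-stream approach, assuming the body has
already been validated as well-formed UTF-8. The function names
(count_code_points, byte_offset) are made up for illustration and do not
come from the proposal. It relies only on the fact that UTF-8
continuation bytes always have the bit pattern 10xxxxxx, so code points
can be counted and located without decoding anything:

```python
def count_code_points(data: bytes) -> int:
    """Count code points in well-formed UTF-8 without decoding it."""
    # Every byte that is NOT a continuation byte (10xxxxxx) starts a
    # new code point, so counting those bytes counts the code points.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset where code point number `index` starts.

    `index` may equal the total number of code points, in which case
    the offset just past the end of the data is returned.
    """
    if index < 0:
        raise IndexError("code point index out of range")
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # start of a code point
            if seen == index:
                return offset
            seen += 1
    if seen == index:
        return len(data)
    raise IndexError("code point index out of range")


# Hypothetical message body: 7 code points encoded in 11 bytes.
body = "héllo 🌍".encode("utf-8")
print(len(body), count_code_points(body), byte_offset(body, 6))
```

Neither function builds an intermediate array of code points; both make
a single pass over the raw bytes. Malformed input would give meaningless
counts, which is why the validation assumption matters.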

—Sam

On Wed, Dec 9, 2020, at 14:03, Tedd Sterr wrote:
> The decoding _should_ be done upfront - that's how you get a valid XML
> document.

