[Standards] Proposed XMPP Extension: Character counting in message bodies
Sam Whited
sam at samwhited.com
Wed Dec 9 16:09:37 UTC 2020
I don't think this is true. XML is defined as UTF-8 (in this case),
which is a collection of bytes. They don't have to be separated out and
transformed into some higher representation of code points. Just because
Python et al. convert things into UTF-32 strings first doesn't mean
everything has to.
Regardless of what language you're using, it's trivial to deal with this
as a UTF-8 byte stream; it is not always trivial to handle it as a
UTF-32 integer stream, as the example shows.
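
A minimal sketch of what working on the byte stream can look like, in
Python only because it was mentioned above. It is an illustration, not
something taken from the proposal: the function names are hypothetical,
and it assumes the bytes have already been validated as UTF-8 (for
example by the XML parser). It relies on the fact that every UTF-8
continuation byte has the bit pattern 10xxxxxx, so any other byte starts
a new code point.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in UTF-8 data without decoding it.

    Assumes `data` is valid UTF-8. Continuation bytes match 0b10xxxxxx,
    so every byte that does not match starts a new code point.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset at which code point number `index` starts.

    `index` equal to the total number of code points maps to len(data),
    which is convenient for end-exclusive ranges.
    """
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # first byte of a code point
            if seen == index:
                return offset
            seen += 1
    if seen == index:
        return len(data)
    raise IndexError("code point index out of range")


body = "héllo 🎉".encode("utf-8")
print(count_code_points(body), byte_offset(body, 2))
```

Neither function builds an intermediate string or integer array; both
make a single pass over the bytes as received.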
—Sam
On Wed, Dec 9, 2020, at 14:03, Tedd Sterr wrote:
> The decoding _should_ be done upfront - that's how you get a valid XML
> document.