[Standards] Proposed XMPP Extension: Character counting in message bodies

Tedd Sterr teddsterr at outlook.com
Wed Dec 9 14:03:31 UTC 2020


Sam, your argument appears to be "I want to handle everything as bytes without doing any string decoding, so any other option would be more effort (less efficient) for me."

XML is defined as a sequence of characters, not bytes - those characters subsequently need to be transformed into bytes for the purpose of storage/transmission, and that transformation is defined by the encoding scheme (UTF-8 in this case). Bytes are convenient for you, but not for everyone else using a language that does the decoding up front. The decoding _should_ be done up front - that's how you get a valid XML document.
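
To make the difference concrete, here is a rough Python sketch of how the same body gives different counts depending on what you count. The sample string and the function name `count_units` are made up for illustration; nothing here comes from the proposed XEP itself.

```python
# Illustrative only: count_units and the sample body are hypothetical.

def count_units(body: str) -> dict:
    """Return the length of `body` in three different units."""
    return {
        # Unicode code points, which is what a decoded string exposes in Python
        "code_points": len(body),
        # Bytes on the wire when the stream is encoded as UTF-8
        "utf8_bytes": len(body.encode("utf-8")),
        # UTF-16 code units, which is what some other languages index by
        "utf16_units": len(body.encode("utf-16-le")) // 2,
    }

# "naïve café 🙂" written with escapes so the exact code points are unambiguous
sample = "na\u00efve caf\u00e9 \U0001F642"
counts = count_units(sample)
print(counts)
```

For this sample the three counts all differ, so an offset into the body only means something once everyone agrees on the unit.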

If you're trying to handle XML without first decoding from UTF-8 so you can save a few clock-cycles, that's cool, but you are going to run into awkward annoyances when it comes to trying to handle such alien concepts as characters. The reason you can mostly get away with not decoding is that ASCII (the lower half of the byte range, 0x00-0x7F) is represented the same way in UTF-8, so you can pretend the XML tags are encoded as ASCII characters and just treat any Unicode strings as opaque binary blobs - but that is only a convenient hack. If everyone else is to go along with your convenient hack, that only means they will have to deal with their own awkward annoyances because they made the terrible decision to decode strings before handling them (as if that's what you're actually supposed to do).
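
A small Python sketch of why the hack works, and where it stops working. The stanza fragment and the helper `try_decode` are invented for the example, and this is not a real XML parser, just byte searching.

```python
# Illustrative only: the stanza fragment and try_decode are hypothetical.

def try_decode(data: bytes):
    """Return the decoded text, or None if `data` is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# A body containing 'é' (2 bytes in UTF-8) and an emoji (4 bytes in UTF-8)
stanza = "<body>caf\u00e9 \U0001F642</body>".encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a byte such as
# b"<" or b">" can only ever be the ASCII character. That is why searching
# the raw bytes for tags works without decoding anything.
start = stanza.index(b">") + 1
end = stanza.index(b"</body>")
blob = stanza[start:end]  # the body as an opaque byte blob

# The hack ends as soon as characters matter: cutting at a byte offset can
# split a character in half, while cutting the decoded string cannot.
byte_cut = try_decode(blob[:4])      # b"caf" plus the first byte of 'é'
char_cut = blob.decode("utf-8")[:4]  # first four characters of the body
```

Here the byte-offset cut is no longer valid UTF-8, whereas the cut on the decoded string gives the first four characters as expected.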
