[Standards] Proposed XMPP Extension: Character counting in message bodies

Dave Cridland dave at cridland.net
Fri Dec 18 18:17:03 UTC 2020


On Wed, 9 Dec 2020 at 19:21, Sam Whited <sam at samwhited.com> wrote:

> I believe this is a mischaracterization of my argument. My argument is
> "everything will have a way to get at the underlying bytes, not
> everything will have them pre-converted into code points".


I think this, in particular, is not correct.

The counter-argument, that everything can obtain a sequence of codepoints
but might not be able to get at a sequence of octets, is more accurate.

In particular, I think anything based on Python would receive text nodes
only as `str` objects, which are codepoint-based; decoding from and
encoding to UTF-8 is part and parcel of XML parsing and serialization.
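To illustrate the Python case, here is a minimal sketch. The message body is a hypothetical value standing in for whatever text node a parser hands back. Because a `str` is a sequence of code points, `len()` counts code points directly. An octet count is not available directly and has to be recomputed by re-encoding to UTF-8.

```python
# Hypothetical message body as a Python XML library would typically
# return it: a str, i.e. a sequence of Unicode code points.
body = "na\u00efve \U0001F642"  # "naive" with a diaeresis, a space, an emoji

# Code points are available directly.
codepoints = len(body)

# Octets are not; they have to be recomputed by re-encoding to UTF-8.
octets = len(body.encode("utf-8"))

print(codepoints, octets)  # prints "7 11"
```

The two counts differ as soon as any non-ASCII character appears, so a limit expressed in octets cannot be checked in such an environment without this extra encoding step.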

If we're counting codepoints and only have the UTF-8 octets, though, this
should be fairly easy without formally decoding them, assuming we do not
require normalization.
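One way to do this is sketched below. It assumes the input is well-formed UTF-8, which a conforming XML parser should guarantee. UTF-8 continuation octets always have the bit pattern 10xxxxxx, and every code point begins with exactly one octet that does not. Counting the non-continuation octets therefore gives the code point count. The function name is purely illustrative. No normalization is applied, so canonically equivalent strings can produce different counts.

```python
def count_codepoints_utf8(data: bytes) -> int:
    """Count Unicode code points in well-formed UTF-8 without decoding.

    Continuation octets match 10xxxxxx (b & 0xC0 == 0x80); every other
    octet starts a new code point. Assumes valid UTF-8 input and performs
    no normalization.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Without normalization, "e" followed by a combining acute accent counts
# as two code points, while the precomposed U+00E9 counts as one.
decomposed = "e\u0301".encode("utf-8")
precomposed = "\u00e9".encode("utf-8")
print(count_codepoints_utf8(decomposed), count_codepoints_utf8(precomposed))  # prints "2 1"
```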


> Also "this
> gives us the option to do certain optimizations on systems that support
> them, but using code points doesn't so we should do the thing that is
> the most flexible".
>

Oh, I agree with this, as a broad principle. But I don't think it's viable
in this case.


>
> —Sam
>
> On Wed, Dec 9, 2020, at 19:09, Tedd Sterr wrote:
> > Regardless, your argument is still "bytes is more convenient for me,
> > so everyone else should do what's best for me." I don't think that's a
> > good argument.

