[Standards] Proposed XMPP Extension: Character counting in message bodies

Sam Whited sam at samwhited.com
Wed Dec 9 12:30:32 UTC 2020


To show why I'm pushing back on this so hard, here is an example of
doing this three different ways: one assuming the references are byte
offsets, two assuming the references are code point offsets.

https://play.golang.org/p/kKbr2hXd56U

I had forgotten that I could do the third one, and it looks quite nice
(if we ignore the performance cost, as people seem to want to do), but
we can't do any error handling, for reasons explained in the comments.
If we're a client this may not matter: it's not the end of the world if
we show the user a reference that starts or ends with an ugly error
character box or something. If we're the server this might matter more.
Either way, I think having a sane way to do error handling on bad
references is a requirement.
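
For readers who can't open the playground link, here is a minimal
sketch of what the three approaches might look like. It is not a copy
of the linked code: the function names (sliceByBytes, sliceByCodePoints,
sliceByRunes), the errBadRef error, and the begin/end parameters are
all made up for illustration, and the third function is only a guess at
the general shape of a "convert to code points first" approach.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errBadRef is a hypothetical error for references we refuse to resolve.
var errBadRef = errors.New("bad reference")

// boundary reports whether byte offset i does not fall on a UTF-8
// continuation byte.
func boundary(s string, i int) bool {
	return i == len(s) || utf8.RuneStart(s[i])
}

// sliceByBytes treats begin and end as byte offsets into the body.
func sliceByBytes(body string, begin, end int) (string, error) {
	if begin < 0 || end < begin || end > len(body) {
		return "", errBadRef
	}
	if !boundary(body, begin) || !boundary(body, end) {
		return "", errBadRef // reference splits a multi-byte sequence
	}
	if !utf8.ValidString(body[begin:end]) {
		return "", errBadRef
	}
	return body[begin:end], nil
}

// sliceByCodePoints treats begin and end as code point offsets and
// walks the body, decoding as it goes. Bytes after end are not checked.
func sliceByCodePoints(body string, begin, end int) (string, error) {
	if begin < 0 || end < begin {
		return "", errBadRef
	}
	start, stop := -1, -1
	n := 0 // code points seen so far
	for i, r := range body {
		if r == utf8.RuneError {
			// Width 1 means invalid UTF-8 rather than a literal U+FFFD.
			if _, size := utf8.DecodeRuneInString(body[i:]); size == 1 {
				return "", errBadRef
			}
		}
		if n == begin {
			start = i
		}
		if n == end {
			stop = i
			break
		}
		n++
	}
	// Offsets equal to the total code point count point at the end.
	if start == -1 && n == begin {
		start = len(body)
	}
	if stop == -1 && n == end {
		stop = len(body)
	}
	if start == -1 || stop == -1 {
		return "", errBadRef
	}
	return body[start:stop], nil
}

// sliceByRunes converts the whole body to code points first. It is
// short, but invalid bytes silently become U+FFFD during conversion,
// so bad input can no longer be told apart from a real U+FFFD.
func sliceByRunes(body string, begin, end int) (string, error) {
	rs := []rune(body)
	if begin < 0 || end < begin || end > len(rs) {
		return "", errBadRef
	}
	return string(rs[begin:end]), nil
}

func main() {
	body := "héllo wörld"
	fmt.Println(sliceByBytes(body, 0, 6))
	fmt.Println(sliceByCodePoints(body, 0, 5))
	fmt.Println(sliceByRunes(body, 0, 5))
}
```

In this sketch the byte version does a bounds check and a couple of
boundary checks, the walking version has to decode everything up to the
end of the reference, and the conversion version allocates a full slice
of code points and loses the ability to report invalid input.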

Of course, this is Go-specific, but the solutions probably look similar
in other C-like languages. I should also note that the example uses a
higher-level decoding API than the one I actually use, but that doesn't
matter: the extra boilerplate required to do this at the lower level,
where you get byte slices out, would look the same for the first two
examples. The third example, however, would require extra work from me
(because the lower-level API would give me a []byte, not a string),
which makes it even less practical. It also isn't a convenience that
exists in, e.g., C, so it's generally worth just ignoring.

If I have to pick between the code in the first and second examples,
please let me pick the first.

—Sam

On Tue, Dec 8, 2020, at 22:13, Sam Whited wrote:
> The XML library I use does not give me a string or slice of code
> points; it gives me a slice of bytes, because that's the level I'm
> operating at. Even at the higher level, if I decode the bytes into a
> string (a Go string in this case), that is still just a slice of UTF-8
> bytes. It does not decode them, ensure they're valid, and turn them
> into a slice of code points; that is a very expensive operation that
> it avoids until you need it or explicitly do it yourself.
>
> I don't understand how this is part of the XML data model. Do you mean
> that only Unicode encodings are supported by XML? If so, that's fair
> and removes one of my arguments; I did not know that was the case.
> However, I still think the data on the wire should describe the other
> data on the wire, not some higher-level "decoded" representation that
> many XML libraries may not even use.
>
> —Sam
>
> On Tue, Dec 8, 2020, at 21:32, Jonas Schäfer wrote:
> > But all implementations which want to be XMPP and XML 1.0 compliant
> > need to have some way to convert or offer access to code points, as
> > that’s the XML data model. Let’s build on that.
> >
> > Easy choice.
> >
> > Much easier than writing 20 emails on this topic, and that just in
> > this thread.

-- 
Sam Whited

