[Standards] Proposed XMPP Extension: Character counting in message bodies
jonas at wielicki.name
Sat Dec 21 18:23:53 UTC 2019
On Wednesday, 18 December 2019 17:27:04 CET Jonas Schäfer wrote:
> On Wednesday, 18 December 2019 16:40:42 CET Marvin W wrote:
> > [inline]
> > On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > > In the end we have settled for counting characters of escaped string, so
> > This sounds like a terrible idea. In encoded XML, "&gt;", "&#62;", "&#x3E;"
> > and "<![CDATA[>]]>" are equivalent. I just tried it out and servers
> > indeed do convert all of those to their shortest well-formed variant
> > (which is "&gt;") so you cannot rely on their reference length at all.
> > Servers may at their discretion convert non-ascii characters to their
> > character reference form (starting with &#). I have seen this at least
> > once happening with emojis.
> I’m 100% with Marvin (and Ralph) here. Counting before escaping makes no
> sense, because the character data of XML is codepoints after escaping, not
> before on a theoretical level and for the reasons noted by Marvin on a
> practical level.
Sorry, this statement was confusing. I was thinking of the *receiving* end,
where counting before the escaping is handled would mean counting the
codepoint U+0026 (&) as five codepoints (since it would still be encoded as
"&amp;"). On the sending side, you most definitely want to count *before*
escaping.
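
To make the distinction concrete, here is a minimal sketch in Python. It
assumes the standard library's xml.sax.saxutils.escape and
xml.etree.ElementTree as stand-ins for a real XMPP stack; the function names
are made up for illustration and are not part of any proposal.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def count_for_sending(body: str) -> int:
    """Sender side: count codepoints of the plain body, before escaping."""
    return len(body)


def serialize_body(body: str) -> str:
    """Hypothetical serializer: escape the text and wrap it in <body/>."""
    return "<body>" + escape(body) + "</body>"


def count_on_receipt(wire: str) -> int:
    """Receiver side: count codepoints after the parser has undone escaping."""
    text = ET.fromstring(wire).text or ""
    return len(text)


body = "a & b > c"
wire = serialize_body(body)

# Both ends agree as long as each counts the unescaped character data.
assert count_for_sending(body) == count_on_receipt(wire)

# Counting the escaped form instead depends on the serializer's choices:
# all of these encode the same single character ">".
variants = ["&gt;", "&#62;", "&#x3E;", "<![CDATA[>]]>"]
decoded = [ET.fromstring("<body>" + v + "</body>").text for v in variants]
assert decoded == [">", ">", ">", ">"]
```

Since a server may re-serialize the stanza with any of those variants, only
the count over the decoded character data is stable between sender and
receiver.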