[Standards] Proposed XMPP Extension: Character counting in message bodies

Jonas Schäfer jonas at wielicki.name
Sat Dec 21 18:23:53 UTC 2019

On Wednesday, 18 December 2019 17:27:04 CET Jonas Schäfer wrote:
> On Wednesday, 18 December 2019 16:40:42 CET Marvin W wrote:
> > [inline]
> > 
> > On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > > In the end we have settled for counting characters of escaped string, so
> > 
> > This sounds like a terrible idea. In encoded XML, ">", "&gt;", "&#62;"
> > and "<![CDATA[>]]>" are equivalent. I just tried it out and servers
> > indeed do convert all of those to their shortest well-formed variant
> > (which is ">") so you cannot rely on their reference length at all.
> > Servers may at their discretion convert non-ascii characters to their
> > character reference form (starting with &#). I have seen this at least
> > once happening with emojis.
> I’m 100% with Marvin (and Ralph) here. Counting before escaping makes no
> sense, because the character data of XML is codepoints after escaping, not
> before on a theoretical level and for the reasons noted by Marvin on a
> practical level.

Sorry, this statement was confusing. I was thinking of the *receiving* end,
where counting before the escapes are resolved would mean counting the
codepoint U+0026 (&) as five codepoints (since it would still be encoded as
"&amp;").

On the sending side, you most definitely want to count *before* escaping.
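
To illustrate the distinction, here is a minimal Python sketch. It is not taken
from any client or from the proposed XEP. The helper names count_codepoints and
body_text are hypothetical, and it assumes that "character" means a Unicode
codepoint (which is what Python's len() counts on a str), not a grapheme
cluster or a UTF-16 code unit.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def count_codepoints(text: str) -> int:
    """Count Unicode codepoints; a Python str is a sequence of codepoints."""
    return len(text)


def body_text(element_xml: str) -> str:
    """Parse a serialized element and return its unescaped character data."""
    return ET.fromstring(element_xml).text or ""


# Sending side: count the plain text *before* it is escaped for the wire.
plain = "a & b"
sent_count = count_codepoints(plain)
wire = "<body>" + escape(plain) + "</body>"  # the "&" is now "&amp;"

# Receiving side: count only *after* the parser has resolved the escapes.
received_count = count_codepoints(body_text(wire))

# Different wire encodings of the same character data give the same count,
# so it does not matter if a server rewrites one form into another.
variants = [
    "<body>></body>",
    "<body>&gt;</body>",
    "<body>&#62;</body>",
    "<body><![CDATA[>]]></body>",
]
variant_counts = [count_codepoints(body_text(v)) for v in variants]
```

Under these assumptions both sides arrive at the same count for "a & b", while
counting the escaped wire form would give a larger number that depends on how
the sender or an intermediate server chose to encode the text.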

kind regards,