[Standards] Proposed XMPP Extension: Character counting in message bodies

Dave Cridland dave at cridland.net
Fri Dec 20 15:33:17 UTC 2019

On Fri, 20 Dec 2019 at 14:43, Andrew Nenakhov <
andrew.nenakhov at redsolution.com> wrote:

> On Fri, 20 Dec 2019 at 17:53, Dave Cridland <dave at cridland.net> wrote:
>> On Fri, 20 Dec 2019 at 12:15, Andrew Nenakhov <
>> andrew.nenakhov at redsolution.com> wrote:
>>> You have sent a string '>>>>>', which was escaped to
>>> '&gt;&gt;&gt;&gt;&gt;' before sending to the server.
>> Well, maybe. XML doesn't require you to escape '>' in text, only in
>> attribute values.
> I must be using different XML from you. Documentation for the version we
> are using is here: https://www.w3.org/TR/REC-xml/#syntax
> Quote:
>> The ampersand character (&) and the left angle bracket (<) *MUST NOT*
>> appear in their literal form, except when used as markup delimiters, or
>> within a comment <https://www.w3.org/TR/REC-xml/#dt-comment>, a processing
>> instruction <https://www.w3.org/TR/REC-xml/#dt-pi>, or a CDATA section
>> <https://www.w3.org/TR/REC-xml/#dt-cdsection>. If they are needed
>> elsewhere, they *MUST* be escaped
>> <https://www.w3.org/TR/REC-xml/#dt-escape> using either numeric
>> character references <https://www.w3.org/TR/REC-xml/#dt-charref> or the
>> strings "&amp;" and "&lt;" respectively. The right angle bracket (>)
>> may be represented using the string "&gt;", and *MUST*, for
>> compatibility <https://www.w3.org/TR/REC-xml/#dt-compat>, be escaped
>> using either "&gt;" or a character reference when it appears in the
>> string "]]>" in content, when that string is not marking the end of a CDATA
>> section <https://www.w3.org/TR/REC-xml/#dt-cdsection>.
> I don't see any exceptions that allow '>' in XML.
Unless I'm missing something obvious, this says you can use > unescaped
everywhere except for the explicit case of "]]>". So again, ">>>>>" as text
content can be sent as-is, and requires no escaping. (But I was wrong about
the attribute value case, where you can, in fact, send ">>>>>" unescaped.)
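
To illustrate the point, here is a minimal sketch, assuming Python's standard
xml.etree.ElementTree as the parser (my choice for the example, nothing in the
thread depends on it). It parses the same five-character string in raw and
escaped form, in both text content and an attribute value; the element and
attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same five-character string, once unescaped and once escaped.
raw_stanza = "<body>>>>>></body>"
escaped_stanza = "<body>&gt;&gt;&gt;&gt;&gt;</body>"

raw_text = ET.fromstring(raw_stanza).text
escaped_text = ET.fromstring(escaped_stanza).text

# An unescaped '>' is also accepted inside an attribute value.
# The element name "x" and attribute name "v" are hypothetical.
attr_value = ET.fromstring('<x v=">>>>>"/>').get("v")

# Both serializations decode to the same text.
print(raw_text == escaped_text == attr_value)
```

A conforming parser hands the application the same string either way, so the
escaping is not visible above the XML layer.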

>> Presumably, in order to calculate the referencing, one would need to know
>> precisely how this string was to be serialized? Does that mean it needs
>> to... what? Hardcode that knowledge based on the library used? Seems
>> astonishingly fragile, especially if you're working in an environment where
>> the XML serialization is provided by the platform. Like a web browser.
> So far we managed it rather well on four different platforms with five
> languages. This way we have precise references to resulting stanza text.
> Not some 'ideal' 'abstract' unicode string, but to a formed piece of XML
> document, that's not going to be changed or modified anymore before
> sending. This is the most stable way to solve this problem.

I think we've just conclusively proven it does get changed during sending.
We certainly cannot rely on it not being changed, since absolutely nothing
in XML or XMPP prevents it being changed.
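
As a rough sketch of why that matters, assume some hop between sender and
recipient parses and re-serializes the stanza. Here Python's
xml.etree.ElementTree stands in for that hop; it is only an illustration, not
a claim about what any particular server does. An offset into the serialized
XML moves, while an offset into the parsed text does not. The helper
body_text is made up for the example.

```python
import xml.etree.ElementTree as ET


def body_text(stanza: str) -> str:
    """Return the parsed text content of a single <body/> element."""
    return ET.fromstring(stanza).text or ""


# What the sender wrote on the wire: '>' left unescaped, which XML allows.
sent = "<body>a > b, see?</body>"

# What a hypothetical intermediary emits after parsing and re-serializing.
# ElementTree happens to escape '>' as '&gt;' in text content.
relayed = ET.tostring(ET.fromstring(sent), encoding="unicode")

# Offset of "see" within the serialized stanza: differs between the two forms.
wire_offsets = (sent.index("see"), relayed.index("see"))

# Offset of "see" within the parsed text: identical for both forms.
text_offsets = (body_text(sent).index("see"), body_text(relayed).index("see"))

print(wire_offsets, text_offsets)
```

Counting against the parsed string is the only one of the two that survives a
change of serializer.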
