[Standards] Proposed XMPP Extension: Character counting in message bodies

Andrew Nenakhov andrew.nenakhov at redsolution.com
Fri Dec 20 14:42:35 UTC 2019

On Fri, 20 Dec 2019 at 17:53, Dave Cridland <dave at cridland.net> wrote:

> On Fri, 20 Dec 2019 at 12:15, Andrew Nenakhov <
> andrew.nenakhov at redsolution.com> wrote:
>> You have sent a string '>>>>>', which was escaped to
>> '&gt;&gt;&gt;&gt;&gt;' before sending to the server.
> Well, maybe. XML doesn't require you to escape '>' in text, only in
> attribute values.

I must be using a different XML from you. The documentation for the version
we are using is here: https://www.w3.org/TR/REC-xml/#syntax


> The ampersand character (&) and the left angle bracket (<) *MUST NOT*
> appear in their literal form, except when used as markup delimiters, or
> within a comment <https://www.w3.org/TR/REC-xml/#dt-comment>, a processing
> instruction <https://www.w3.org/TR/REC-xml/#dt-pi>, or a CDATA section
> <https://www.w3.org/TR/REC-xml/#dt-cdsection>. If they are needed
> elsewhere, they *MUST* be escaped
> <https://www.w3.org/TR/REC-xml/#dt-escape> using either numeric character
> references <https://www.w3.org/TR/REC-xml/#dt-charref> or the strings
> "&amp;" and "&lt;" respectively. The right angle bracket (>) may be
> represented using the string "&gt;", and *MUST*, for compatibility
> <https://www.w3.org/TR/REC-xml/#dt-compat>, be escaped using either
> "&gt;" or a character reference when it appears in the string "]]>" in
> content, when that string is not marking the end of a CDATA section
> <https://www.w3.org/TR/REC-xml/#dt-cdsection>.

I don't see any exceptions that allow '>' in XML.
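
As a rough illustration of the escaping discussed here, the sketch below uses Python's standard xml.sax.saxutils.escape, which happens to escape all three of '&', '<' and '>'. Python is only a stand-in; it is not necessarily one of the languages or serializers used in our clients, and the sample strings are made up for the example.

```python
from xml.sax.saxutils import escape

# Hypothetical message bodies, used only for illustration.
raw_body = ">>>>>"
mixed_body = "a < b && c > d"

# escape() replaces '&', '<' and '>' with their predefined entities.
escaped_body = escape(raw_body)
escaped_mixed = escape(mixed_body)

print(escaped_body)
print(escaped_mixed)
```

Other serializers may make a different choice for '>', which is why it matters which form the references are counted against.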

> Presumably, in order to calculate the referencing, one would need to know
> precisely how this string was to be serialized? Does that mean it needs
> to... what? Hardcode that knowledge based on the library used? Seems
> astonishingly fragile, especially if you're working in an environment where
> the XML serialization is provided by the platform. Like a web browser.

So far we have managed it rather well on four different platforms in five
languages. This way we have precise references into the resulting stanza
text: not into some 'ideal', 'abstract' Unicode string, but into a fully
formed piece of the XML document that is not going to be changed or
modified any further before sending. This is the most stable way to solve
this problem.
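
To make the difference concrete, here is a minimal sketch of how an offset in the unescaped string relates to an offset in the escaped stanza text. It assumes a serializer that escapes '&', '<' and '>' the way Python's xml.sax.saxutils.escape does, and it counts Python string positions (Unicode code points). The helper name escaped_offset and the sample body are hypothetical; this is not the counting rule from the proposed extension itself.

```python
from xml.sax.saxutils import escape


def escaped_offset(raw: str, index: int) -> int:
    """Return the position in escape(raw) where raw[index] begins.

    Works because escaping is applied character by character, so the
    escaped form of a prefix is a prefix of the escaped whole string.
    """
    return len(escape(raw[:index]))


raw = ">>>>> hi"          # hypothetical body as typed by the user
wire = escape(raw)        # body text as it appears in the stanza

raw_index = raw.index("h")
wire_index = escaped_offset(raw, raw_index)

print(raw_index, wire_index)
print(wire[wire_index])
```

The same character sits at a different offset in each form, so a reference is only unambiguous once both sides agree on which form is being counted.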

Andrew Nenakhov
CEO, redsolution, OÜ
https://redsolution.com <http://www.redsolution.com>