[Standards] Proposed XMPP Extension: Character counting in message bodies
xmpp at larma.de
Thu Dec 19 19:48:55 UTC 2019
On 12/19/19 1:59 PM, Andrew Nenakhov wrote:
> Is it really any better than escaped XML text?
Yes. Any sane XML parser resolves references as part of parsing, so you
would have to do extra work to find out which references the text
contained before it was parsed.
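
To illustrate, here is a minimal sketch using Python's standard library parser. The stanza is made up for this example; the point is only that the application sees the resolved text and has no record of which references were used.

```python
import xml.etree.ElementTree as ET

# Hypothetical stanza: the body mixes predefined, decimal and hex references.
raw = "<message><body>1 &lt; 2 &amp;&amp; 3 &#62; 2 &#x263A;</body></message>"

# The parser hands back plain text; every reference is already resolved.
body_text = ET.fromstring(raw).find("body").text
print(body_text)
print(len(body_text))  # length in code points of the resolved text
```

Recovering the original references would require looking at the raw bytes before parsing, which is exactly the extra work mentioned above.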
> Plus, when doing the web client this means an additional
> escaping - deescaping routine every time when something is
> sent-displayed, cause browsers require their own escaping.
I hope that any web client would not use innerHTML or similar techniques
to display the message body, but instead rely on
document.createTextNode() which expects a string without references.
Similarly inputElement.value and element.textContent give you their
strings without references. In general, HTML/JS do their best to
abstract away from references, because why should an application
developer deal with that?
Also HTML uses a different set of predefined references than XML and has
different requirements - &auml; is valid in HTML but not in XML (without
it being defined as an entity in a DTD).
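
A quick way to check the difference, sketched with Python's standard library (the `<body>` snippets are made up for the example): the HTML helper knows the named reference `&auml;`, while an XML parser without a DTD rejects it and only accepts the numeric form.

```python
import html
import xml.etree.ElementTree as ET

# HTML defines the named reference &auml;.
html_result = html.unescape("&auml;")

# XML only predefines lt, gt, amp, apos and quot; without a DTD that
# declares it, &auml; is an undefined entity and parsing fails.
try:
    ET.fromstring("<body>&auml;</body>")
    xml_accepts_named = True
except ET.ParseError:
    xml_accepts_named = False

# The numeric character reference is fine in XML.
xml_numeric = ET.fromstring("<body>&#228;</body>").text
```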
> Why should standard be concerned about different server implementations
> converting anything? If a server does some converting for some reason
> from one way of escaping XML to another, of course it should recalculate
> all references.
On the XML layer (which is what XMPP builds on) this "conversion" does
not change anything (the texts stay the same), that's why it is
perfectly valid for a server to do it. The protocol on top of XML (and
subsequently XMPP) should not deal with references, they are resolved on
the layer below. That's why it is a bad idea to assume specific
characters to be represented using certain references, because you can't
control that (you can only assume things).
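
As a rough sketch of why this matters for counting (the bodies below are invented for the example), several serializations that a server may freely convert between all parse to the same text, while their escaped lengths differ:

```python
import xml.etree.ElementTree as ET

# Five ways to write the same body on the wire.
serializations = [
    "<body>a &gt; b</body>",
    "<body>a &#62; b</body>",
    "<body>a &#x3E; b</body>",
    "<body><![CDATA[a > b]]></body>",
    "<body>a > b</body>",  # ">" may appear unescaped in character data
]

# On the XML layer they are all the same text.
texts = {ET.fromstring(s).text for s in serializations}

# Lengths of the escaped content between the tags are not stable.
raw_lengths = [len(s) - len("<body></body>") for s in serializations]
print(texts, raw_lengths)
```

Offsets counted over the resolved text survive such a conversion; offsets counted over the escaped form do not.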
So I tried with Xabber/xabber.org and either your server or the client
(I guess it's the server) seems to fail to properly do what you just
said it should: When sending the message
<reference xmlns='urn:xmpp:reference:0' begin='1' end='1'
<reference xmlns='urn:xmpp:reference:0' begin='3' end='3'
it is displayed as
with g and ; in bold.
> So far our 'non-standard' way of using
> references is in fact way more 'standard' than what is currently
> suggested by this mish-mash of different XEPs.
I guess we have different definitions of a standard. This mish-mash of
different XEPs is a publicly viewable standard proposal. I am not aware
of any documentation of what Xabber is doing.
> Not really cool, right?
What's bad about that? I would say that having "0..0 bold" is pretty
weird, because it sounds like an empty range (it starts and ends at the
same point, so it must be empty).
> The second integer represents the location of the first non-URL
> character occurring after the URL *(or the end of the string if the
> URL is the last part of the Tweet text)*
I think you are misunderstanding them here. I am pretty sure "the end of
the string" is *after* the last character, not the last character.
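
A small sketch of that reading, with a made-up text and Python slices standing in for "begin" and an exclusive "end": a range whose begin equals its end is empty, and a range that runs to the end of the string uses the position after the last character.

```python
text = "see https://xmpp.org"  # hypothetical message text

begin = text.index("https")
end = len(text)  # position *after* the last character

url = text[begin:end]      # the URL, up to the end of the string
empty = text[0:0]          # begin == end: nothing is selected
first_char = text[0:1]     # one character needs end = begin + 1
```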
> Cited example of programming languages is valid only in part. Yes, it is
> so in java or python, but not so in swift, obj-c or erlang. The last
> three use index of the first character and length, which is actually my
> favourite approach.
I don't think it really makes sense to discuss which programming
language is the one that matters most, but:
- Swift has two operators "ABCDE"[2...4] = "CDE" and "ABCDE"[2..<4] = "CD"
- Objective-C substring functions require index and length
- Erlang uses 1-based indices, string:sub_string("ABCDE", 2, 4) = "BCD",
thus is equivalent to python [1:4]
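
For comparison, here are the conventions from the list above emulated with Python slices. The helper names are made up for this sketch; they only mirror the examples as quoted, not the actual string APIs of those languages.

```python
s = "ABCDE"

def closed(s, first, last):
    # 0-based, end inclusive (the "2...4" form quoted for Swift)
    return s[first:last + 1]

def half_open(s, begin, end):
    # 0-based, end exclusive (the "2..<4" form, also Python and Java)
    return s[begin:end]

def one_based_closed(s, start, stop):
    # 1-based, end inclusive (Erlang string:sub_string/3)
    return s[start - 1:stop]

def index_length(s, index, length):
    # 0-based index plus length (Objective-C style)
    return s[index:index + length]

print(closed(s, 2, 4), half_open(s, 2, 4),
      one_based_closed(s, 2, 4), index_length(s, 2, 2))
```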
Also when you prefer index of first char and length, why not use <ref
begin="2" length="2" /> then? For languages that take string length, you
currently have to calculate length = end+1-begin (because you chose to
have end one less than everyone else does).
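
Spelled out as a sketch (function names are made up here), the two ways of getting a length from begin and end look like this:

```python
def length_from_inclusive_end(begin, end):
    # "end" is the index of the last affected character
    return end + 1 - begin

def length_from_exclusive_end(begin, end):
    # "end" is the index just after the last affected character
    return end - begin

# The range begin=2, length=2 covers "CD" in "ABCDE".
inclusive = length_from_inclusive_end(2, 3)
exclusive = length_from_exclusive_end(2, 4)
```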
> On Wed, 18 Dec 2019 at 21:59, Marvin W <xmpp at larma.de
> <mailto:xmpp at larma.de>> wrote:
> I don't think it really is a "change", in XEP-394 it is already defined
> this way ("the last affected codepoint is the one just before end" )
> and the example in XEP-372 also counts that way (char 72 is the "J"
> of "Juliet" and char 78 is the space after "Juliet"). Only the text misleadingly
> says "An end attribute is similarly used for the index of the last
> character of the reference.", so this may need a clarification.
> Well. I strongly object.
Either we need to change the text in XEP-372 slightly or we have to
change the examples in XEP-372 and the text and examples in XEP-394
(because both should do the same). I see you have a strong opinion on
the one side for some reason.
> ( Btw, did anyone but us implement this XEP at all? )
Converse has an implementation of XEP-372 for mentions (the only use case
that is properly defined in that XEP IMO).
> On 'already defined' 394. As we have learned from 0071 debacle, even
> widely implemented XEPs can be deprecated with vague reasoning, so
> deprecating a contradictory XEP that, to my knowledge, wasn't even
> implemented anywhere, shouldn't be too much of an issue.
Sure, we could deprecate XEP-394, but I don't see a proper replacement
for it yet. I consider the thing Xabber is doing more like a misuse of
XEP-372, which according to its abstract defines a method for one XMPP
stanza to provide references to another entity, such as mentioning
users, HTTP resources, or other XMPP resources - not a way for putting
markup everywhere. I'd rather get rid of XEP-372 (which has a
lot of unclear things and pending TODOs in it) than XEP-394 (which of
course can be improved).