[Standards] Need sanity check on an example in XEP-0393: Message Styling

Tedd Sterr teddsterr at outlook.com
Sat Nov 7 00:44:40 UTC 2020

The difference comes from deciding what to do once you discover that the first two asterisks don't constitute a valid span.
In your case, you say: the first asterisk IS an open, so now I MUST find a close to match; and then you search the rest of the string trying to find one. Whereas I say: the first asterisk CAN'T BE an open (it's followed immediately by a close), so treat it as text and move on to the next character.
I don't think there is a rule violation in either case precisely because this isn't specified; you'll need to specify how to deal with this case - keep or reject the potentially invalid open directive.

I'm not sure there is anything in the current rules that would require the extra lookahead; you should be able to parse the string once to identify all of the potential directives and then construct the spans using that list. Repeatedly searching the string for the best match doesn't sound very lazy.

The possible-directive characters that might appear within spans are ones which would not otherwise be identified as a valid open or close.
Consider the following examples (braces mark span start & end):
1. {*text*}text*
2. {*text*} text*
3. {*text *text*}
Example 1 is the same as in your examples; 2 is basically the same (the space comes after the close); but for 3 the space comes before the second asterisk, which invalidates it as a close. That makes it a possible directive that is not an actual directive, present between two directives. That's easily identified without searching the whole string, and no rules are broken.
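
As a rough illustration of the parse-once-then-pair idea, here is a minimal Python sketch. It handles only the `*` directive on a single line. The names `find_spans`, `can_open` and `can_close` are hypothetical, and the rule that an asterisk immediately followed by another asterisk is treated as text is one reading of the approach described above, not something the current XEP-0393 text states.

```python
def find_spans(text):
    """Return (open_index, close_index) pairs for '*' spans in one line."""
    # Pass 1: tag every asterisk using only its immediate neighbours.
    tags = []
    for i, ch in enumerate(text):
        if ch != "*":
            continue
        prev = text[i - 1] if i > 0 else " "             # start of line acts like whitespace
        nxt = text[i + 1] if i + 1 < len(text) else " "  # end of line acts like whitespace
        can_open = prev.isspace() and not nxt.isspace()
        can_close = not prev.isspace()
        # Assumption: an open followed directly by '*' would give an empty
        # span, so it is treated as plain text instead.
        if can_open and nxt == "*":
            can_open = False
        tags.append((i, can_open, can_close))

    # Pass 2: pair each open with the first close candidate that follows it.
    spans = []
    open_at = None
    for i, can_open, can_close in tags:
        if open_at is None:
            if can_open:
                open_at = i
        elif can_close:
            spans.append((open_at, i))
            open_at = None
    return spans


for line in ("*text*text*", "*text* text*", "*text *text*", "***"):
    print(repr(line), find_spans(line))
```

Under these assumptions the three examples give the spans marked with braces above, and "***" gives no span at all.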

Anyway, let's not continue to spam the mailing list - sorry everyone! - we can continue this debate elsewhere if necessary.

The best thing to do is write some code that follows the rules and see what that leads to - that should also allow you to identify where the rules are underspecified and generate consistent examples.

On Fri, Nov 6, 2020, at 15:59, Tedd Sterr wrote:
>  The way you're suggesting requires unbounded lookahead

I've been meaning to send an email about this for ages too, but this
*is* how it works right now. You definitely have to do this with the
current rules (regardless of how this particular situation works). It's
not really a problem in XMPP land though because the server will enforce
a max message length. I would have liked to fix this, but it would have
made it a lot more likely to have false positives and the user
experience isn't as good (which is why I suspect Slack/WhatsApp/etc. do
something similar to what I've done). I probably should have put in an
explicit max-span-length though. Either way, it's not likely to be a
problem even if someone sends you tons of messages with 4k (or whatever
the server allows you to send) spans in them. These are small
(relatively) messages, not documents or serialization formats.
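
For comparison, the search-for-a-close behaviour described here, with the kind of explicit cap mentioned above, might look roughly like the following Python sketch. `find_close` and `max_span_len` are hypothetical names; XEP-0393 defines no such limit, and counting the cap in characters rather than bytes is an assumption.

```python
def find_close(text, open_at, max_span_len=None):
    """Return the index of the first valid closing '*' after open_at, or None.

    open_at is the index of an asterisk already accepted as an opening
    directive. max_span_len (hypothetical) caps the number of characters
    allowed between the opening and closing directives.
    """
    end = len(text)
    if max_span_len is not None:
        # The close may sit at most max_span_len characters past the open.
        end = min(end, open_at + max_span_len + 2)
    # Start two characters on so the span always contains some text.
    for j in range(open_at + 2, end):
        if text[j] == "*" and not text[j - 1].isspace():
            return j
    return None


print(find_close("***", 0))
print(find_close("*text *text*", 0))
print(find_close("*" + "a" * 10 + "*", 0, max_span_len=5))
```

Without a cap the scan can run to the end of the message, which is the unbounded lookahead under discussion. With this reading, "***" does yield a span, because the second asterisk is kept as text between the first and third.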

We can chat about this in another thread at some point though.

> Don't try to be overly clever with the parsing, a lookahead of one
> character should be sufficient to identify directives. (Whether they
> are active and demark spans depends on matching pairs of directives.)

I understand what you're saying and how your parsing rules
work. What I'm trying to figure out is what the text says right now, and
I'm not sure if it matches what you're describing (which is how I've
written some of my implementations before) or what I described in my
original message. I am not trying to decide what would be best or change
the normative text right now.

I *think* your rules violate "and thus may be present between two
other styling directives" which would mean that "***" is valid, but
I'm not sure.

It also may not matter since this isn't likely to be a real situation,
but if I can clarify the rules I'd love to do so.
