[Standards] Need sanity check on an example in XEP-0393: Message Styling

Sam Whited sam at samwhited.com
Sat Nov 7 04:12:29 UTC 2020



On Fri, Nov 6, 2020, at 19:44, Tedd Sterr wrote:
> Whereas I say: the first asterisk CAN'T BE an open (it's followed
> immediately by a close), … I don't think there is a rule violation in
> either case precisely because this isn't specified

I think that's covered by "Spans are always parsed from the beginning of
the byte stream to the end". This in my mind meant we can't go backwards
to decide if the original start element really was one or not. Maybe
that's what needs to be clarified here. I wonder if it would be a
violation of the rules to add it.

> I'm not sure there is anything in the current rules that would require
> the extra lookahead; you should be able to parse the string once to
> identify all of the potential directives and then construct the spans
> using that list.

That's still unbounded look ahead, requiring that you range over the
entire string to make sure one span actually is styled.

> Repeat searching trying to find the best match doesn't sound
> very lazy.

You don't have to do repeated searching, just one search which is
potentially unbounded in the forward direction. The way my code works
right now is basically as you've described, except that you can't go
back and change your mind on start tokens because I don't think the
current rules allow that. (It does actually repeat in several cases, but
only because that made it easier to reason about and I didn't care about
performance for short messages. It could be rewritten to do it in one
pass instead of finding the close elements and then recursing into the
bytes inside of them to look for more spans.)

> 3. {*text *text*} Example 1 is the same as in your examples; 2 is
>    basically the same (the space comes after the close); but for 3 the
>    space comes before the second asterisk which invalidates it as a
>    close, thus making it a possible-but-not-directive-present-between-two-
>    directives. That's easily identified without searching the whole
>    string, and no rules are broken.

I don't understand this example, sorry. This is how it works today and
is consistent with skipping the middle * in the "***" example.


> Anyway, let's not continue to spam the mailing list - sorry everyone!
> - we can continue this debate elsewhere if necessary.

This is the place to discuss XEPs, so this seems fine. I'd love to get
others' opinions on whether this is underspecified or if I'm just
overthinking it, and this seems like the place to do it.

> The best thing to do is write some code that follows the rules and see
> what that leads to - that should also allow you to identify where the
> rules are underspecified and generate consistent examples.

That's what I have done. I have written multiple implementations, which
led me to realize that this part is confusing and possibly
underspecified. That's why I'm having this conversation: I'm trying to
figure out what the current specification means if you follow it. One of
my implementations left "***" unformatted, one of them made it all
strong, and I'm trying to figure out which one is right, and/or where
things need to be clarified. I can see the argument for both, but I'm
unsure if it's underspecified, or if it's just unclear and one or the
other is right.
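
For illustration only, here is a minimal Python sketch of the two
readings. It is not either of the implementations mentioned above. The
names strong_spans and reject_empty_open are hypothetical, and it is
deliberately simplified: it handles only "*" on a single line and
ignores nesting and the other directives. With reject_empty_open=False
an opener is never reconsidered and a "*" with no text before it is
skipped as a possible close; with reject_empty_open=True a "*" that is
immediately followed by another "*" is assumed not to be an opener.

```python
def is_opener(s, i, reject_empty_open):
    """Simplified check: '*' at line start or after whitespace, not followed by whitespace."""
    if s[i] != "*":
        return False
    before_ok = i == 0 or s[i - 1].isspace()
    after_ok = i + 1 < len(s) and not s[i + 1].isspace()
    # Assumed alternative reading: an opener directly followed by '*' is rejected.
    if reject_empty_open and after_ok and s[i + 1] == "*":
        return False
    return before_ok and after_ok


def is_closer(s, j, start):
    """'*' not preceded by whitespace, with at least one character after the opener."""
    return s[j] == "*" and j - start > 1 and not s[j - 1].isspace()


def strong_spans(s, reject_empty_open=False):
    """Return (open_index, close_index) pairs, scanning strictly left to right."""
    spans = []
    i = 0
    while i < len(s):
        if is_opener(s, i, reject_empty_open):
            # Lazy match: take the first valid close. The search is
            # unbounded in the forward direction.
            close = next(
                (j for j in range(i + 1, len(s)) if is_closer(s, j, i)), None
            )
            if close is not None:
                spans.append((i, close))
                i = close + 1
                continue
        i += 1
    return spans


print(strong_spans("***"))                          # [(0, 2)]
print(strong_spans("***", reject_empty_open=True))  # []
print(strong_spans("*text *text*"))                 # [(0, 11)]
```

Under these assumptions the first reading makes all of "***" strong by
skipping the middle asterisk, the second leaves it unformatted, and both
agree on "*text *text*".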

Maybe it would be more productive to ask what other implementations have
done? If there's broad consensus I can just clarify the rules to mean
whatever everyone is already doing.


—Sam

-- 
Sam Whited

