[Standards] Need sanity check on an example in XEP-0393: Message Styling

Tedd Sterr teddsterr at outlook.com
Sat Nov 7 16:41:53 UTC 2020

> > You don't need to go backwards, you decide whether the current
> > character is a valid open according to the next character - you have
> > to do this anyway to check for spaces.
> That's fair. I think you may be right that it doesn't violate the rules
> to decide that both tokens are invalid vs. just the second one, making
> this underspecified (not just confusing). I'll start a new reply thread
> with what the implementations I know about do so we can figure out if
> it's possible to fix this one way or the other.

All that's needed is a rule specifying how to handle an open followed immediately by a close: instead of just saying the span isn't valid, the spec needs to say whether to keep the open (and carry on looking for a later close) or to treat it as text.

I'll note that keeping the open and trying to find a later close turns "********" into the less-than-desirable "{***}{***}**", rather than "{********}", because matches must be taken lazily (first possible close wins). Counter-intuitively, it also turns "******** my *strong* text" into "{***}{***}{** my *}strong* text", whereas treating the invalid open as text gives "******** my {*strong*} text".
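
To make the difference concrete, here is a rough Python sketch of the two options. It is a deliberately simplified model and not the full XEP-0393 rule set: it only handles "*", assumes single-line input, requires only that an open is not followed by whitespace, and matches lazily. The function name style_strong and the flag keep_open are made up for illustration.

```python
def style_strong(text, keep_open):
    """Wrap matched '*' spans in braces (simplified model, single line only).

    keep_open=True:  an open immediately followed by '*' stays an open and
                     the search continues for a later close.
    keep_open=False: such an open is treated as plain text.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        nxt = text[i + 1] if i + 1 < n else ""
        # Candidate open: '*' followed by a non-whitespace character.
        if ch == "*" and nxt and not nxt.isspace():
            if nxt == "*" and not keep_open:
                # Open followed immediately by a close: call it text.
                out.append(ch)
                i += 1
                continue
            # Lazy match: the first later '*' that leaves at least one
            # character inside the span wins.
            close = text.find("*", i + 2)
            if close != -1:
                out.append("{" + text[i:close + 1] + "}")
                i = close + 1
                continue
        # Not an open, or no close found: emit the character as text.
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in ("********", "******** my *strong* text"):
        print(style_strong(sample, keep_open=True))
        print(style_strong(sample, keep_open=False))
```

With keep_open=True this gives the "{***}{***}**" and "{***}{***}{** my *}strong* text" results above; with keep_open=False the second input comes out as "******** my {*strong*} text".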

> > Given a very-contrived-to-prove-the-point input such as: "*text _text
> > ~text" You would see the asterisk, decide that's an open, and then
> > search all the way to end of the string looking for a matching close -
> > you wouldn't find one, so then you'd have to go back and say the
> > asterisk isn't an open. Then you'd get to the underscore …
> It doesn't really matter, but you could also just use your technique
> here and range over the entire span (or to the first newline if the
> opening turns out to not have a matching closing) and mark each token as
> valid or not as you come to it, then when you're done with everything
> return the tokens you found. You don't have to repeatedly scan, and
> whether this example works one way or another doesn't change that as far
> as I can tell.

Yes, that's largely just a question of how it's implemented.
The more important point is that you say "you can't go back and change your mind on start tokens", but whichever way you implement it, you can't say for certain that an open starts a span until after you've checked further, i.e. this isn't a context-free grammar. It's not possible to decide whether an open is valid at the point you find it - it is necessary to be able to 'change your mind' or, rather, put off the decision until you have sufficient evidence.
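
One way to put off the decision without rescanning can be sketched in Python as follows. This is only an illustration under simplifying assumptions: single-line input, an open only needs to be followed by non-whitespace, an open immediately followed by its own close is kept as an open, and most of the spec's other rules are ignored. The name find_spans and the pending map are hypothetical.

```python
def find_spans(line, directives="*_~"):
    """Single left-to-right pass returning (open_index, close_index) pairs.

    An open is only recorded as tentative; it becomes a span when a
    matching close turns up later, otherwise it is left as plain text.
    """
    pending = {}  # directive character -> index of a tentative open
    spans = []
    for i, ch in enumerate(line):
        if ch not in directives:
            continue
        start = pending.get(ch)
        if start is None:
            nxt = line[i + 1:i + 2]
            if nxt and not nxt.isspace():
                # Looks like an open, but the decision is deferred.
                pending[ch] = i
        elif i - start > 1:
            # Evidence has arrived: the earlier open is confirmed only now.
            spans.append((start, i))
            # Drop this open and any tentative opens that began inside
            # the span and never closed; they remain text.
            pending = {d: p for d, p in pending.items() if p < start}
        # Otherwise this is a close directly after its open: ignore it
        # and keep the open pending (one of the two possible choices).
    # Whatever is still pending never found a close and stays text.
    return spans


if __name__ == "__main__":
    print(find_spans("*text _text ~text"))
    print(find_spans("*a _b_ c*"))
```

For "*text _text ~text" nothing is ever confirmed, so no spans are reported and all three directive characters end up as text, without any second scan.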

> > The directives themselves are identified using a lookahead of one, not
> > unbounded
> Yes, you're right, my terminology wasn't correct there. Either way
> the point is this works this way regardless of how you parse this
> example, so I don't think it matters WRT whether this example should
> be strong or not.

The example itself is unimportant, but it serves to clarify how the rules should work, and there are possible knock-on effects for the rest of the string (as above).
