On Sun, 2026-03-15 at 16:42 +0000, Stephen Paul Weber wrote:
> Right. Obviously we want to avoid the HTML syntax which would require
> an HTML parser and is not XML.
I don't see how that's obvious when the current flow of XHTML-IM
implementations is often:
1. Parse XHTML
2. Serialize XHTML
3. Optional: Preprocess serialized XHTML with regex or string
replacement to match HTML parser expectations
4. Pass XHTML to HTML parser to create annotated text
5. Pass annotated text to text renderer
If we pass the (X)HTML as a verbatim string, rather than XML, we get:
1. Take (X)HTML from string
2. Optional: Preprocess XHTML with regex or string replacement to match
HTML parser expectations
3. Pass XHTML to HTML parser to create annotated text
4. Pass annotated text to text renderer
If you want to parse XML, the reasonable thing to do is to use XEP-0394,
which skips the HTML parser entirely:
1. Parse XML
2. Translate to text annotations
3. Pass annotated text to text renderer
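
As a rough illustration of that last flow, here is a minimal sketch of
steps 1 and 2. It assumes the XEP-0394 markup looks roughly like
<markup xmlns='urn:xmpp:markup:0'> containing <span start='..' end='..'>
elements with style children such as <emphasis/>. Those element and
namespace names are my assumption from memory and should be checked
against the XEP; the function name and the tuple format are made up for
this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for XEP-0394 markup; verify against the XEP.
NS = "{urn:xmpp:markup:0}"

def annotations(markup_xml):
    """Translate <span> elements into (start, end, [styles]) tuples."""
    root = ET.fromstring(markup_xml)           # step 1: parse XML
    result = []
    for span in root.findall(NS + "span"):     # step 2: translate
        start = int(span.get("start"))
        end = int(span.get("end"))
        # Child element names (namespace stripped) act as style names.
        styles = sorted(child.tag[len(NS):] for child in span)
        result.append((start, end, styles))
    return result

BODY = "Hello brave world"
EXAMPLE = (
    "<markup xmlns='urn:xmpp:markup:0'>"
    "<span start='6' end='11'><emphasis/></span>"
    "</markup>"
)

for start, end, styles in annotations(EXAMPLE):
    print(repr(BODY[start:end]), styles)
```

The resulting tuples can be handed to the text renderer directly (step
3), without any HTML parser in between.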
> Since the common XML syntax is a strict subset of the HTML5 HTML
> syntax (which intentionally supports eg self closing tags) the support
> level is equivalent.
This is not correct. The HTML5 HTML syntax allows self-closing tags on
foreign elements (e.g. an SVG embedded directly in the document). It
does not allow them on HTML void elements (those that have no closing
tag, like <img>), raw text elements or normal elements (which have
mandatory closing tags).
For void elements, it just happens that if you write <img />, the
trailing '/' is tolerated by the HTML parser and has no effect. However,
if you write <img src=foo.png/> (which is valid in HTML, but not in XML),
the '/' becomes part of the unquoted 'src' attribute value, meaning that
the file 'foo.png/' is used as src.
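
The unquoted attribute case can be checked with a small sketch using
only the Python standard library. Note that html.parser is not a full
HTML5 parser; it is used here only because it tokenizes unquoted
attribute values the same way in this case. The class and helper names
are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class AttrCollector(HTMLParser):
    """Records every start tag together with its parsed attributes."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

def html_attrs(source):
    """Return [(tag, [(name, value), ...]), ...] as seen by html.parser."""
    parser = AttrCollector()
    parser.feed(source)
    parser.close()
    return parser.seen

def is_well_formed_xml(source):
    """True if an XML parser accepts the string as a document."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False

for snippet in ('<img src="foo.png" />', '<img src=foo.png/>'):
    print(snippet, html_attrs(snippet), is_well_formed_xml(snippet))
```

With a quoted value the slash stays outside the attribute; with an
unquoted value it ends up inside it, and the XML parser rejects the
input altogether.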
For normal elements where a closing tag is mandatory, the unnecessary
'/' is ignored in the same way, meaning the HTML parser is still waiting
for the closing tag after a seemingly self-closed element. As an
example, <div><strong />Hey</div> will render "Hey" in bold when parsed
as HTML (because the strong is not closed), but not when parsed as
XHTML.
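
A minimal sketch of that difference, again with only the Python standard
library. Python's html.parser reports <strong /> as an XHTML-style empty
tag, so the sketch overrides that to mimic what an HTML5 parser does
(ignore the slash on non-void elements). This is an emulation under that
assumption, not a conformant HTML5 parser, and all names are
hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: the only HTML elements without an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class Html5StyleBold(HTMLParser):
    """Collects (text, is_bold) runs, ignoring '/' on start tags."""
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.runs = []

    def handle_startendtag(self, tag, attrs):
        # Emulate HTML5: the trailing slash does not close the element.
        self.handle_starttag(tag, attrs)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Simplification: pop back to the matching open tag, if any.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        self.runs.append((data, "strong" in self.open_tags))

def html_bold_runs(source):
    parser = Html5StyleBold()
    parser.feed(source)
    parser.close()
    return parser.runs

def xml_bold_runs(source):
    runs = []
    def walk(elem, bold):
        inside = bold or elem.tag == "strong"
        if elem.text:
            runs.append((elem.text, inside))
        for child in elem:
            walk(child, inside)
            if child.tail:  # text after the child belongs to elem
                runs.append((child.tail, inside))
    walk(ET.fromstring(source), False)
    return runs

SNIPPET = "<div><strong />Hey</div>"
print("HTML-style:", html_bold_runs(SNIPPET))
print("XML:       ", xml_bold_runs(SNIPPET))
```

In the HTML-style run "Hey" is flagged as bold because the strong element
is still open; in the XML run the strong element is empty and "Hey" is
plain text following it.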
https://gist.github.com/mar-v-in/7aa612d173d02240b7d2124c18670ec3
is an example file. If you save it with a .html ending and open it in a
browser, the last line will be bold. If you rename the file to .xhtml,
the last line won't be bold, because the file ending is translated to
the appropriate MIME type and the XML parser is triggered. I reproduced
this on both Firefox and Chromium and it matches the specification.
Anyway, I still haven't heard of the features and functionality that
people aim to get by reinstating XHTML-IM that XEP-0394 couldn't
provide as well or even better.
Marvin