[Standards-JIG] XHTML-IM Conclusions
bje at apnic.net
Mon Sep 13 23:56:00 UTC 2004
Trejkaz Xaoza wrote:
> On Fri, 3 Sep 2004 20:16, Ian Paterson wrote:
>>- CSS is a very good, familiar and simple styling standard.
> Simple to write, perhaps...surely not simple to implement.
> If it were so simple to implement, then all the browsers would be able to get
> it to work the same, such that a single site rendered in only one way on all
> of them.
It's not quite as bad as it may seem. CSS is trivial to parse, assuming
you will only accept well-formed CSS. The two parts of CSS that tend
to be broken or different are the box model, which IE is notoriously bad
at, and font sizes, which don't cross platforms well. Mostly, it's
Gecko on X11 that does strange things with font sizes.
However, if you're accepting a very limited subset of CSS attributes,
and simply discarding the rest, you don't have to worry so much about
these things. You look for the CSS attributes you know and love, such
as font-weight and font-family, and discard those whose odour offends
you, such as float and margin.
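As a rough sketch of that discard step in Python: the KNOWN set and the
keep_known name are made up for illustration, and the two properties in
the set are just the examples mentioned above, not the JEP's full list.
It assumes the style has already been split into (attribute, value)
pairs.

```python
# Hypothetical whitelist: only the two example attributes named above.
KNOWN = {"font-weight", "font-family"}


def keep_known(pairs):
    """Keep only the (attribute, value) pairs whose attribute we support."""
    return [(name, value) for name, value in pairs if name.lower() in KNOWN]
```

Given [("font-weight", "bold"), ("float", "left")], this keeps the
font-weight pair and silently drops the float.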
>> <message to='foo' from='bar'>
>> Be bold!
>> Wear a false limb!
>> <html xmlns='http://jabber.org/protocol/xhtml-im'>
>> <body xmlns='http://www.w3.org/1999/xhtml'>
>> Be <span style='font-weight:bold'>bold</span>!
>> Wear a <span style='color:red'>false limb</span>!
> So basically, if I am on a monochrome Palm unit, and wanted to mark up bold
> but not colour, I would need to parse CSS as well as XML in order to
> determine what to drop.
Well, no. You don't need a full CSS parser. You don't need to worry about
selectors, or @-rules, or anything except the declarations themselves,
which are quite easy to parse. You just need to tokenise a string, and
break it at colons and semicolons. There can be quoted sections, and
backslash escaping shenanigans, but nothing terribly complex, and when
you're done you'll have a set of attribute/value pairs. Discard any
pairs for which you either don't recognise the attribute or cannot make
sense of the value, and you're left with the style information you need.
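To make that concrete, here is a minimal sketch of such a tokeniser in
Python. The function name parse_declarations is my own, and the handling
of quotes and backslashes is a simplification of what the CSS grammar
allows (escapes are kept verbatim rather than decoded). It only splits
the string; deciding which pairs to keep is a separate step.

```python
def parse_declarations(style):
    """Split a style attribute into (attribute, value) pairs.

    Colons and semicolons inside quotes, or preceded by a backslash,
    are treated as ordinary characters.
    """
    pairs = []
    name, buf = None, []
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(style):
        ch = style[i]
        if ch == "\\" and i + 1 < len(style):
            buf.append(style[i:i + 2])  # keep the escape sequence as-is
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None
            buf.append(ch)
        elif ch in "'\"":
            quote = ch
            buf.append(ch)
        elif ch == ":" and name is None:
            # First unquoted colon ends the attribute name.
            name = "".join(buf).strip().lower()
            buf = []
        elif ch == ";":
            # Unquoted semicolon ends the declaration.
            if name:
                pairs.append((name, "".join(buf).strip()))
            name, buf = None, []
        else:
            buf.append(ch)
        i += 1
    if name:  # last declaration may lack a trailing semicolon
        pairs.append((name, "".join(buf).strip()))
    return pairs
```

For example, parse_declarations("font-weight:bold; color: red") gives
[("font-weight", "bold"), ("color", "red")], and a declaration with no
colon in it is simply dropped.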
You could also ignore escaping and quoting behaviours, since none of the
JEP RECOMMENDED attributes take values that should have those things,
but your parser would be less robust. It'd also be about four lines, so
it's kind of a tradeoff. :)
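For the curious, the short version might look something like this in
Python. The name parse_naive is invented for the example, and it
deliberately assumes no quoting or escaping appears in the input, so a
semicolon inside a quoted value would split in the wrong place.

```python
def parse_naive(style):
    """Split at semicolons, then at the first colon; no quotes or escapes."""
    parts = (decl.split(":", 1) for decl in style.split(";") if ":" in decl)
    return [(name.strip().lower(), value.strip()) for name, value in parts]
```

Fed the string "font-weight:bold; color: red;" it returns
[("font-weight", "bold"), ("color", "red")].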
> Actually, I would need to do this anyway, as there are a wealth of CSS
> elements which can be detrimental to the chat flow. Weblog sites like
> LiveJournal know all about this, as it enables you to mark up text appearing
> outside of the area you're supposed to be writing in. They have a
> sophisticated set of regular expressions to clip out the bits they don't
> want. :-)
You can't apply a style to an arbitrary element using the XHTML style
attribute. Styles specified in that attribute only affect the element
in question, and if you are not allowing any positional or box model CSS
attributes, which the JEP does not, you cannot alter the flow of the
> But at least if it were all pure XML in some way, we could drop the elements
> which don't make sense like we always do. Even something based on the HTML
> 3.2 <font/> model would have been more appropriate for simple formatting,
> even if it's completely useless for IM, because I suspect that the reality of
> IM is people won't mark things up semantically anyway, since only one person
> would get the benefit of the semantics. ;-)
If it were all pure XML, you wouldn't have to worry about any parsing at
all, true. Peter would instead have to define an XML schema for these
simple formatting elements, and the JEP's name would probably have to
change.
I don't disagree with you, but I've already made my statement about the
use of XHTML+CSS, and the JEP was altered to reduce the complexity of
interpreting an XHTML-IM message. You still have to parse CSS declarations, but
that's a lot easier than it seems.