[Standards-JIG] XHTML-IM Conclusions

Byron Ellacott bje at apnic.net
Mon Sep 13 23:56:00 UTC 2004

Trejkaz Xaoza wrote:
> On Fri, 3 Sep 2004 20:16, Ian Paterson wrote:
>>- CSS is a very good, familiar and simple styling standard.
> Simple to write, perhaps...surely not simple to implement.
> If it were so simple to implement, then all the browsers would be able to get 
> it to work the same, such that a single site rendered in only one way on all 
> of them.

It's not quite as bad as it may seem.  CSS is trivial to parse, assuming 
you will only accept well-formed CSS.  The two parts of CSS that tend 
to be broken or different are the box model, which IE is notoriously bad 
at, and font sizes, which don't carry across platforms well.  Mostly, 
it's Gecko on X11 that does strange things with font sizes.

However, if you're accepting a very limited subset of CSS attributes, 
and simply discarding the rest, you don't have to worry so much about 
these things.  You look for the CSS attributes you know and love, such 
as font-weight and font-family, and discard those whose odour offends 
you, such as float and margin.
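
As a rough sketch of that keep-or-discard step, assuming the style has 
already been broken into attribute/value pairs: the KNOWN set and the 
keep_known name below are made up for illustration and are not the 
JEP's list.

```python
# Hypothetical allowlist for illustration only; not the JEP's actual list.
KNOWN = {"font-weight", "font-family", "font-style", "color", "text-decoration"}

def keep_known(declarations):
    """Return only the (attribute, value) pairs whose attribute is in KNOWN."""
    return [(name, value) for name, value in declarations if name.lower() in KNOWN]

pairs = [("font-weight", "bold"), ("float", "left"),
         ("margin", "0"), ("color", "red")]
print(keep_known(pairs))  # float and margin are dropped
```

A monochrome client could simply leave color out of its own set.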

>>  <message to='foo' from='bar'>
>>    <body>
>>      Be bold!
>>      Wear a false limb!
>>    </body>
>>    <html xmlns='http://jabber.org/protocol/xhtml-im'>
>>      <body xmlns='http://www.w3.org/1999/xhtml'>
>>        Be <span style='font-weight:bold'>bold</span>!
>>        Wear a <span style='color:red'>false limb</span>!
>>      </body>
>>    </html>
>>  </message>
> So basically, if I am on a monochrome Palm unit, and wanted to mark up bold 
> but not colour, I would need to parse CSS as well as XML in order to 
> determine what to drop.

Well, no.  You don't need a full CSS parser.  You don't need to worry 
about selectors, or @-rules, or anything except the declarations 
themselves, which are quite easy to parse.  You just need to tokenise 
the string, and 
break it at colons and semicolons.  There can be quoted sections, and 
backslash escaping shenanigans, but nothing terribly complex, and when 
you're done you'll have a set of attribute/value pairs.  Discard any 
pairs for which you either don't recognise the attribute or cannot make 
sense of the value, and you're left with the style information you need.
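
A minimal sketch of such a tokeniser, assuming Python.  The function 
name parse_declarations is made up, and this simplifies real CSS 
tokenising: escapes are kept verbatim rather than decoded, and a 
declaration with an unterminated quote is dropped.

```python
def parse_declarations(style):
    """Split a style attribute value into (attribute, value) pairs.

    Colons and semicolons inside quotes, or preceded by a backslash,
    are treated as ordinary characters.
    """
    pairs = []
    name = None    # attribute name, set once the first colon is seen
    buf = []       # characters of the token being collected
    quote = None   # the currently open quote character, if any
    i = 0
    while i < len(style):
        ch = style[i]
        if ch == "\\" and i + 1 < len(style):
            buf.append(style[i:i + 2])  # keep the escape and its character
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None
            buf.append(ch)
        elif ch in "\"'":
            quote = ch
            buf.append(ch)
        elif ch == ":" and name is None:
            name = "".join(buf).strip().lower()
            buf = []
        elif ch == ";":
            value = "".join(buf).strip()
            if name and value:
                pairs.append((name, value))
            name, buf = None, []
        else:
            buf.append(ch)
        i += 1
    # Last declaration, which may lack a trailing semicolon.
    value = "".join(buf).strip()
    if name and value and quote is None:
        pairs.append((name, value))
    return pairs

print(parse_declarations("font-weight:bold; color: red"))
```

What comes back is a plain list of pairs, so dropping what you don't 
recognise is a matter of filtering that list.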

You could also ignore escaping and quoting behaviours, since none of the 
JEP RECOMMENDED attributes take values that should have those things, 
but your parser would be less robust.  It'd also be about four lines, so 
it's kind of a tradeoff. :)
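
For comparison, a sketch of the short version, again assuming Python 
and a made-up function name.  It splits blindly, so a semicolon inside 
a quoted value cuts the declaration short.

```python
def parse_naive(style):
    """Split on ';' then on the first ':'; no quoting or escaping support."""
    parts = (d.split(":", 1) for d in style.split(";") if ":" in d)
    return [(name.strip().lower(), value.strip()) for name, value in parts]

print(parse_naive("font-weight:bold; color:red"))
print(parse_naive("font-family:'A;B'"))  # the value is cut at the quoted ';'
```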

> Actually, I would need to do this anyway, as there are a wealth of CSS 
> elements which can be detrimental to the chat flow.  Weblog sites like 
> LiveJournal know all about this, as it enables you to mark up text appearing 
> outside of the area you're supposed to be writing in.  They have a 
> sophisticated set of regular expressions to clip out the bits they don't 
> want. :-)

You can't apply a style to an arbitrary element using the XHTML style 
attribute.  Styles specified in that attribute only affect the element 
in question, and if you are not allowing any positional or box model CSS 
attributes, which the JEP does not, you cannot alter the flow of the 
overall display.

> But at least if it were all pure XML in some way, we could drop the elements 
> which don't make sense like we always do.  Even something based on the HTML 
> 3.2 <font/> model would have been more appropriate for simple formatting, 
> even if it's completely useless for IM, because I suspect that the reality of 
> IM is people won't mark things up semantically anyway, since only one person 
> would get the benefit of the semantics. ;-)

If it were all pure XML, you wouldn't have to worry about any parsing at 
all, true.  Peter would instead have to define an XML schema for these 
simple formatting elements, and the JEP's name would probably have to 
change. ;)

I don't disagree with you, but I've already made my statement about the 
use of XHTML+CSS, and the JEP was altered to reduce the complexity of 
interpreting an XHTML-IM message.  You still have to parse CSS 
declarations, but that's a lot easier than it seems.

