[jdev] xml processing question

Scott Cotton wsc at mindowl.com
Fri Aug 11 20:24:50 CDT 2006


On 8/10/06, Peter Saint-Andre <stpeter at jabber.org> wrote:
>
> Scott Cotton wrote:
> >
> >
> > On 8/9/06, Michal vorner Vaner <michal.vaner at kdemail.net> wrote:
> >
> >     On Wed, Aug 09, 2006 at 08:34:28PM +0200, Scott  Cotton wrote:
> >     >    Hi all,
> >     >
>
> >
> > I wouldn't equate removing text with ignoring it, but this is certainly
> > sensible for embedded
> > dtds.  Removing all such restricted content might lead to confusion if,
> > say, a message contains non-default entity references which are standard
> > in some common format like xhtml.  These may even be crucial to the
> > communication (like dollar sign vs. euro).  Should those be silently
> > removed too?  If it were up to me, I'd either pass it all through,
> > reject it all, or return a warning to the initiator about all
> > restricted content.

[ignoring restricted xml data]


> In RFC 3920, ignore means "treat it as if it did not exist". Probably we
> can make this clearer in rfc3921bis -- i.e., what this means both for
> XML routers (servers) and for the stanza recipient.


Hi,

I'm still unclear on what "treat it as if it did not exist" means.
First and foremost, I don't know whether ignoring means
passing it through untouched and uninterpreted, or
removing it.
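To make the two readings concrete, along with the reject and warn options quoted above, here is a minimal sketch using Python's standard expat binding. The function filter_stanza and the policy names "pass", "strip", "warn" and "reject" are hypothetical, made up for illustration; they are not defined by RFC 3920 or by any Jabber server. The sketch treats one stanza as a standalone document (not a live stream), does no namespace processing, and only covers comments and processing instructions.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape, quoteattr

# Hypothetical policy names, for illustration only.
POLICIES = ("pass", "strip", "warn", "reject")


def filter_stanza(xml_text: str, policy: str) -> tuple[str, list[str]]:
    """Re-serialize one stanza, applying `policy` to comments and PIs.

    pass   -> keep restricted content untouched and uninterpreted
    strip  -> drop it ("treat it as if it did not exist")
    warn   -> drop it and report what was dropped
    reject -> refuse the whole stanza
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    out: list[str] = []
    warnings: list[str] = []

    def restricted(kind: str, raw: str) -> None:
        if policy == "reject":
            raise ValueError(f"restricted XML: {kind}")
        if policy == "pass":
            out.append(raw)
        elif policy == "warn":
            warnings.append(kind)
        # "strip" and "warn" simply do not copy `raw` to the output.

    def start(name: str, attrs: dict[str, str]) -> None:
        rendered = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        out.append(f"<{name}{rendered}>")

    p = expat.ParserCreate()
    p.StartElementHandler = start
    p.EndElementHandler = lambda name: out.append(f"</{name}>")
    p.CharacterDataHandler = lambda data: out.append(escape(data))
    p.CommentHandler = lambda data: restricted("comment", f"<!--{data}-->")
    p.ProcessingInstructionHandler = lambda target, data: restricted(
        "processing instruction", f"<?{target} {data}?>")
    p.Parse(xml_text, True)  # an exception raised in a handler propagates here
    return "".join(out), warnings
```

Note that even the "strip" policy needs the parser to recognize the comment before it can drop it.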

A smaller, more technical issue is that some restricted content,
like embedded dtds, has its own structure.  Since an implementation
is bound to accept such input (but ignore it), it has to parse it
in order to identify it (which hardly counts as ignoring it).  What if
the restricted input doesn't actually parse according to xml 1.0?
Does the server then return a stream error?  For example:
<!DOCTYPE[
  <jibberish>
]>
Since that is not a valid embedded DOCTYPE, it's not restricted xml, and so
an implementation is not bound to accept it.  But if it were a valid xml 1.0
embedded doctype, the implementation must accept the input (parse it and
check that it's xml 1.0 compliant) and must also ignore it.  But by that
time, the implementation can't really ignore it, because it has already
parsed it.
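The parsing point can be illustrated with Python's expat binding: to notice an embedded DOCTYPE at all (and the entities it declares), the parser has to parse it, and the malformed example above is rejected with an error rather than skipped. inspect_prolog is a hypothetical helper for illustration, and this is plain XML 1.0 parsing, not a model of how any particular XMPP server handles a stream.

```python
from xml.parsers import expat


def inspect_prolog(doc: str) -> dict:
    """Parse `doc` and report any DOCTYPE and internal-subset entities seen.

    Raises expat.ExpatError if the input is not well-formed XML 1.0.
    """
    seen: dict = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name
        seen["internal_subset"] = bool(has_internal_subset)

    def on_entity(name, is_parameter_entity, value, base,
                  system_id, public_id, notation_name):
        # Called for each <!ENTITY ...> declaration in the DTD.
        seen.setdefault("entities", []).append(name)

    p = expat.ParserCreate()
    p.StartDoctypeDeclHandler = on_doctype
    p.EntityDeclHandler = on_entity
    p.Parse(doc, True)
    return seen


# A well-formed internal subset declaring a non-predefined entity.
VALID = '<!DOCTYPE message [\n  <!ENTITY euro "&#8364;">\n]>\n<message>&euro;</message>'
# The malformed DOCTYPE from the message, followed by a root element.
MALFORMED = "<!DOCTYPE[\n  <jibberish>\n]>\n<message/>"

if __name__ == "__main__":
    print(inspect_prolog(VALID))
    try:
        inspect_prolog(MALFORMED)
    except expat.ExpatError as err:
        print("not well-formed:", err)
```

Either way the DOCTYPE has been fully parsed before anything can decide to ignore it.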

Well, enough games :)  What is the reason for the asymmetry in rfc3920?
I mean, why is it that everyone conforming to the protocol MUST
use only the unrestricted xml subset, but everyone MUST also
accept and ignore restricted xml?


> Peter
>
> --
> Peter Saint-Andre
> Jabber Software Foundation
> http://www.jabber.org/people/stpeter.shtml


-- 
scott