[Standards-JIG] XMPP bandwidth compression
bob at wyman.us
Sat Jul 3 17:49:50 UTC 2004
Jean-Louis Seguineau wrote:
> And probably some day someone will put forward the idea to share
> the dictionary in advance instead of rebuilding it every time to
> speed up the process more, and it will be called 'fast schema' :)
Actually, that has already been done (almost 15 years ago...). It is
Some background: Fast-Infoset grew out of the Fast-WebServices
effort, which has been led by Sun but has reasonably broad support. In web
services applications, there is quite often clear agreement between
communicating parties on the schemas that will be used for interchange.
These agreements are worked out and documented in WSDL (the Web
Services Description Language). Thus, much Web Services work can be
considered "schema-aware". For such applications, it turns out that
converting XML to an ASN.1 binary format, using the mappings of XML
Schema to ASN.1 defined in X.694, is an excellent way of getting
highly compressed and efficient data exchange.
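To illustrate why schema awareness pays off, here is a minimal sketch. It assumes a hypothetical schema (an `id`, a `priority` and an `available` flag, with fixed binary widths) that both endpoints agreed on in advance. Because the layout is known on both sides, only the values need to travel, with no element names or delimiters. This is plain fixed-layout packing with Python's `struct` module, not an X.691 PER encoder; the field names and widths are illustrative assumptions.

```python
import struct

# Hypothetical schema agreed in advance by both endpoints: field names, order,
# and fixed binary widths ("H" = unsigned 16-bit, "B" = unsigned 8-bit, "?" = bool).
SCHEMA = [("id", "H"), ("priority", "B"), ("available", "?")]
LAYOUT = "!" + "".join(code for _, code in SCHEMA)  # "!" = network order, no padding


def encode_xml(msg: dict) -> bytes:
    """Text form: every value travels wrapped in its element name."""
    body = "".join(f"<{name}>{str(msg[name]).lower()}</{name}>" for name, _ in SCHEMA)
    return f"<msg>{body}</msg>".encode("utf-8")


def encode_schema_aware(msg: dict) -> bytes:
    """Schema-aware form: only the values, in the agreed order and widths."""
    return struct.pack(LAYOUT, *(msg[name] for name, _ in SCHEMA))


def decode_schema_aware(data: bytes) -> dict:
    """The receiver restores the field names from its own copy of the schema."""
    values = struct.unpack(LAYOUT, data)
    return {name: value for (name, _), value in zip(SCHEMA, values)}


msg = {"id": 4242, "priority": 5, "available": True}
xml_bytes = encode_xml(msg)
packed = encode_schema_aware(msg)
print(len(xml_bytes), len(packed))  # prints: 73 4
assert decode_schema_aware(packed) == msg
```

The 73-to-4 ratio only holds for this toy message. The general point is that once the schema is shared, names and markup no longer have to be sent at all.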
However, implementing a full ASN.1 stack and the various binary
encodings is a bit of a challenge (and one the open source community hasn't
taken on in earnest yet). Thus, there is a desire to provide a method of
compressing XML that is easier to implement than the full ASN.1
approach. At the same time, there is a desire to make the XML used in
"schema-free" applications more compact and efficient, as well as the XML
used in applications where there might be some base agreement on schema but
many namespaced additions, variations from the base schema, etc. These two
needs (ease of implementation and support for schema-free interop, or
"weak-schema" applications) are the ones Fast-Infoset addresses.
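The central trick that lets a binary XML serialization stay compact without a schema can be sketched as a vocabulary table built on the fly: the first time a name appears it is sent as a literal string and both sides append it to a table, and every later occurrence is sent as a small table index. The token format below (`"LIT"`/`"IDX"` tuples) is a hypothetical illustration, not the octet-level Fast-Infoset (X.891) encoding. The optional `initial_vocabulary` argument, also an assumption of this sketch, shows the "share the dictionary in advance" idea from the quoted message.

```python
from typing import Iterable, List, Optional, Tuple, Union

# Illustrative token: ("LIT", name) for a first occurrence, ("IDX", n) afterwards.
Token = Tuple[str, Union[str, int]]


def encode_names(names: Iterable[str],
                 initial_vocabulary: Optional[List[str]] = None) -> List[Token]:
    """Replace repeated names with indices into a table built on the fly."""
    table: List[str] = list(initial_vocabulary or [])
    index = {name: i for i, name in enumerate(table)}
    out: List[Token] = []
    for name in names:
        if name in index:
            out.append(("IDX", index[name]))  # seen before: send a small integer
        else:
            index[name] = len(table)          # first sighting: remember it...
            table.append(name)
            out.append(("LIT", name))         # ...and send the literal string once
    return out


def decode_names(tokens: Iterable[Token],
                 initial_vocabulary: Optional[List[str]] = None) -> List[str]:
    """Rebuild the same table in the same order, so every index resolves."""
    table: List[str] = list(initial_vocabulary or [])
    names: List[str] = []
    for kind, value in tokens:
        if kind == "LIT":
            table.append(str(value))
            names.append(str(value))
        else:
            names.append(table[int(value)])
    return names


# Element names from a stream of hypothetical XMPP-like stanzas.
stream = ["message", "body", "message", "body", "message", "body"]
tokens = encode_names(stream)
print(tokens)
# [('LIT', 'message'), ('LIT', 'body'), ('IDX', 0), ('IDX', 1), ('IDX', 0), ('IDX', 1)]

# Pre-sharing the dictionary: every name is already known, so only indices travel.
shared = ["message", "body"]
print(encode_names(stream, shared))
# [('IDX', 0), ('IDX', 1), ('IDX', 0), ('IDX', 1), ('IDX', 0), ('IDX', 1)]
assert decode_names(tokens) == stream
```

Because the decoder rebuilds the table in exactly the order the encoder filled it, no schema is needed, yet names that repeat (as they do constantly in XML streams) shrink to small integers after their first appearance.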
With Fast-WebServices, endpoints will be free to use either ASN.1 binary
formats in their full-blown form or Fast-Infoset for their payloads. Which
encoding is in use would be declared in the appropriate WSDL statements. In
either case, the data interchanged will be semantically identical to what
would have been exchanged if text-based XML were in use.
While Fast-WebServices is specific to web services, a number of us
recognized that Fast-Infoset was a generally useful method for serializing
XML and successfully argued for splitting the specification into two
documents. One defines Fast-Infoset and the second defines Fast-WebServices
(which provides for the use of both Fast-Infoset and full-blown ASN.1
binary encodings in web services). This, unfortunately, makes the
Fast-WebServices documentation a bit more fragmented, but it provides the
benefit of a general-purpose yet efficient serialization of the XML Infoset
that can be used in a broad range of applications.
So, the hierarchy of encodings is something like this:
1. XML, exceptionally easy to generate by hand or with library
support, easy for humans to read and debug, useful with or without schemas,
but "fat" on the wire and not very efficient to parse or generate.
2. Fast-Infoset, serialization of the XML Infoset, reasonably easy
to generate with some code library support, can't be read or debugged by
humans without great effort and/or tool support, useful with or without
schemas, "thin" on the wire and efficient to parse relative to XML.
3. ASN.1 binary (e.g. PER), can't be generated without good library
support, virtually impossible for humans to read without library support,
only useful in "schema-based" applications, exceptionally compact on the
wire and very efficient to parse. (Note: There are actually a number of
ASN.1 defined binary encodings, BER, CER, DER, aligned PER, unaligned PER
and there is of course "XER" or the XML Encoding Rules for ASN.1...)
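To make the "exceptionally compact" claim for PER concrete, consider one of its rules: for an integer with known bounds, such as ASN.1 `INTEGER (0..7)`, unaligned PER sends only the offset from the lower bound, in the minimum number of bits that can represent the range (zero bits when the range holds a single value). The sketch below implements just that one rule for a few hypothetical fields. It ignores length determinants, extensibility, the octet alignment used by aligned PER, and every other X.691 rule, so treat it as an illustration rather than an encoder.

```python
def uper_bits_for_range(lb: int, ub: int) -> int:
    """Minimum bits needed for any value in lb..ub (unaligned-PER style width)."""
    if ub < lb:
        raise ValueError("empty range")
    # ceil(log2(ub - lb + 1)) equals the bit length of the largest offset, ub - lb.
    return (ub - lb).bit_length()  # 0 when lb == ub: nothing needs to be sent


def uper_encode_constrained(value: int, lb: int, ub: int) -> str:
    """Encode (value - lb) as a bit string of the minimal width for lb..ub."""
    if not lb <= value <= ub:
        raise ValueError("value outside its constraint")
    width = uper_bits_for_range(lb, ub)
    return format(value - lb, f"0{width}b") if width else ""


# Hypothetical fields: (name, value, lower bound, upper bound) from a shared schema.
fields = [("priority", 5, 0, 7), ("flag", 1, 0, 1), ("hour", 23, 0, 23)]
bits = "".join(uper_encode_constrained(v, lb, ub) for _, v, lb, ub in fields)
xml = "".join(f"<{n}>{v}</{n}>" for n, v, _, _ in fields)
print(bits, len(bits), len(xml) * 8)  # prints: 101110111 9 408
```

Nine bits against 408 bits of equivalent XML text shows why schema-based binary encodings sit at the compact end of the hierarchy, and also why they are useless without the schema: nothing in `101110111` says where one field ends or what it means.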
We should also anticipate mixed use of these encoding methods in the
future. However, only time will tell whether, how, and where all this stuff
is used, what benefits people get from it, and what problems it causes.