[standards-jig] JNG Ramblings.

Mike Lin mikelin at MIT.EDU
Tue Aug 13 00:24:36 UTC 2002


Hi Joe,

> I'm not sure whether a different framing protocol is right.  I'm
> pretty sure we've achieved the scalability we have due to our use of
> element per packet vs. document per packet, but without actual
> real-world load tests of both, I can't be sure.  More framing may also
> help, but again, we do a pretty good job now.  Maybe I'm being naive,
> or hanging out with the wrong crowd, but server performance matters
> more than client performance, and our server does pretty well.
> Really, I think the point of framing should be the ability to send
> binary data without encoding.

Just to get a feel for streaming document vs. document-per-packet, I
hacked together something this past weekend that at least provides a
data point on the question (though by no means a full picture). It's a
little C program that simulates the two approaches by running Expat
over one million Jabber-style XML elements. In the streaming-document
version, the same parser instance is reused for every element; in the
document-per-packet version, a new instance is created and freed each
time*. The Expat callbacks allocate fake data structures to simulate
what I presume the server does to store information about each packet.
I've attached the code if you're interested in taking a look.
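
In case the attachment doesn't survive, the document-per-packet half
looks roughly like this (a minimal sketch, not the attached file; the
sample element, handler bodies, and fake state are illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <expat.h>

    /* Simulate the server's per-packet bookkeeping: build and tear
       down a small fake data structure for each element. */
    static void start_element(void *userData, const XML_Char *name,
                              const XML_Char **atts)
    {
        char *fake_state = strdup(name); /* stand-in for real state */
        free(fake_state);
        (void)userData;
        (void)atts;
    }

    static void end_element(void *userData, const XML_Char *name)
    {
        (void)userData;
        (void)name;
    }

    int main(void)
    {
        const char *packet =
            "<message to='joe@server'><body>hello</body></message>";
        int len = (int)strlen(packet);
        long i;

        /* Document-per-packet: create and free a parser for every
           element. The streaming-document variant instead creates
           one parser up front and pushes all one million elements
           through that single instance. */
        for (i = 0; i < 1000000L; i++) {
            XML_Parser p = XML_ParserCreate(NULL);
            XML_SetElementHandler(p, start_element, end_element);
            if (!XML_Parse(p, packet, len, 1)) {
                fprintf(stderr, "parse error\n");
                XML_ParserFree(p);
                return 1;
            }
            XML_ParserFree(p);
        }
        return 0;
    }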

The results are pretty consistent: over one million iterations,
document-per-packet is slower by a factor of about 1.2. So, while the
cost is non-negligible, I think it can be said that document-per-packet
is not catastrophically slower than streaming document. As to whether
the overhead would cause an actual scalability bottleneck, as you
mention, there's no way to know without fully implementing it and
finding out.

Anyway, these results are obviously unscientific and far from
thorough, and shouldn't be taken too seriously; but at least they
provide a data point, if not a full picture.

-Mike

* Don Box is fond of giving the following demo in his .NET lectures:

1. Write a C++ program that loops many times, creating a new object
   and deleting it each time. Demonstrate that it takes some length
   of time to run.
2. Write a C# program that loops the same number of times, creating a
   new object each time. Demonstrate that it runs an order of
   magnitude faster than the C++ program.
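
In C terms, the manually-managed first program boils down to a loop
like this (an illustrative sketch; the "object" here is just a raw
64-byte allocation):

    #include <stdlib.h>

    /* The manually-managed half of the demo: every iteration pays
       for a full allocate/free round trip, since the allocator must
       be prepared for a free() at any moment. */
    int main(void)
    {
        long i;
        for (i = 0; i < 10000000L; i++) {
            char *obj = malloc(64);
            if (obj == NULL)
                return 1;
            obj[0] = 'x'; /* touch the object so the work is real */
            free(obj);
        }
        return 0;
    }

The C# counterpart just news up an object each time through the loop
and never frees it, leaving cleanup to the garbage collector.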

The reason is that in the second case the CLR's garbage collector gets
to be clever about managing its heap and free lists, making it cheap
to allocate and collect objects, while the C++ allocator has to be
pretty dumb in order to allow the programmer to delete an object at
any time. So I wonder how much of the overhead of creating new XML
parser instances comes down to memory allocation issues like these,
which a garbage-collected language could mitigate.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: packetsim.c
Type: text/x-c
Size: 2531 bytes
Desc: not available
URL: <http://mail.jabber.org/pipermail/standards/attachments/20020812/76065c3a/attachment.bin>

