[standards-jig] JNG Ramblings.

Mike Lin mikelin at MIT.EDU
Fri Aug 9 16:20:08 UTC 2002


> - It's much easier to debug a non-binary protocol. Not just because you
> can telnet in; it's also that you can easily tcpdump the network
> traffic and see what has been exchanged. If wrong data has been sent
> you see it at first glance - wrong bits, wrong packet length numbers
> and all those things are much harder to recognize in a dump.

The protocol I laid out is perfectly readable in Ethereal. There are a
few extra bytes spread around that are meant to help the machine, and
everything else is XML.

You only have to get framing right once, and you'll know something is
wrong if you don't.
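
To make that concrete, framing a PDU is roughly this much code. This is
only a sketch: the four-byte big-endian length prefix below is an
illustration, not the actual header layout, which has a few more bytes
in it.

    /* Illustrative PDU writer: a 4-byte big-endian length prefix in
     * front of an ordinary XML payload. The length bytes are for the
     * machine; everything a human cares about stays readable text. */
    #include <arpa/inet.h>   /* htonl */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static int write_pdu(FILE *fp, const char *xml)
    {
        uint32_t len  = (uint32_t)strlen(xml);
        uint32_t wire = htonl(len);
        if (fwrite(&wire, sizeof wire, 1, fp) != 1)
            return -1;
        if (fwrite(xml, 1, len, fp) != (size_t)len)
            return -1;
        return 0;
    }

In a packet capture you then see four length bytes followed by plain
XML, so the stream stays eyeball-readable.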

> - A binary protocol has less redundancy, which means it's less tolerant
> of errors and less robust.

If you emit bad XML in Jabber, your stream is over, and the server
disconnects you. With a framing protocol, there is at least the
possibility of recovering from that error, so I would argue it is more
robust at the transport level.

Binary framing simply leaves less room to screw up at the framing level,
which is a good thing, because you don't want to screw up - at all - at
the framing level.
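
Here is a minimal sketch of what that recovery looks like, again
assuming the illustrative four-byte length prefix from above;
looks_like_xml() is just a placeholder for whatever XML parser is
actually in use:

    #include <arpa/inet.h>   /* ntohl */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Placeholder for the real XML parser -- purely illustrative. */
    static int looks_like_xml(const unsigned char *buf, uint32_t len)
    {
        return len > 1 && buf[0] == '<' && buf[len - 1] == '>';
    }

    static void pdu_loop(FILE *fp)
    {
        uint32_t wire, len;
        unsigned char *pdu;

        while (fread(&wire, sizeof wire, 1, fp) == 1) {
            len = ntohl(wire);
            if (len == 0)
                continue;               /* empty frame: nothing to parse */
            pdu = malloc(len);
            if (pdu == NULL || fread(pdu, 1, len, fp) != len) {
                free(pdu);
                break;                  /* the framing itself broke: give up */
            }
            if (!looks_like_xml(pdu, len))
                fprintf(stderr, "skipping bad PDU (%u bytes)\n",
                        (unsigned)len);
            free(pdu);                  /* good or bad, the stream continues */
        }
    }

A bad payload costs you one PDU; it doesn't cost you the connection.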

> - "Packet sizes" in binary protocols are always limited as size fields 
> have limited size, this limits the extensibility.

So does your CPU. You're going to load that number into a CPU register
at some point no matter how it's represented on the wire, so it is a
good thing from every perspective to have bounded packet sizes. It's
easier on everyone.

> - In non-binary protocols there are more ways to extend the protocol 
> when new features are added.

The entire protocol isn't binary; only the message framing is.
Everything that needs to be extensible is done in XML. I'm using binary
framing to achieve a specific, measured goal of greatly improving the
ease and efficiency with which a PDU framer can be implemented. I never
intend to use it for anything more than that, because we have XML
everywhere else.

> - When programmers deal with binary protocols they tend to use
> fixed-size buffers ... this causes buffer overflows. My impression is
> that binary protocols are more often vulnerable.

Buffer overflows are a result of bad programmers and bad programming
languages, not binary protocols. Static buffers are a boon for
performance. Not checking the bound, when your language doesn't do it
for you, is just incompetence.
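
For what it's worth, checking the bound is one line. A sketch, with an
arbitrary illustrative limit (MAX_PDU is not from any spec):

    #include <arpa/inet.h>   /* ntohl */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PDU 65536    /* illustrative cap, not part of any spec */

    /* Read one PDU into a caller-supplied buffer of MAX_PDU bytes.
     * Returns the payload length, or -1 on EOF, short read, or an
     * oversized frame. The length check is all a static buffer needs
     * to be perfectly safe. */
    static long read_pdu_bounded(FILE *fp, unsigned char *buf)
    {
        uint32_t wire, len;

        if (fread(&wire, sizeof wire, 1, fp) != 1)
            return -1;
        len = ntohl(wire);
        if (len > MAX_PDU)           /* the one-line bound check */
            return -1;
        if (fread(buf, 1, len, fp) != len)
            return -1;
        return (long)len;
    }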

Web servers (e.g. IIS) have had at least as many vulnerabilities as
servers for any other protocol. The first massively widespread buffer
overflow vulnerability was in ftpd. So I think this clearly has nothing
to do with whether your protocol is binary or text.

> - Programmers have to deal with endianness, which can be really hard
> work in some languages, especially if you want to write portable code.

If it doesn't have ntohl, your language needs AND, OR, SHL, and SHR.
Show me a general-purpose programming language that doesn't.
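
The whole of the endianness "problem" for a 32-bit length field is a
few shifts; a sketch of the portable version, with no ntohl in sight:

    #include <stdint.h>

    /* Big-endian ("network order") decode and encode using only shifts
     * and ORs -- works the same on any host byte order. */
    static uint32_t be32_decode(const unsigned char b[4])
    {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
             | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    static void be32_encode(unsigned char b[4], uint32_t v)
    {
        b[0] = (unsigned char)(v >> 24);
        b[1] = (unsigned char)(v >> 16);
        b[2] = (unsigned char)(v >> 8);
        b[3] = (unsigned char)v;
    }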

> - The protocols that have success are the simple ones: SMTP, NNTP,
> SNMP, HTTP, POP3, IMAP, ... all of them protocols you could implement
> without much hacking or even reading the specs.

Any protocol that tries to squeeze performance is binary. I'm squeezing
performance -- at the framing level _only_.

> - If we want to support virtual channels we don't have to invent a new 
> protocol for that: There is already BEEP and others that can be used for 
> this.

Well, first, my opinion is that there's no good reason for BEEP not to
be a binary protocol. It imposes 32-bit limits on its sequence numbers
anyway. Having to parse text integers and not having fixed-size headers
just makes things slower, so I really don't understand why it's not
binary.

Secondly, asynchrony is expressly not something I am out to achieve,
because we can express it with multiple TCP connections for multiple
resources, and we can even express it by layering over BEEP. I don't
want to do my own flow control, which both BEEP and TCP do.

But even if we used BEEP for everything, we would need some higher-order
framing protocol in order to support multipart messages. BEEP has tended
to use MIME for this purpose. I'm using a binary protocol instead. The
difference is that the binary protocol can also be used very easily at
the TCP level. This reflects a central theme of what I'm doing - we
_allow_ a great deal of additional functionality without _requiring_ too
much.

- Mike



