[standards-jig] Pub/Sub for JNG?
dave at dave.tj
Wed May 1 05:31:00 UTC 2002
Thomas Muldowney wrote:
>
> On Tue, 2002-04-30 at 23:11, Dave wrote:
> > Reply inline:
> >
> > - Dave
> >
> > Dave Smith wrote:
> > >
> > > On 4/30/02 9:08 PM, "Dave" <dave at dave.tj> wrote:
> > >
> > > > How about allowing multiple "connection types?" One type can be standard;
> > > > another can be SSL; yet another can be gzipped. In the last "connection
> > > > type," every message would be piped through a gzip-compatible deflation
> > > > > before being pumped out to the network. The gzip format (through its
> > > > > DEFLATE stream) can tell where the end of the file is supposed to be,
> > > > so there's no need to send meta information to alert the receiver to
> > > > the point where one message ends and the next begins. I suspect this
> > > > "connection type" would be most useful for s2s connections between
> > > > servers exchanging reasonably heavy traffic.
> > >
> > > Connection types aren't that bad, but show me some stats that gzipping will
> > > actually do anything to improve "speed" or "efficiency". Empirical data is
> > > of the essence.
> > Text is very highly compressible. Any server that has its network
> > bandwidth as a bigger bottleneck than its own processing power (which
> > is rather cheap to increase) is going to benefit from the awesome
> > plain-text compression abilities of gzip. I really don't have the energy
> > to produce a scientifically correct experiment, but try transferring
> > a bunch of files across a small pipe without compression, and you'll
> > probably be able to convince yourself easily that compression is key,
> > because bandwidth isn't _that_ cheap.
> Ok, text itself is not highly compressable (or is it ible?) I can send
> you a set of purely random text data, and you won't compress it for jack
> squat. So what you need is highly repetitive text. Ok, jabber has
> that, because we reuse the same elements. Wrong. Jabber does have a
> lot of common elements but you would have to precompute the compression
> tables because the samples that are sent each direction are too short to
> get a good sampling from. Mike Lin actually was investigating this some
> when we were discussing compression with SSL, unfortunately I don't have
> the URL to his post handy, and I'm not seeing it with a quick search.
> Try using jabbersearch and finding it for some discussion, with
> evidence, about the compression overhead, and setup costs.
Text itself _is_ highly compressible with LZW, because the character
mappings can ignore half the ASCII charset, resulting in a major savings
even without statistical compression.
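
The disagreement above (short stanzas compress poorly on their own, while
a shared context helps) can be tried out with a small sketch. It assumes
Python's zlib module (DEFLATE, the algorithm behind gzip) as a stand-in
for whatever compressor a "gzipped connection type" would use. The sample
stanza, the preset dictionary, and the function names are all made up for
illustration; none of them comes from a Jabber specification.

```python
import zlib

# Hypothetical sample stanza; the addresses and body are invented.
STANZA = (
    b"<message to='alice@example.org' from='bob@example.com' type='chat'>"
    b"<body>See you at the meeting</body></message>"
)

# Assumed preset dictionary: byte strings expected to recur in the traffic.
PRESET = b"<message to='' from='' type='chat'><body></body></message>@example."


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress data as one independent zlib stream, optionally primed with zdict."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Reverse deflate(); the same zdict must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


def stream_sizes(stanzas):
    """Compress stanzas on one long-lived stream, flushing after each one.

    Returns the number of bytes put on the wire for each stanza, so the
    effect of the history built up by earlier stanzas is visible.
    """
    comp = zlib.compressobj(level=9)
    return [len(comp.compress(s) + comp.flush(zlib.Z_SYNC_FLUSH)) for s in stanzas]


if __name__ == "__main__":
    print("raw stanza:          ", len(STANZA))
    print("compressed alone:    ", len(deflate(STANZA)))
    print("with preset dict:    ", len(deflate(STANZA, PRESET)))
    print("same stream, 3 sends:", stream_sizes([STANZA] * 3))
```

Running it prints the byte counts for each approach. A long-lived stream
with a sync flush after every stanza is one way to get the benefit of
shared history without shipping a precomputed table, at the cost of
keeping compressor state per connection.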
As for Mike Lin's post, I'm guessing that his stats were all based on
normal-use c2s connections. I'm not expecting people to use gzipped
connections for normal c2s connections - only for specialized purposes on
c2s; the focus is on s2s mostly, because servers _do_ send lots of stuff
to each other all the time, so they can benefit from the statistical
> > >
> > >
> > > > Another interesting "connection type" is aimed squarely at the average
> > > ..blah..blah..blah.. (see "Greg the Bunny")
> > > > and you'll see a dramatic improvement in your IM response time.
> > >
> > > *Dizzy hands Dave a nice asbestos suit*
> > LOL. . .
> > >
> > > Are you under the impression that TCP packets actually move slower across
> > > ethernet than UDP packets? Are they heavier, or harder to route, or
> > > something? I bet that the little guys in the network don't like having to
> > > carry those heavy TCP packets all over the place.
> > /me starts to fall off his chair. . .
> > >
> > > You know, to be truthful, UDP is quite a bit faster. In fact, it can achieve
> > > zero-time delivery -- more often than not it opts not to deliver packets. I
> > > know it's crazy, but we're really not all that interested in such a fast
> > > transport layer. We like the slow, but reliable pace of TCP -- kinda like a
> > > mule.
> > Whatever happened to the first letter of IM??? Clients that are going to
> > be sending lots of tiny messages can improve their latency (and the load
> > on their favorite Jabber server - UDP is cheaper for an OS to handle
> > than TCP) by using a UDP connection, while mules can continue to use
> > their reliable TCP connections. That harmony makes the world a rather
> > happy place :-)
> IM itself is an absolutely horrid term. I don't see any of the major
> "IM" systems built on true real time operating systems or even
> pretending to offer anything near zero latency. The structure of an
> internet based conversation is one of latency. We are not making a voip
> system, just a simple messaging one.
That may be, but reducing latency for messages is a goal worthy of any
IM system's attention. Jabber has worse latency than ICQ because of the
combination of c2c ICQ connections and the lack of XML overhead there.
I don't like the idea of introducing c2c into Jabber, and axing XML
wouldn't be terribly wise - I don't think too many will disagree on
the latter. However, we _can_ use UDP to lower our latency, and we'll
be reducing the load on servers at the same time, as well. Trust me:
one size _doesn't_ fit all when it comes to connection methods, since
different methods are better for different needs. Something as universal
as Jabber aims to be has to be adaptable to different needs, because
otherwise, there'll always be a better tool for everything Jabber tries
to do, and if integration is the only virtue we can sell Jabber on,
Microsoft will quickly win out with its dotNET gunk.
> > >
> > > Jabber absolutely needs TCP -- even for so-called "lightweight" IM. If you
> > > would look at all the heartache that the IETF IMWG has been through, one of
> > > their fundamental discoveries is that congestion control (something TCP
> > > provides happily) is absolutely necessary for any sort of scalable IM
> > > system.
> > Congestion control isn't exactly at the top of an average guy's
> > priority list. I'd be somewhat suspicious of anybody who decided to
> > code a UDP-based s2s module, but I'd certainly vote to allow clients to
> > use UDP c2s if they prefer. The packet loss statistics on most of the
> > modern Internet are negligible, and the TCP control connection (which
> > simply gets a ping packet sent every few minutes to keep it alive, most
> > of the time) can be used to request retransmission of any packet (since
> > each packet can be numbered - you get two packets with the same number,
> > toss the second one out; you get a packet with an out-of-sequence number,
> > request a retransmission). I don't think I'd mind not finding out about a
> > lost message until the next message arrives (which probably won't be more
> > than a few seconds later); if it really becomes a problem for some people,
> > they can ask every minute over the TCP connection whether X is the number
> > of the last message sent, or they can ask the server to send an alert
> > over the TCP connection when the next message is sent. There are plenty
> > of ways to make UDP as reliable as TCP without losing most of the speed
> > advantage UDP enjoys, while trying to deal with the occasional lost packet
> > in a reasonably time-sensitive manner. UDP really is a good thing :-)
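
The numbering scheme described above (drop a packet whose number was
already seen, request retransmission when a number is skipped) can be
sketched as a small receiver-side tracker. This is only an illustration
under stated assumptions: the class and method names are hypothetical,
packets are assumed to be numbered from 0, and the actual retransmission
request over the TCP control connection is left out.

```python
class SequenceTracker:
    """Receiver-side bookkeeping for numbered datagrams (hypothetical names)."""

    def __init__(self):
        self.next_expected = 0  # lowest number that has never been seen or skipped
        self.missing = set()    # numbers skipped over and not yet received

    def receive(self, seq):
        """Process one packet number.

        Returns (deliver, to_request): whether the packet should be passed
        to the application, and the list of numbers to re-request.
        """
        if seq >= self.next_expected:
            # Anything between the expected number and this one was skipped.
            gap = list(range(self.next_expected, seq))
            self.missing.update(gap)
            self.next_expected = seq + 1
            return True, gap
        if seq in self.missing:
            # A late or retransmitted packet fills a known hole.
            self.missing.discard(seq)
            return True, []
        # Already delivered once: toss the second copy.
        return False, []


if __name__ == "__main__":
    tracker = SequenceTracker()
    for number in [0, 1, 1, 4, 2]:
        print(number, tracker.receive(number))
```

Note that a lost packet is only noticed when a later one arrives, which is
the delay the message above says it is willing to accept.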
> Congestion control is at the top of an average guy's list, they just
> don't realize it. Congestion control gives us all a fair opportunity to
> send a message, as well as keep our presence status known.
There's no congestion on your pipe to the server that can't be solved
by simply switching over to UDP. You don't need fancy flow-control
functionality to send a tiny message every few seconds.
> The rest of your statements seem contradictory. Up above you argue for
> an "Instant" system. Yet here you don't mind if you have retransmission
> problems on the scale of a few seconds.
If the vast majority of your messages can speed across the network (and
you're sending lots of small messages - the type of user who's likely
to benefit most from a UDP connection), it's not the end of the world
if you miss a message for a few seconds - as long as it takes before
you send your next message.
> Next, you describe a whole
> system of numbering and retransmission that TCP already implements.
Well, that's not quite accurate. TCP can't make any assumptions about
the domain of the application involved, so it must be very generalized.
Our system has a rather tiny overhead by comparison to the TCP overhead.
We're also willing to accept delays every now and then (i.e., when the
original UDP packet is dropped, somehow), so we can sidestep some other
issues that TCP must address in real-time.
> If we want features like that we should use TCP.
I probably should've pushed that up above my previous note :-(
> All of this is not to say
> I'm against you having a special purpose client that uses UDP. We
> actually end up having this discussion about every 4 months on JDev or
> some other email. Jabber is designed around the XML, not a certain
> transport layer. That said, it really does tie you to only your server
> implementation, which can be bad, but not in a specialized case. So
> it's a matter of picking.
If we introduce some sort of method of "chatting" with the server at
connection-time, and deciding on a transport layer to use, then we've
solved the implementation issue, as well. Different servers (and
different clients) can implement whatever transport layers they want,
and they can then negotiate to decide which one to use for any given
> > >
> > > Other than that, I think you're completely missing the point of Jabber.
> > >
> > > It's not just about IM. We don't want to sink significant amounts of effort
> > > into optimizing for IM traffic.
> > I'm trying to sink some effort into being able to optimize particular
> > applications for particular purposes. A Jabber client that's actually
> > a bot fetching large XML documents from another Jabber agent (a Jabber
> > gateway to a database, if you will) will certainly want to find some way
> > of compressing all the data coming in, especially if the poor client
> > is on a lousy dial-up connection. If that client can tell its Jabber
> > server when it connects that it wants all packets in that c2s connection
> > compressed, it'll be able to cut the time required to get the data by
> > a factor of 2-5, or even better, if the data is very repetitious (as it tends to
> > be, with XML). In fact, it'd be neat if it could request queueing of
> > packets, so they're only sent every 4K or whatever - that'd allow the
> > server to optimize the TCP packets to be as large as possible, improving
> > the throughput. "Optimizing IM" is only one facet of my proposal: my
> > proposal is all about choices - the ability to optimize your particular
> > environment to whatever task you happen to be using Jabber for, ATM.
> > That, if you ask me, is the secret to scalability for Jabber under
> > the load of many different types of applications running concurrently,
> > because it enables each application to make optimal use of the network.
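
The "queueing of packets" idea above (hold outgoing data until roughly 4K
has accumulated, then send it in one write) can be sketched as a small
buffering wrapper. The class name, the send callback, and the default
threshold are assumptions made for illustration; a real server would
presumably also flush on a timer so that a lone message is not held back
indefinitely.

```python
class BatchingWriter:
    """Accumulates outgoing packets and sends them in larger chunks."""

    def __init__(self, send, threshold=4096):
        self.send = send            # callable taking one bytes object
        self.threshold = threshold  # flush once this many bytes are queued
        self.buffer = bytearray()

    def write(self, packet: bytes):
        """Queue a packet; flush if the buffer has reached the threshold."""
        self.buffer += packet
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        """Send whatever is queued as a single chunk; do nothing if empty."""
        if self.buffer:
            self.send(bytes(self.buffer))
            self.buffer.clear()


if __name__ == "__main__":
    chunks = []
    writer = BatchingWriter(chunks.append, threshold=4096)
    for _ in range(100):
        writer.write(b"<message><body>hello</body></message>")
    writer.flush()
    print([len(c) for c in chunks])
```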
> > >
> > > Moral of the story: UDP == blah, blah, blah.
> > >
> > > Diz
> > >
> > > _______________________________________________
> > > Standards-JIG mailing list
> > > Standards-JIG at jabber.org
> > > http://mailman.jabber.org/listinfo/standards-jig
> > >