[Foundation] Farming - The solution ? (Was : Re: Jabber Klatch: Discuss .org server role)
borowka at medialogic.it
Fri Aug 10 01:52:17 CDT 2001
On Thu, 9 Aug 2001 11:48:00 -0500, Thomas Muldowney <temas at box5.net> wrote :
> Well to be fair to myself and the work I've done you can press the open source
> server pretty high if you use dpsm or jpolld from cvs.
This is true: jpolld works very well, although more documentation is needed
if we want more people to use it.
> Then there is sheath's farming work, which I'm sure allows it to go beyond that.
Actually, I spent several nights trying to get mod_farm running, then working out
how it works (no documentation!) and, of course, testing it.
I see several problems from the design point of view:
(For people who don't know how it works:) mod_farm lets you run
"redundant services": you install a master server to which everybody connects, and
this server simply routes the packets (XML packets, not TCP ones of course)
to several identical nodes. In the simplest form, the nodes are normal Jabber
servers with a special jsm instead of the normal one. This means that the user sessions
are distributed among different nodes. When a user opens a session on one of
the nodes, the node sends this information to the master, which broadcasts it
to the other nodes.
Then, ALL the communication between nodes passes through the master.
So: if user A on node 1 wants to send a message to user B, who is unfortunately
on node 2, the message takes the following route:
user A -> master -> node 1 -> master -> node 2 -> master -> user B
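A minimal Python sketch of this route may help to count the hops. The function
names are hypothetical and this is not mod_farm code; only the cross-node path
comes from the description above, while the same-node path (no inter-node leg)
is an assumption.

```python
def farm_route(sender_node: int, recipient_node: int) -> list[str]:
    """Return the hops a message takes through the farm.

    Only the cross-node path follows the description in this mail; the
    same-node path is an assumption (it simply omits the inter-node leg).
    """
    hops = ["user A", "master", f"node {sender_node}", "master"]
    if recipient_node != sender_node:
        # Inter-node traffic is relayed by the master, never sent directly.
        hops += [f"node {recipient_node}", "master"]
    hops.append("user B")
    return hops


def master_visits(hops: list[str]) -> int:
    """Count how many times the master handles the message."""
    return hops.count("master")


route = farm_route(1, 2)
print(" -> ".join(route))
print("master visits:", master_visits(route))
```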
As I said, there are several issues :
0. mod_farm segfaults from time to time - and this is not good.
1. All the users are connected to one machine (the master). This is not such a big
problem: we can put a multiplexer in front of it and boost the number of users.
2. All the packets pass through the master. What if we have 50K users sending
500 messages per second? The performance of the master should be very carefully
3. A lot (e.g. 90% in a farm of 10 nodes) of packets pass twice through the master.
This is in fact a more serious problem, because instead of balancing anything, we double
the load of our farm.
4. The master becomes the critical part of the system. If one of the nodes crashes, it is
relatively easy to inform the others that the pending sessions have been terminated.
However, if the master crashes or has other problems (just because our brand new Pentium IV
is too hot or someone needed the electric cable to make some coffee ;) we can simply take
our toys and go home.
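The 90% figure in point 3 follows if sessions are spread evenly over the nodes:
the recipient then sits on a different node than the sender with probability
(N-1)/N. A small Python sketch of that arithmetic, assuming a uniform and
independent distribution of users (an assumption; a real farm may be skewed).
The function name is hypothetical.

```python
from fractions import Fraction


def cross_node_fraction(nodes: int) -> Fraction:
    """Fraction of messages whose recipient is on another node.

    Assumes sender and recipient are each placed uniformly at random
    on one of `nodes` nodes, independently of each other.
    """
    if nodes < 1:
        raise ValueError("a farm needs at least one node")
    return Fraction(nodes - 1, nodes)


for n in (1, 2, 10):
    print(n, "nodes:", float(cross_node_fraction(n)))
```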
For point 3, we could introduce some node-to-node communication without passing
through the master. Then, however, we need a machine which would take care
of informing the whole farm if one node has any problem.
Problem 2 can be fixed if we start to use the 'Redirect' error and connect the client
directly to a node. This is a very good idea, but to be honest, I don't know of
any Jabber client that supports redirect. Still, it is true that this feature is quite easy to add.
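To show how little client logic redirect handling might need, here is a rough
Python sketch. Everything in it is hypothetical: the Redirect and Session types,
the connect callback and the host names stand in for a real client's network
code, and the hop limit is an added safeguard against redirect loops.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Redirect:
    """Hypothetical server reply: 'reconnect to this node instead'."""
    host: str
    port: int


@dataclass
class Session:
    """Hypothetical established session on a given host and port."""
    host: str
    port: int


ConnectFn = Callable[[str, int], Union[Redirect, Session]]


def connect_following_redirects(connect: ConnectFn, host: str, port: int,
                                max_redirects: int = 3) -> Session:
    """Connect, following at most `max_redirects` redirect replies."""
    for _ in range(max_redirects + 1):
        reply = connect(host, port)
        if isinstance(reply, Session):
            return reply
        # The server asked us to go elsewhere: retry against that node.
        host, port = reply.host, reply.port
    raise RuntimeError("too many redirects")


def fake_connect(host: str, port: int) -> Union[Redirect, Session]:
    # Stand-in for a real connection: the master redirects to node1.
    if host == "master.example.org":
        return Redirect("node1.example.org", 5222)
    return Session(host, port)


session = connect_following_redirects(fake_connect, "master.example.org", 5222)
print(session.host, session.port)
```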
To summarise: without redirect and node-to-node communication, mod_farm is not
very usable today in a production environment. Sheath, I hope you agree.
I have a proposal which could solve the scalability issue until the new jabbelin.org
server appears: let's concentrate on mod_farm changes and let's press Jabber client
developers to introduce redirect handling as soon as possible. I think it would solve
a lot of problems for a lot of people.
What do you think?
Here at Medialogic, we were also thinking about what Michael Bauer said: what if we
add another attribute to every packet, something like deliver-to, carrying the 'true'
Jabber ID of the user?
We would have a system very similar to the e-mail one:
to: Maciek.Borowka at medialogic.it
deliver-to : mborowka at jabber.medialogic.it:5222
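As a rough illustration only, a packet carrying both addresses could be built
and read like this in Python. The deliver-to attribute is just the proposal
above, not part of any existing Jabber protocol; the helper names and the
example.org addresses are made up, and 5222 is used as the fallback port.

```python
import xml.etree.ElementTree as ET


def build_message(to: str, deliver_to: str, body: str) -> str:
    """Build a message with the proposed (hypothetical) deliver-to attribute."""
    msg = ET.Element("message", {"to": to, "deliver-to": deliver_to})
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


def delivery_target(packet_xml: str) -> tuple[str, int]:
    """Return (address, port) the packet should actually be delivered to."""
    msg = ET.fromstring(packet_xml)
    # Fall back to the public address when no deliver-to is present.
    target = msg.get("deliver-to") or msg.get("to")
    if target is None:
        raise ValueError("packet has neither deliver-to nor to")
    address, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        return address, int(port)
    return target, 5222


packet = build_message("maciek@example.org",
                       "maciek@node1.example.org:5222", "hello")
print(packet)
print(delivery_target(packet))
```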
I like it too, but it means that we need to discuss a standardized way to handle it,
which means that we need time, and I think a lot of people want a solution ASAP.
Voila, my 0,02$
PS. Have a good vacation, everybody :+)