[JDEV] Re: jabberd 1.4.3 release candidate again
jabber at dsutton.legend.uk.com
Mon Nov 10 23:10:27 CST 2003
I will try to work on a patch either tonight or tomorrow, since I technically already have to disable it for JCR. This
was always a workaround using an undocumented 'feature', so I at least want a way to be able to disable it. It is still
relevant for v0.6.x, if that code is used as an internal component.
Correcting one miscommunication: MU-Conference now makes use of libglib2, which is not the same as glibc. libglib2 is
a set of data types, event handlers and other useful routines. In particular, MU-Conference uses the GHashTable
routines for its internal hashtables. As soon as I can get enough time, I'll try to get Cygwin onto my win98
partition and try it myself. If we can get glib2 working, then there is something else we can try, which means that
mu-conference can be run as an executable, rather than a dll.
On Mon, Nov 10, 2003 at 11:29:00PM -0500, Frank Seesink wrote:
> Hey David!
> Thanks for the quick response. Regarding the MU-C end of things, I was
> just noting some things I observed. Not sure if a patch is the way to
> go, but figured I'd better post what I was finding.
> Note I'm just standing on the shoulders of giants here. Whoever did the
> initial work to make Jabberd compile under Cygwin deserves most of the
> credit. I just used his/her work to get the rest to build. And the
> rest of the credit goes to the Cygwin and gcc teams, who obviously have
> made serious leaps since Jabberd 1.4.2 was first released.
> That being said, if I understand all this stuff right, what I'm seeing
> is that under both *nix and Cygwin, building a dynamic library that you
> can compile against by using a .a library is now pretty much trivial.
> But the linchpin lies in building such a .a library for an executable.
> Under *nix this seems no more complex than doing so for a dynamic
> library, allowing things like Jabberd's plugin architecture, where
> pieces like MU-C can "see" both functions and variables within the main
> jabberd.exe just as easily as jabberd.exe can "see" any libraries it was
> compiled against.
> In Cygwin, there appears to be a limitation (I tried my best in my
> limited capacity to explain this in another message...not sure I was
> successful). Noting that one conditional in jabberd.h, it APPEARS that
> you can use the old 5-step process to build a .a file from the
> jabberd.exe which contains the exported functions, but NOT variables.
> At least that's how it looks. I'm guessing here as I did not write this
> conditional, but its existence makes me believe that exporting variables
> is non-trivial under Cygwin vs. exporting functions. Note the function
> get_debug_flag() is used EXCLUSIVELY by Cygwin code (both in the dnsrv
> code and log.c).
> Using the Cygwin tools like 'nm' to build an export list from a binary
> gives you all the functions but apparently NOT the variables that should
> be "exposed" to outside modules like MU-C. So when I initially tried
> compiling MU-C v0.52, the compilation failed with an unresolved
> reference to deliver__flag, as there was no such export in the jabberd.a
> file. I confirmed this by looking at the jabberd.def file created
> during the compilation of jabberd itself, and sure enough, there was no
> line in the file (just a basic text file) with deliver__flag.
> So I manually added the reference to the export list in the .def file by
> way of an echo statement in the jabberd Makefile. When I recompiled
> jabberd, jabberd.a now contained that in its export table. This made
> the compilation of MU-C v0.52 happy, and it appeared to hook in and run
> just fine...as long as I did not CREATE a room. I could bring up the
> MU-C window, click around in it, see it build the conference room list
> from jabber.org and tipic.com (I was using the Rhymbox client)--implying
> my jabber server was going out and getting all that info from
> jabber.org/tipic.com--and even join MU-chats on OTHER servers. But the
> moment I tried to CREATE a room on my jabber server, BOOM! I just
> suspected that this was when MU-C tried to "touch" deliver__flag. But I
> haven't dug that deep to confirm. I'm sure you probably know this off
> the top of your head. :-)
> I have no idea if this means such a patch is the way to go. I just
> wanted to let folks know what I've been finding, and based on what I
> found, it seems that maybe this can explain why I get such a vicious
> segfault when I try to create a room with MU-C v0.52. I mean, jabberd
> just up and dies with a core dump. It's like clockwork. And if any
> other module writers have been having fits under Cygwin, this might be
> one thing to note.
> If this helps you understand how things are under Cygwin, then great.
> I'd love to feel like I've done something useful. But I won't lie to
> you. I'm just guessing here, as I'm so rusty in my coding skills that
> maybe I'm overlooking something simple. :-/ Don't know if the answer is
> a patch to avoid trying to access deliver__flag, or if like the
> debug_flag, what's needed is a simple accessor function like
> get_debug_flag() and its matching modifier function set_debug_flag().
> Something like get_deliver__flag() and set_deliver__flag(). But that
> would require the code be modified in jabberd itself, not in MU-C.
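The accessor idea above can be sketched in C as follows. This is only an illustration under stated assumptions: get_deliver__flag() and set_deliver__flag() are the hypothetical names suggested in the message, not functions that exist in jabberd 1.4.3; the int type of deliver__flag and the module_hold_delivery() helper are likewise assumed for the example. The point is that a module only ever calls exported functions, which Cygwin appears to handle, and never touches the variable itself.

```c
/* Sketch only: these accessors are hypothetical and are not part of
   jabberd 1.4.3. They mirror the existing get_debug_flag() pattern. */

/* Host executable side (deliver.c in jabberd): the variable stays
   private to the executable. The int type is an assumption. */
static int deliver__flag = 0;

/* Exported functions: a module calls these instead of reading or
   writing the variable directly. */
int get_deliver__flag(void)
{
    return deliver__flag;
}

void set_deliver__flag(int value)
{
    deliver__flag = value;
}

/* Module side: hypothetical helper that raises the flag and returns
   the previous value so the caller can restore it later. */
int module_hold_delivery(void)
{
    int previous = get_deliver__flag();

    set_deliver__flag(1);
    return previous;
}
```

With this shape, only the two function names would need to appear in the export list, and the manual export of deliver__flag would no longer be required.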
> But if you can provide a patch just to test this theory, that'd be
> awesome. Not sure how relevant this will be once you bring out v0.6
> though, as it seems you're doing a good bit of overhauling if I
> understand right...like the use of glibc, for example. By the way, I
> did not have success building that from source under Cygwin. But again,
> could just be me. (Man how I wish all these "givens" like glibc and
> bind were offered as clean packages under Cygwin, but I don't think I'm
> the guy for the job. :-()
> P.S. I have an updated Makefile for you for MU-C v0.52. It greatly
> simplifies the difference between *nix and Cygwin. I'm attaching it
> here. You'll see what I mean if you compare it to the last one I sent
> you. By the way, for MU-C v0.3, all I do is remove hash.o from
> David Sutton wrote:
> >(CC'ing to the MU-Conference list)
> >Hi there,
> > If it's the deliver__flag that's causing the issue, then I can make up a
> >patch that will disable it if the cygwin define is set. FYI, this was a
> >hack done in the v0.5.x series, to try and help room entry times, and
> >the associated cpu usage. There is a pipe in the pth scheduler code that
> >is causing large cpu usage if you try and send lots of small stanzas
> >through the jabberd deliver() function one at a time. By triggering that
> >flag, you can ask the hosting jabberd to simply queue up the packets,
> >until the flag is released and you flush the queue by sending the
> >deliver(NULL, NULL).
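The queue-and-flush behaviour described above can be modelled with a toy example. This is a minimal sketch, not jabberd's implementation: toy_deliver(), send_now(), the fixed-size queue and the use of plain strings as packets are all made up for illustration. Only the deliver__flag name and the idea of flushing with a NULL packet come from the description; the real deliver() takes two arguments.

```c
#include <stddef.h>

#define QUEUE_MAX 16                 /* arbitrary size for the sketch */

/* Toy stand-ins: a packet is just a string, the "wire" is a counter. */
static int deliver__flag = 0;        /* 1 = hold packets, 0 = send at once */
static const char *queue[QUEUE_MAX];
static size_t queued = 0;
static size_t sent = 0;

static void send_now(const char *packet)
{
    (void)packet;                    /* a real server would route it here */
    sent++;
}

/* Toy deliver(): a NULL packet means "flush whatever is queued". */
void toy_deliver(const char *packet)
{
    size_t i;

    if (packet == NULL) {
        for (i = 0; i < queued; i++)
            send_now(queue[i]);
        queued = 0;
        return;
    }
    if (deliver__flag) {
        if (queued == QUEUE_MAX)
            toy_deliver(NULL);       /* queue full: flush first, keep order */
        queue[queued++] = packet;    /* held until the next flush */
        return;
    }
    send_now(packet);                /* flag clear: send immediately */
}

size_t toy_sent(void)   { return sent; }
size_t toy_queued(void) { return queued; }
```

A caller would set deliver__flag, call toy_deliver() once per stanza, clear the flag, and then call toy_deliver(NULL) to push the whole batch through in one go.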
> > If this does fix things, then I'll incorporate the patch into cvs.
> > David
> >On Mon, 2003-11-10 at 15:14, Frank Seesink wrote:
> >>More info regarding the segfault caused by using -D under Cygwin:
> >>I have tracked things down to line 826 in ./jabberd/mio.c (indicated
> >>with <===):
> >> log_debug(ZONE,"mio while loop top");
> >> /* if we are closing down, exit the loop */
> >> if(mio__data->shutdown == 1 && mio__data->master__list == NULL)
> >> break;
> >> /* wait for a socket event */
> >> FD_SET(mio__data->zzz,&rfds); /* include our wakeup socket */
> >> if(bcast > 0)
> >> FD_SET(bcast,&rfds); /* optionally include our announcements socket */
> >> retval = pth_select(maxfd+1, &rfds, &wfds, NULL, NULL); <===
> >> /* if retval is -1, fd sets are undefined across all platforms */
> >> log_debug(ZONE,"mio while loop, working");
> >>Apparently this call to pth_select() is making Jabberd go BOOM! right on
> >>(Verified this by adding a few more log_debug() lines just before and
> >>after the offending call, and sure enough, got up to but not past
> >>Did some Googling and best I could find was the following thread:
> >> http://firstname.lastname@example.org/msg00052.html
> >>which would seem to indicate that possibly enough data is being pushed
> >>onto the run-time stack to cause the "STACK OVERFLOW". Not sure why
> >>simply enabling debug mode would do this, as all it does is throw out
> >>statements (and why does this happen under Cygwin but apparently not
> >>under Linux/etc.?).
> >>As written in discussion thread listed above:
> >>There are only one good reason I can think of which cause the stack
> >>overflow in such a "simple thread": Some of your functions or functions
> >>inside some other libraries (libc, etc.) use large variables on the
> >>stack. In C, every variable not declared "static" in a function is per
> >>default allocated from the run-time stack. So, if you have a simple
> >>"char buf[SIZE]" somewhere and SIZE is a few KB in size, this noticably
> >>fills the stack of the thread while the function's scope is active.
> >>Looked at the code for debug_log() in ./jabberd/log.c, which is
> >>basically what's called. log_debug is just a macro that resolves to a
> >>conditional check to see if debug_flag is set, in which case
> >>debug_log() is called (see ./jabberd/jabberd.h lines 109-113).
> >>Only thing I see is the declarations at the beginning of debug_log():
> >> va_list ap;
> >> char message[MAX_LOG_SIZE];
> >> char *pos, c = '\0';
> >> int offset;
> >>which might push a good bit of data on the stack depending on what the
> >>size of the va_list type is and the value of MAX_LOG_SIZE (which is 1024
> >>as seen on line 105 in jabberd.h). But if that's the cause, I don't
> >>think I'd be seeing the last debug message ("mio while loop top") as the
> >>program should be bombing out as the code enters debug_log(). And
> >>considering this function is called, entered, run, and returned, any
> >>values it pushed on the stack are popped before continuing.
> >>The only other thing I see that might affect the run-time stack are the
> >>calls to FD_SET(), which I'm not quite sure how they resolve. All caps
> >>indicates a #define, but did a grep through the code and found nothing.
> >> Looked at the GNU Pth docs, and nothing there except references to
> >>lower-case 'fd_set' var type. Googling makes me think this is some kind
> >>of Unix standard connected with the select() function (which appears to
> >>be superseded/replaced by GNU Pth where it's used), so not quite sure
> >>how one plays with the other. But maybe FD_SET under Cygwin pushes more
> >>data onto the stack than it does under *nix? But does turning on debug
> >>output really cause this? Not sure they're connected when I look at the
> >>Guess at this point I'm kind of at a loss. Looks like serious reading
> >>time to try and get up to speed on all this. But if anyone out there--
> >>unlike me out on the fringes--has intimate knowledge of this code or
> >>just the whole pthread vs. GNU Pth function calls, I'd love to get some
> >>insight. Thanks in advance for reading this far and for any help you
> >>can provide.
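On the FD_SET() question above: FD_ZERO, FD_SET and FD_ISSET come from the POSIX <sys/select.h> header, usually as macros, rather than from jabberd or GNU Pth, which is why a grep of the source tree finds no definition. They operate on an fd_set that the caller declares, so a local fd_set lives on the calling thread's stack and its size depends on the platform's FD_SETSIZE. The sketch below uses plain select(); wait_readable() and demo_pipe_readable() are hypothetical helper names, and pth_select() is assumed to take the same arguments, as the GNU Pth documentation describes.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
   -1 on error. The fd_set below is an ordinary local variable. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int retval;

    FD_ZERO(&rfds);                  /* clear every bit in the set */
    FD_SET(fd, &rfds);               /* mark the descriptor we care about */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    retval = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (retval < 0)
        return -1;                   /* fd sets are undefined on error */
    return (retval > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/* Small demonstration: a pipe with one byte written is readable. */
int demo_pipe_readable(void)
{
    int fds[2];
    int result;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    result = wait_readable(fds[0], 1000);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Nothing here shows whether FD_SET is involved in the overflow; it only illustrates what the macros do and where their data lives.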
> >>ACCESSING VARIABLES FROM OUTSIDE COMPILED MODULE UNDER CYGWIN
> >>AND MU-Conference
> >>After noting lines 109-113 in ./jabberd/jabberd.h, it occurred to me
> >>that jabberd.exe is compiled slightly differently under Cygwin than it
> >>is under *nix. *nix version just checks debug_flag var directly (which
> >>is declared in ./jabberd/log.c), whereas Cygwin version calls a trivial
> >>function to do same. (NOTE: Did a grep on all the jabberd code, and
> >>this is the ONLY reference to __CYGWIN__ I can find in the entire source
> >>tree!! So is this really the only difference in code now?)
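The conditional described above probably looks something like the following. This is a reconstruction from the description, not a copy of jabberd.h: the bodies of debug_log() and get_debug_flag(), the messages_logged counter and the demo_log() helper are assumptions added so the sketch compiles on its own.

```c
#include <stdarg.h>
#include <stdio.h>

int debug_flag = 0;                  /* declared in log.c in jabberd */
static int messages_logged = 0;      /* counter for this sketch only */

/* Trivial accessor used by the Cygwin branch of the macro. */
int get_debug_flag(void)
{
    return debug_flag;
}

/* Simplified stand-in for jabberd's debug_log(). */
void debug_log(const char *zone, const char *fmt, ...)
{
    va_list ap;

    fprintf(stderr, "%s ", zone);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    messages_logged++;
}

/* The one platform difference: read the variable directly, or go
   through a function call when building under Cygwin. */
#ifdef __CYGWIN__
#define log_debug if (get_debug_flag()) debug_log
#else
#define log_debug if (debug_flag) debug_log
#endif

/* Hypothetical caller: logs only when debugging is enabled and
   returns how many messages have been written so far. */
int demo_log(const char *text)
{
    log_debug("demo.c:1", "%s", text);
    return messages_logged;
}
```

Either branch behaves the same from the caller's point of view; the Cygwin branch simply avoids referencing debug_flag from code that may live outside the executable.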
> >>Not sure why that's necessary, but removing this conditional, using just
> >>the *nix version of the #define, and re-compiling gave a few hiccups.
> >>Had to add a line to the Makefile to add one more export variable for
> >>doing the non-*nix build of export lib. But even then things weren't
> >>100% right, as running jabberd.exe gave issues.
> >>I suspect this all ties in with the way dynamic libraries can hook back
> >>into variables exported from executables in *nix but trying to do
> >>something similar under Cygwin gives all kinds of headaches (see post
> >>from 6Nov2003 for more info). And this simple "wrapper" function might
> >>be a trick, possibly because under Cygwin functions can be exported but
> >>variables cannot? (That's a question, not a statement.) I have no
> >>clue. So I've left this alone for now.
> >>But this might explain why MU-Conference v0.52 blows up on me as well,
> >>whereas v0.3 does not. MU-C v0.52 appears to try and connect back into
> >>a variable deliver__flag, which is defined in ./jabberd/deliver.c and
> >>compiled into jabberd.exe. I added this variable to the export list via
> >>the Makefile, which allows MU-C v0.52 to compile/link against
> >>./jabberd/jabberd.a just fine, but MU-C still blows sky high when a room
> >>is created. However, MU-C v0.3 suffers none of these issues, and
> >>compiles fine without that entry, implying MU-C v0.3 does NOT try to
> >>look at deliver__flag. Anyway, just more observations.
> >>Frank Seesink wrote:
> >>>Ok, I admit it. I'm kind of on a mission. At this point Jabberd
> >>>1.4.3CVS compiles/links/runs the same under Cygwin as it does on other
> >>>*nix platforms, with the one exception of running in debug mode (using
> >>>the -D switch).
> >>>So let me ask this, as I'm just starting to dig into the source code
> >>>itself. Can anyone steer me in the right direction as to why, whenever
> >>>I attempt to fire up Jabberd in debug mode, I see the following:
> >>>$ ./jabberd/jabberd.exe -D
> >>>Sat Nov 8 18:44:11 2003 mio.c:787 MIO is starting up
> >>>Sat Nov 8 18:44:11 2003 mio.c:816 mio while loop top
> >>>**Pth** STACK OVERFLOW: thread pid_t=0xa040750, name="unknown"
> >>>Segmentation fault
> >>>This happens regardless of whether I have configured/built jabberd with
> >>>(--enable-ssl) or without SSL support. So I've ruled that out at least.
> >>>It fails with the generic jabber.xml config. Basically, I have not
> >>>been able to get Jabberd to fire up if I use the -D switch.
> >>>The actual pid_t number may vary (haven't been paying enough attention
> >>>to notice if it changes or if there's a pattern to be honest), but the
> >>>sequence of messages is always the same. Jabberd starts and dies in the
> >>>blink of an eye.
> >>>However, simply NOT running in debug mode avoids ALL this, and I've had
> >>>a Jabber server running for weeks at a time in production (granted, low
> >>>user load, but still), usually only restarting when I reboot the Windows
> >>>XP Pro box it's running on.
> >>>Has anyone else experienced this kind of behavior on any other platform?
> >>> Any insight into where to look? I realize running Cygwin under
> >>>Windows, I'm working in a kludged environment at best. But figured it
> >>>best to ask you good folks if you've ever seen this before, as you might
> >>>save me a great deal of time in finding the source of the problem...even
> >>>if the end result is just "It's a limitation of Cygwin/Windows. Suck it
> >>>up." :-)
> >>>In the meantime, the hunt continues...
> include ../../platform-settings
> CFLAGS:=$(CFLAGS) -I../../jabberd -I../include
> # Debug/Experimental
> #CFLAGS:=$(CFLAGS) -pipe -Os -I../../jabberd -I../include
> #LIBS:=$(LIBS) /usr/local/lib/ccmalloc-gcc.o -lccmalloc
> #LIBS:=$(LIBS) -lmemusage
> #LIBS:=$(LIBS) -lmcheck
> conference_OBJECTS=conference.o conference_room.o conference_user.o utils.o xdata.o admin.o roles.o xdb.o hash.o
> all: conference
> conference: $(conference_OBJECTS)
> ifeq ($(__CYGWIN__),1)
> 	$(CC) $(CFLAGS) $(MCFLAGS) -o mu-conference.dll $(conference_OBJECTS) ../../jabberd/jabberd.a $(LDFLAGS) $(LIBS)
> else
> 	$(CC) $(CFLAGS) $(MCFLAGS) -o mu-conference.so $(conference_OBJECTS) $(LDFLAGS) $(LIBS)
> endif
> static: $(conference_OBJECTS)
> single: $(conference_OBJECTS)
> ifeq ($(__CYGWIN__),1)
> 	rm -f $(conference_OBJECTS) mu-conference.dll
> else
> 	rm -f $(conference_OBJECTS) mu-conference.so *~
> endif
Email: dsutton at legend.co.uk
Jabber: peregrine at legend.net.uk