Hi Ludovic, thanks for your reply.

On Mon, 2008-06-30 at 21:42 +0200, Ludovic Courtès wrote:
> Hi,
>
> Roland Orre writes:
>
> > I need hints on how to find occasional segmentation faults
> > and missed GC references. This relates to 64 bit machines.
>
> Is it x86-64, IA64, or something else?

What I'm trying to get working now is on x86-64 (Opteron), so that I can
then run it on a large-memory IA64 (Itanium2) machine.

> The Git repository (the future 1.8.6) contains an important bug fix for
> IA64. I think there were x86-64-related during the 1.8.x series, too.
> Thus, I'd suggest using the latest Guile on these platforms.

That's a good hint. I'll check out the code and see if I can locate the
changes. The problem is that I have considered switching for a few years,
but since the array API changed from 1.8 on, it would imply a major rework,
possibly causing other issues, as the old array API is used in hundreds of
places in my code, and there may be other API changes as well.

> > My modules have worked perfectly fine on 32 bit machines but
> > on 64 bits I occasionally get something like
> > # if I run that
> > code fast, which indicates a threading problem (I do not use
> > threads in this case, but seems like guile does). This does
> > not occur if I run guile through gdb. This happens not too often
> > but it seems to be related to string->symbol symbol->string.
>
> Is it reproducible?

This is not really reproducible. If I execute the lines quickly by loading
them as a file, it occurs with about 60 % probability. If I execute the
lines in that file one by one, it does not occur. When I try to work around
it, I can see that it may be complaining at, e.g., a string->symbol
conversion. If I then simply replace that with the identity, i.e.
(lambda(x) x), it doesn't happen, but this probably relates to the bigger
issue below.

> > My bigger problem though is frequently occurring
> > segmentation faults or otherwise corrupt pointers.
> >
> > If I then run the code in gdb I can get
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x2ae316e4f070 (LWP 6699)]
> > 0x00002ae314b9d091 in scm_gc_mark_dependencies (p=0x97c) at
> > gc-mark.c:441
> > 441 if (SCM_GC_MARK_P (ptr))
> > Current language: auto; currently c
>
> Likewise, is it reproducible? Can you show the full backtrace (it
> should show where 0x97c comes from)?

This is fully reproducible when it happens as shown. Most often I get a
segmentation fault like this. I have attached a full gdb backtrace from
this. It can be reproduced over and over, with only the base addresses
differing. Sometimes I've got a pointer to some internal structure, for
instance one pointing to the procedure of a loop in the middle of a list
of numbers, which is kind of illogical, as that internal structure should
not be freed.

> Hope this helps,
> Ludovic.

Best regards
Roland