unofficial mirror of guile-user@gnu.org 
* Need help finding heap corruption bug
@ 2006-09-17 18:13 Pat Lasswell
  2006-09-18  8:27 ` Ludovic Courtès
  2006-09-23 11:18 ` Neil Jerram
  0 siblings, 2 replies; 8+ messages in thread
From: Pat Lasswell @ 2006-09-17 18:13 UTC (permalink / raw)




In Guile 1.6.8 on an Intel Mac, I can reliably reproduce heap corruption
with steps like the following:

% guile
guile> (use-modules (project all))
guile> (set-current-module (resolve-module '(project foo)))
guile> (load "foo.scm")
==> segmentation fault

(project all) is a module that loads all of the prerequisites and submodules
(about 4k lines total) and exports symbols.  These steps happen in the
directory 'project' containing 'main.scm' and 'foo.scm'.

I have two questions.  Is there something intrinsically wrong with
setting the current module to the one on which I am actively working and
then just reloading that file rather than the entire suite?  (I.e., there
are usually a lot of testing steps between the 'set-current-module' form and
the 'load' form; however, the steps above will always produce a crash.)


My second question is this: given that the answer to 1) is no, how do I
track this pest down?  Placing an undefined identifier, say 'x', at various
points in one of the modules will sometimes produce the appropriate warning,
sometimes produce a segfault, other times a bus error or illegal instruction
or even an abort during gc.  I have run attached to gdb, but I get a deep
stack (131 frames) ending with, e.g.

#0  scm_gc_sweep () at ../../../src/guile-1.6.8/libguile/gc.c:1729
#1  0x0021f573 in scm_igc (what=0x26c720 "cells") at ../../../src/guile-1.6.8/libguile/gc.c:1161
#2  0x0021fa83 in scm_gc_for_newcell (master=0x273440, freelist=0x273438) at ../../../src/guile-1.6.8/libguile/gc.c:985
#3  0x00238aef in scm_cons (x=0x6826c0, y=0x2974) at ../../../src/guile-1.6.8/libguile/pairs.c:84
#4  0x00216b87 in scm_deval (x=0xc3028, env=0x682480) at ../../../src/guile-1.6.8/libguile/eval.c:2835
#5  0x00219af5 in scm_deval (x=0xc3030, env=0x682480) at ../../../src/guile-1.6.8/libguile/eval.c:2819

Are there guile primitives to verify the heap?

_Any_ help would be appreciated.


Thanks in advance
pat

_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


* Re: Need help finding heap corruption bug
  2006-09-17 18:13 Need help finding heap corruption bug Pat Lasswell
@ 2006-09-18  8:27 ` Ludovic Courtès
  2006-09-18 14:24   ` Pat Lasswell
  2006-09-23 11:18 ` Neil Jerram
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2006-09-18  8:27 UTC (permalink / raw)
  Cc: guile-user

Hi,

"Pat Lasswell" <imofftoseethewizard@gmail.com> writes:

> In guile 1.6.8 on an intel mac, I've encountered reliable heap corruption with
> steps like the following

Before investigating any further, does Guile 1.6.8 "work" on Intel
Macs?  Is it under MacOS X?  For instance, did you successfully run the
test suite (type `make check' under the top-level source directory)?

Thanks,
Ludovic.



* Re: Need help finding heap corruption bug
  2006-09-18  8:27 ` Ludovic Courtès
@ 2006-09-18 14:24   ` Pat Lasswell
  2006-09-18 15:08     ` Ludovic Courtès
  2006-09-18 15:08     ` Pat Lasswell
  0 siblings, 2 replies; 8+ messages in thread
From: Pat Lasswell @ 2006-09-18 14:24 UTC (permalink / raw)




`make check' produces 6 failures and 1 error.  I haven't tried 1.4, but 1.8
doesn't even build.

FAIL: goops.test: defining generics: define-generic: creating a new top-level binding
FAIL: goops.test: defining generics: define-generic: overwriting a top-level binding to a generic
FAIL: goops.test: defining accessors: define-accessor: creating a new top-level binding
FAIL: goops.test: defining accessors: define-accessor: overwriting a top-level binding to an accessor
ERROR: regexp.test: regexp-quote: regexp/extended: (char 254 ? ? ?) - arguments: ((regular-expression-syntax "make-regexp" "empty (sub)expression" #f #f))
FAIL: regexp.test: regexp-quote: regexp/extended: (string "aX" 254 ? a? a?)
FAIL: regexp.test: regexp-quote: regexp/extended: string of all chars




* Re: Need help finding heap corruption bug
  2006-09-18 14:24   ` Pat Lasswell
@ 2006-09-18 15:08     ` Ludovic Courtès
  2006-09-18 15:08     ` Pat Lasswell
  1 sibling, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2006-09-18 15:08 UTC (permalink / raw)
  Cc: guile-user

Hi,

(Assuming we're talking about MacOS X on Intel-based macs.  Maybe
`i386-apple-darwin*' is the correct GNU triple?)

"Pat Lasswell" <imofftoseethewizard@gmail.com> writes:

> make check produces 6 failures.  I haven't tried 1.4, but 1.8 doesn't even build.
> FAIL: goops.test: defining generics: define-generic: creating a new top-level
> binding
> FAIL: goops.test: defining generics: define-generic: overwriting a top-level
> binding to a generic
> FAIL: goops.test: defining accessors: define-accessor: creating a new top-level
> binding
> FAIL: goops.test: defining accessors: define-accessor: overwriting a top-level
> binding to an accessor
> ERROR: regexp.test: regexp-quote: regexp/extended: (char 254 ? ? ?) - arguments:
> ((regular-expression-syntax "make-regexp" "empty (sub)expression" #f #f))
> FAIL: regexp.test: regexp-quote: regexp/extended: (string "aX" 254 ? a? a?)
> FAIL: regexp.test: regexp-quote: regexp/extended: string of all chars

These failures are quite worrying, although they do not explain the heap
corruption you are experiencing.  I don't know what the status of Guile
1.6 on OS X (in general) is, so I don't know whether these are "known"
OS X problems or whether they only show up on OS X on Intel hardware.
Maybe someone else can help with this?

As for 1.8, could you report the exact build problems that you have (in
a separate thread)?

Thanks,
Ludovic.



* Re: Need help finding heap corruption bug
  2006-09-18 14:24   ` Pat Lasswell
  2006-09-18 15:08     ` Ludovic Courtès
@ 2006-09-18 15:08     ` Pat Lasswell
  2006-09-18 23:57       ` Kevin Ryde
  1 sibling, 1 reply; 8+ messages in thread
From: Pat Lasswell @ 2006-09-18 15:08 UTC (permalink / raw)




The goops failures are a side effect of a change I made to (initialize
<generic> ...) so that adding a method to no-applicable-method wouldn't
crash; the warning arises from the generic function having one method
instead of none after define-generic is called.  There are details on
bug-guile here:

http://lists.gnu.org/archive/html/bug-guile/2006-09/msg00002.html

It seems that char 254 doesn't behave as expected in regular expressions on
this platform.

I backed out the change to initialize <generic> and make check lists only
the two failures for regexps with char 254, but the same steps still corrupt
the heap.

thanks
pat




* Re: Need help finding heap corruption bug
  2006-09-18 15:08     ` Pat Lasswell
@ 2006-09-18 23:57       ` Kevin Ryde
  0 siblings, 0 replies; 8+ messages in thread
From: Kevin Ryde @ 2006-09-18 23:57 UTC (permalink / raw)
  Cc: guile-user

"Pat Lasswell" <imofftoseethewizard@gmail.com> writes:
>
> It seems that char 254 doesn't behave as expected in regular expressions on
> this platform.

Trying all bytes is probably bogus in a multibyte locale, but if we're
still in the "C" locale at that point then it ought to work.  (Maybe
those tests should force (setlocale LC_ALL "C") just in case.)
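
A minimal sketch of that suggestion, assuming Guile's usual setlocale
signature (category first, then locale name).  The helper
quote-roundtrip? is hypothetical and is not taken from regexp.test; it
only shows the shape of a check that a quoted string matches itself once
the locale has been pinned.

```scheme
(use-modules (ice-9 regex))   ; provides regexp-quote

;; Pin the locale so that byte-oriented regexp tests do not depend on
;; whatever multibyte locale the environment happens to set.
(setlocale LC_ALL "C")

;; Hypothetical helper: quote a one-character string, compile it as an
;; extended regexp, and report whether it matches the original string.
(define (quote-roundtrip? ch)
  (let* ((s  (string ch))
         (rx (make-regexp (regexp-quote s) regexp/extended)))
    (and (regexp-exec rx s) #t)))
```

For example, (quote-roundtrip? #\.) checks a character that needs
quoting.  Whether char 254 passes on a given platform is exactly the open
question above, so no result is claimed for it here.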



* Re: Need help finding heap corruption bug
  2006-09-17 18:13 Need help finding heap corruption bug Pat Lasswell
  2006-09-18  8:27 ` Ludovic Courtès
@ 2006-09-23 11:18 ` Neil Jerram
       [not found]   ` <ca8589e00609231109j36f07390i8fe88c8d0e396079@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Neil Jerram @ 2006-09-23 11:18 UTC (permalink / raw)
  Cc: guile-user

"Pat Lasswell" <imofftoseethewizard@gmail.com> writes:

> I have two questions.  Is there something intrinsically wrong with setting
> the current module to the one on which I am actively working and then just
> reloading that file rather than the entire suite? 

No.

>
> My second question is this: given that the answer to 1) is no, how do I track
> this pest down?

Based on the other messages in this thread, it doesn't sound like
there's an obvious overall problem with Guile on MacOS.  It may still
be MacOS-specific, though.  Do you know whether the same steps cause
problems on Linux?

>  Placing an undefined identifier, say 'x', at various points in
> one of the modules will sometimes produce the appropriate warning, sometimes
> produce a segfault, other times a bus error or illegal instruction or even an
> abort during gc.  I have run attached to gdb, but I get a deep stack (131
> frames) ending with, e.g.
>
> #0  scm_gc_sweep () at ../../../src/guile-1.6.8/libguile/gc.c:1729
> #1  0x0021f573 in scm_igc (what=0x26c720 "cells") at ../../../src/guile-1.6.8/libguile/gc.c:1161
> #2  0x0021fa83 in scm_gc_for_newcell (master=0x273440, freelist=0x273438) at ../../../src/guile-1.6.8/libguile/gc.c:985
> #3  0x00238aef in scm_cons (x=0x6826c0, y=0x2974) at ../../../src/guile-1.6.8/libguile/pairs.c:84
> #4  0x00216b87 in scm_deval (x=0xc3028, env=0x682480) at ../../../src/guile-1.6.8/libguile/eval.c:2835
> #5  0x00219af5 in scm_deval (x=0xc3030, env=0x682480) at ../../../src/guile-1.6.8/libguile/eval.c:2819
>
> Are there guile primitives to verify the heap?

Apart from (gc), no. :-)

The only approach I can think of right now is to look at the detail of
why scm_gc_sweep is aborting and work back from there.  Anyone have
any other ideas?

Regards,
     Neil
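
One way to put the (gc) remark to work is sketched below: read a file
form by form and force a collection after each one, so that a crash in
the sweep phase may land closer to the form that damaged the heap.  This
is only an illustration.  load-with-gc is a hypothetical helper, it
assumes primitive-eval and save-module-excursion are available, and it
skips the reader setup that the real `load' performs.

```scheme
;; Hypothetical helper: evaluate the forms of FILE one at a time,
;; forcing a full garbage collection after each form.  Returns the
;; number of forms evaluated.
(define (load-with-gc file)
  (save-module-excursion          ; restore the current module afterwards
   (lambda ()
     (call-with-input-file file
       (lambda (port)
         (let loop ((form (read port)) (n 0))
           (if (eof-object? form)
               n
               (begin
                 ;; Print the form first so it is visible if we crash.
                 (display n) (display ": ") (write form) (newline)
                 (force-output)
                 (primitive-eval form)   ; evaluates in the current module
                 (gc)                    ; sweep now rather than later
                 (loop (read port) (+ n 1))))))))))
```

With this, (load-with-gc "foo.scm") would stand in for (load "foo.scm")
in the steps from the original report; the last form printed before a
crash is the first suspect, though corruption can still surface late.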




* Need help finding heap corruption bug
       [not found]   ` <ca8589e00609231109j36f07390i8fe88c8d0e396079@mail.gmail.com>
@ 2006-09-23 18:10     ` Pat Lasswell
  0 siblings, 0 replies; 8+ messages in thread
From: Pat Lasswell @ 2006-09-23 18:10 UTC (permalink / raw)




I found the cause of the bug, but more by happenstance than intent.

I had attempted to do something clever during metaclass initialization:
ensuring that every class instantiation had a specific slot.  I consed a
new initargs parameter but failed to produce exactly what the base
implementation expected.  I didn't track down the root cause, opting
instead for a less clever but probably better-structured approach: adding
that slot to an instance of the metaclass from which all clients derive.
Given the obscure code path necessary to trigger the corruption, I am
content to leave it be; if someone thinks it worth investigating, let me
know and I'll track it down.

Thanks for all the feedback.  I'll keep (gc) in mind if something like this
happens again.

pat
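
The replacement approach described above might look roughly like the
following GOOPS sketch.  The actual project code was not posted, so every
name here (<project-meta>, <project-base>, <client>, the tag slot) is
hypothetical; the point is only that the slot lives in an ordinary base
class rather than being spliced into initargs during metaclass
initialization.

```scheme
(use-modules (oop goops))

;; Hypothetical metaclass; the real one is not shown in the thread.
(define-class <project-meta> (<class>))

;; A base class that is an instance of the metaclass and carries the
;; slot every client is supposed to have.
(define-class <project-base> ()
  (tag #:init-value #f #:init-keyword #:tag #:accessor tag)
  #:metaclass <project-meta>)

;; Client classes derive from the base class, so they inherit the slot
;; without any custom initargs handling.
(define-class <client> (<project-base>)
  (name #:init-keyword #:name #:getter client-name))
```

An instance made with (make <client> #:name "x" #:tag 'demo) then has the
tag slot through plain inheritance, which leaves the metaclass's
initialize method untouched.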

