Christopher Baines writes:

> Ludovic Courtès writes:
>
>> Hi Christopher,
>>
>> Christopher Baines skribis:
>>
>>> I've attached a script that when run should reproduce the issue. I
>>> extracted the code relating to lint warnings from the Guix Data
>>> Service. The script attached runs this code twice against the inferior,
>>> once will often be enough to cause it to crash, but twice should
>>> reproduce it more reliably.
>>
>> Thanks a lot.
>>
>> Here’s a backtrace from the core dumped by the inferior:
>
> ...
>
>> It could be an unbounded growth of libgc’s finalizer table or our weak
>> tables as we experienced in .
>>
>> We should be able to reproduce it with something like:
>>
>>   guix time-machine --commit=d523eb5c9c2659cbbaf4eeef3691234ae527ee6a -- \
>>     lint -c inputs-should-be-native,license,mirror-url,source-file-name,source-unstable-tarball,derivation,patch-file-names,formatting,synopsis
>>
>> In top one can see that heap usage keeps growing, which may well be a
>> bug in Guix proper rather than in Guile… but it doesn’t crash.
>>
>> I would propose three actions here:
>>
>>   1. Run linters under ‘gcprof’ to see what’s eating memory and hopefully
>>      find and address the leak. As a start, maybe just start reducing
>>      the list of checkers to see if there’s one of them that’s causing
>>      it.
>>
>>      The ‘derivation’ checker is definitely responsible for a lot of the
>>      heap consumption because of the various caches in (guix packages) &
>>      co. Perhaps add calls to ‘invalidate-derivation-caches!’ as in
>>      (gnu ci).
>>
>>   2. Work around the problem in Guix Data Service by running, say, one
>>      inferior per checker instead of one inferior for all checkers for
>>      all packages.
>>
>>   3. If #1 didn’t help, let’s see if we can isolate a Guile weak-table
>>      bug or something like that.
>>
>> Thoughts?
>
> Thanks, that's useful to know.
>
> I think I've now managed to find a way of reproducing this without the
> inferior getting in the way. I was testing if triggering garbage
> collection in Guile would help avoid the problem, but actually it seems
> to cause it. I guess given the mentions of GC in the above stacktrace,
> and the major version change of libgc, some GC related bug seems quite
> likely here.
>
> I've been testing with a checkout of Guix built with Guix from the
> core-updates branch. I think that provides the same broken Guile that
> the guix repl is using.
>
> When trying to just use a checkout of the core-updates branch, and guile
> built from that branch I get the following odd error:
>
>   → ./pre-inst-env /gnu/store/18hp7flyb3yid3yp49i6qcdq0sbi5l1n-guile-3.0.2/bin/guile ./reproduce-core-updates-mmap-PROT_NONE-failed.scm
>   guile: warning: failed to install locale
>   warning: failed to load '(gnu packages abiword)': Function not implemented
>   error: git-fetch: unbound variable
>   hint: Did you forget `(use-modules (guix git-download))'?
>
>   error: git-version: unbound variable
>
> No idea what's happening there, but when I ./configure and make with
> packages from core-updates, I seem to end up with a setup that works:
>
> This is the guile I'm using: /gnu/store/18hp7flyb3yid3yp49i6qcdq0sbi5l1n-guile-3.0.2/bin/guile
>
> If you just run the script, you should see:
>
>   → ./pre-inst-env guile ./reproduce-core-updates-mmap-PROT_NONE-failed.scm
>
>   ;;; ("%package-table-setup" #)
>   mmap(PROT_NONE) failed
>   Aborted
>
> For more information, you can pipe the script to the REPL. What you
> should see is that it's slow to compute the lint warnings the first
> time, but the subsequent times are quick, and it crashes in one of the
> (gc) calls.
>
> I'm going to try and continue looking in to this, at least it'll be
> easier to delve in to guile now that I can directly control what guile
> is used.

Following up on this, I've built Guile on core-updates with libgc@7
rather than libgc@8 (which is what's used above), and I can't reproduce
the issue.

So, I'm getting more certain that this is a regression introduced by the
libgc upgrade. Would it be feasible to keep Guile, or at least the Guile
that Guix uses, on libgc@7 for now?
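
Separately from the libgc question, here is a rough sketch of the "start reducing the list of checkers" idea from step 1 of the quoted proposal. It only splits the checker list from the quoted command and runs (or, by default, just prints) one `guix time-machine ... lint -c CHECKER` invocation per checker, so heap growth can be watched in top for each one on its own. The `DRY_RUN` variable and the `run_each_checker` name are made up for illustration, and I haven't checked how long each run takes.

```shell
#!/bin/sh
# Sketch only: one lint run per checker, to see which checker drives heap growth.
# COMMIT and CHECKERS are copied from the command quoted above.
# DRY_RUN and run_each_checker are hypothetical names used for illustration.

COMMIT=d523eb5c9c2659cbbaf4eeef3691234ae527ee6a
CHECKERS=inputs-should-be-native,license,mirror-url,source-file-name,source-unstable-tarball,derivation,patch-file-names,formatting,synopsis

run_each_checker () {
    # Turn the comma-separated list into whitespace-separated words.
    for checker in $(printf '%s' "$CHECKERS" | tr ',' ' '); do
        if [ "${DRY_RUN:-1}" != 0 ]; then
            # Default: only print the command that would be run.
            echo "guix time-machine --commit=$COMMIT -- lint -c $checker"
        else
            echo "== $checker =="
            guix time-machine --commit="$COMMIT" -- lint -c "$checker" \
                > /dev/null || echo "$checker: exited with status $?"
        fi
    done
}

run_each_checker
```

With `DRY_RUN=0` in the environment the commands are actually executed, one process per checker, which is also roughly what option 2 (one inferior per checker) would look like from the outside.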