On Mon, Aug 8, 2022, 9:14 AM Andrea Corallo <akrl@sdf.org> wrote:
Lynn Winebarger <owinebar@gmail.com> writes:

> On Mon, Aug 8, 2022, 3:44 AM Andrea Corallo <akrl@sdf.org> wrote:
>
>  Lynn Winebarger <owinebar@gmail.com> writes:
>
>  > On Fri, Aug 5, 2022, 10:42 AM Andrea Corallo <akrl@sdf.org> wrote:
>  >
>  >  Andrea Corallo <akrl@sdf.org> writes:
>  >
>  >  > Lars Ingebrigtsen <larsi@gnus.org> writes:
>  >  >
>  >  >> Joseph Mingrone <jrm@ftfl.ca> writes:
>  >  >>
>  >  >>> Could 261d6af have broken --with-native-compilation builds on 32-bit
>  >  >>> systems?  This is what I see building in a clean FreeBSD/i386 13.0
>  >  >>> jail using 261d6af:
>  >  >>> http://pkg.ftfl.ca/data/13i386-default/2022-08-04_22h38m28s/logs/errors/emacs-devel-29.0.50.20220804,2.log
>  >  >>
>  >  >> I guess these are the error messages?
>  >  >>
>  >  >> emacs: Trying to load incoherent dumped eln file
>  >  >>
>  >  >> /wrkdirs/usr/ports/editors/emacs-devel/work-full/emacs-261d6af/native-lisp/29.0.50-7cc1a43d/preloaded/ediff-hook-0b92f1a2-f843c8a0.eln
>  >  >>
>  >  >> I don't know what that means; Andrea added to the CCs.
>  >  >
>  >  > It's very surprising to see 261d6af causing this side effect; at least I
>  >  > don't see why it should affect only the 32-bit build.
>  >  >
>  >  > I'm trying to reproduce it on my 32-bit env.
>  >
>  >  I confirm the build is broken on my 32-bit env as well (but not on the
>  >  64-bit one).
>  >
>  >  Loading the second dump, while we are relocating the ediff-hook
>  >  compilation unit, we realize (@ pdumper.c:5304) that its file field is
>  >  not a cons as expected but just a string.
>  >
>  >  Now the question is why this is not fixed-up in loadup.el:477 as for the
>  >  other compilation units?
>  >
>  > Are you sure it's actually fixed up in the other compilation units?
>
>  Indeed, otherwise an error is signaled.
>
>  > This problem should be signaled by loadup if there are any NCUs it does
>  > not fix up.  It would be a lot easier to diagnose the problem from there.
>
>  loadup is in charge of fixing up the file field of every CU, and indeed if
>  something goes wrong in that code an error is signaled.  But evidently
>  no error was signaled here, so there's something more to understand.
>
> I just looked, and there are 2 possible paths for NCUs to be in the dump without an error being signaled:
> 1 - either the --bin-dest or --eln-dest flag is not specified (or is
> on the command line but empty)

This is not the case in our build.

No, but it is one way the dump can produce an unusable file without any error signaled until an Emacs instance attempts to load it.

> 2 - there is an NCU loaded for which no symbol is bound to a subr in that NCU.

CUs that are not reachable from the function slot of a symbol are
unloaded when GC runs.  We do run GC before dumping so this should not
happen.

Yes, "should" is the operative word there.  Why not validate the condition before writing the dump file?  If not in loadup, then in the procedure that records the NCU in the dump?  Why wait until load time to catch something that was almost certainly (barring a user performing surgery on the dump file) already the case when the dump was produced?  Just apply the check that is done immediately after the corresponding "read" operation before the "write" operation as well.
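
To make the suggestion concrete, here is a rough sketch of the idea in C.  It is a toy model, not Emacs code: the struct and every toy_* name are made up for illustration, and the only thing taken from this thread is that the loader expects a CU's file field to be a cons rather than a bare string.  The point is just that the writer and the reader can share one predicate, so an incoherent CU is rejected when the dump is produced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a native compilation unit.  In this toy
   model the file field is either a bare string (not fixed up) or a
   pair of strings (fixed up).  None of this is real Emacs API.  */
enum toy_file_kind { TOY_FILE_STRING, TOY_FILE_CONS };

struct toy_cu {
  enum toy_file_kind kind;
  const char *car;   /* the only string when kind is TOY_FILE_STRING */
  const char *cdr;   /* second string, used only for TOY_FILE_CONS */
};

/* The single coherence predicate, shared by the dump writer and the
   dump loader so the two sides cannot disagree.  */
static bool
toy_cu_coherent_p (const struct toy_cu *cu)
{
  return cu->kind == TOY_FILE_CONS && cu->car != NULL && cu->cdr != NULL;
}

/* Return the index of the first incoherent CU, or -1 if all pass.  */
static int
toy_first_incoherent (const struct toy_cu *cus, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!toy_cu_coherent_p (&cus[i]))
      return (int) i;
  return -1;
}

/* Write side: refuse to produce a dump the loader would later reject.
   Returns true if the dump may be written.  */
static bool
toy_dump (const struct toy_cu *cus, size_t n)
{
  if (toy_first_incoherent (cus, n) >= 0)
    return false;   /* fail at dump time, where diagnosis is easy */
  /* ... the actual writing would happen here ... */
  return true;
}
```

With something along these lines, a CU whose file field is still a bare string would make the dump step fail and name the offending unit, instead of leaving the failure for whichever Emacs instance later tries to load the dump.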

Lynn