unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
@ 2021-02-02 11:11 Andy Moreton
  2021-02-03 20:51 ` akrl--- via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-02-02 11:11 UTC (permalink / raw)
  To: 46256

Hi,

I have built emacs native-comp branch for 64bit Mingw64 with
NATIVE_FULL_AOT=1 (out of tree, so build dir != source dir).

I notice that if I run the built emacs from the build dir then the
prebuilt .eln files are ignored, and async compilation of the .eln files
happens again to add them to the user eln-cache dir.

The prebuilt .eln files are not found in the user eln-cache (expected)
or the installed emacs directory (also expected), but it looks like it
does not also check the build dir (relative to the running emacs rather
than relative to the install prefix).

Running from the build dir without installing is common for developers
building from source, so it would be useful to keep this working with
native AOT builds.

     AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-02 11:11 bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree Andy Moreton
@ 2021-02-03 20:51 ` akrl--- via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-04  0:03   ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: akrl--- via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-03 20:51 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> Hi,
>
> I have built emacs native-comp branch for 64bit Mingw64 with
> NATIVE_FULL_AOT=1 (out of tree, so build dir != source dir).
>
> I notice that if I run the built emacs from the build dir then the
> prebuilt .eln files are ignored, and async compilation of the .eln file
> happens again to add them to the user eln-cache dir.
>
> The prebuilt .eln files are not found in the user eln-cache (expected)
> or the installed emacs directory (also expected), but it looks like it
> does not also check the build dir (relative to the running emacs rather
> than relative to the install prefix).
>
> Running from the build dir without installing is common for developers
> building from source, so it would be useful to keep this working with
> native AOT builds.
>
>     AndyM

Hi Andy,

could you share the values of PATH_DUMPLOADSEARCH and
PATH_REL_LOADSEARCH from your epaths.h ?

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-03 20:51 ` akrl--- via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-04  0:03   ` Andy Moreton
  2021-02-04  1:40     ` Andy Moreton
  2021-02-05 14:39     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 179+ messages in thread
From: Andy Moreton @ 2021-02-04  0:03 UTC (permalink / raw)
  To: 46256

On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>
>> Hi,
>>
>> I have built emacs native-comp branch for 64bit Mingw64 with
>> NATIVE_FULL_AOT=1 (out of tree, so build dir != source dir).
>>
>> I notice that if I run the built emacs from the build dir then the
>> prebuilt .eln files are ignored, and async compilation of the .eln file
>> happens again to add them to the user eln-cache dir.
>>
>> The prebuilt .eln files are not found in the user eln-cache (expected)
>> or the installed emacs directory (also expected), but it looks like it
>> does not also check the build dir (relative to the running emacs rather
>> than relative to the install prefix).
>>
>> Running from the build dir without installing is common for developers
>> building from source, so it would be useful to keep this working with
>> native AOT builds.
>>
>>     AndyM
>
> Hi Andy,
>
> could you share the values of PATH_DUMPLOADSEARCH and
> PATH_REL_LOADSEARCH from your epaths.h ?
>
> Thanks
>
>   Andrea

Native branch checkout is in: "c:/emacs/git/emacs/native/"

"c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:

#define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
#define PATH_REL_LOADSEARCH "28.0.50/lisp"


HTH,

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-04  0:03   ` Andy Moreton
@ 2021-02-04  1:40     ` Andy Moreton
  2021-02-05 14:42       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-05 23:55       ` Andy Moreton
  2021-02-05 14:39     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 179+ messages in thread
From: Andy Moreton @ 2021-02-04  1:40 UTC (permalink / raw)
  To: 46256

On Thu 04 Feb 2021, Andy Moreton wrote:

> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>
>> Hi Andy,
>>
>> could you share the values of PATH_DUMPLOADSEARCH and
>> PATH_REL_LOADSEARCH from your epaths.h ?
>>
>> Thanks
>>
>>   Andrea
>
> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>
> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>
> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
> #define PATH_REL_LOADSEARCH "28.0.50/lisp"

I've bootstrapped again after the recent hash shortening to ensure my build
is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
paths above are unchanged.

Running this from the build dir, I see messages like:

error in process sentinel: Native elisp load failed: "file does not exists",
 "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"

This suggests that the AOT .eln files are not being found. It should find:

c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln

The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?

   AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-04  0:03   ` Andy Moreton
  2021-02-04  1:40     ` Andy Moreton
@ 2021-02-05 14:39     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-05 15:08       ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-05 14:39 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256


Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>
>>> Hi,
>>>
>>> I have built emacs native-comp branch for 64bit Mingw64 with
>>> NATIVE_FULL_AOT=1 (out of tree, so build dir != source dir).
>>>
>>> I notice that if I run the built emacs from the build dir then the
>>> prebuilt .eln files are ignored, and async compilation of the .eln file
>>> happens again to add them to the user eln-cache dir.
>>>
>>> The prebuilt .eln files are not found in the user eln-cache (expected)
>>> or the installed emacs directory (also expected), but it looks like it
>>> does not also check the build dir (relative to the running emacs rather
>>> than relative to the install prefix).
>>>
>>> Running from the build dir without installing is common for developers
>>> building from source, so it would be useful to keep this working with
>>> native AOT builds.
>>>
>>>     AndyM
>>
>> Hi Andy,
>>
>> could you share the values of PATH_DUMPLOADSEARCH and
>> PATH_REL_LOADSEARCH from your epaths.h ?
>>
>> Thanks
>>
>>   Andrea
>
> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>
> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>
> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>
>
> HTH,
>
>     AndyM

Hi Andy, could you give the following blind patch a go?

Thanks

  Andrea


[-- Attachment #2: 46256.patch --]
[-- Type: text/x-diff, Size: 982 bytes --]

diff --git a/src/comp.c b/src/comp.c
index 289d89d37d..980462b520 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -433,6 +433,12 @@ #define TEXT_DATA_RELOC_EPHEMERAL_SYM "text_data_reloc_eph"
 #define TEXT_OPTIM_QLY_SYM "text_optim_qly"
 #define TEXT_FDOC_SYM "text_data_fdoc"
 
+#ifdef WINDOWSNT
+#define DIR_SLASH "\\"
+#else
+#define DIR_SLASH "/"
+#endif
+
 #define STR_VALUE(s) #s
 #define STR(s) STR_VALUE (s)
 
@@ -4032,9 +4038,11 @@ DEFUN ("comp-el-to-eln-filename", Fcomp_el_to_eln_filename,
     {
       Lisp_Object sys_re =
 	concat2 (build_string ("\\`[[:ascii:]]+"),
-		 Fregexp_quote (build_string ("/" PATH_REL_LOADSEARCH "/")));
+		 Fregexp_quote (build_string (DIR_SLASH PATH_REL_LOADSEARCH
+					      DIR_SLASH)));
       loadsearch_re_list =
-	list2 (sys_re, Fregexp_quote (build_string (PATH_DUMPLOADSEARCH "/")));
+	list2 (sys_re, Fregexp_quote (build_string (PATH_DUMPLOADSEARCH
+						    DIR_SLASH)));
     }
 
   Lisp_Object lds_re_tail = loadsearch_re_list;


* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-04  1:40     ` Andy Moreton
@ 2021-02-05 14:42       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-05 20:59         ` Andy Moreton
  2021-02-05 23:55       ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-05 14:42 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Thu 04 Feb 2021, Andy Moreton wrote:
>
>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>
>>> Hi Andy,
>>>
>>> could you share the values of PATH_DUMPLOADSEARCH and
>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>
>>> Thanks
>>>
>>>   Andrea
>>
>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>
>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>
>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>
> I've bootstrapped again after the recent hash shortening to ensure my build
> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
> paths above are unchanged.
>
> Running this from the build dir, I see messages like:
>
> error in process sentinel: Native elisp load failed: "file does not exists",
>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>
> This suggests that the AOT .eln files are not being found. It should find:

AFAIK you should not get any error; the eln should simply be recompiled
automatically.  I guess that, with the hash algorithm updated, the startup
is failing and you need to clean the build directory completely
(especially native-lisp/).  Please retry after a git clean -xfd.

Thanks

  Andrea

> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>
> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?
>
>    AndyM
>
>
>
>
>

-- 
akrl@sdf.org






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-05 14:39     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-05 15:08       ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-05 15:08 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> Cc: 46256@debbugs.gnu.org
> Date: Fri, 05 Feb 2021 14:39:58 +0000
> From:  Andrea Corallo via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> > Native branch checkout is in: "c:/emacs/git/emacs/native/"
> >
> > "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
> >
> > #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
> > #define PATH_REL_LOADSEARCH "28.0.50/lisp"
> >
> >
> > HTH,
> >
> >     AndyM
> 
> > Hi Andy, could you give the following blind patch a go?

You assume that Windows programs don't understand "/" as a directory
separator?  They do.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-05 14:42       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-05 20:59         ` Andy Moreton
  0 siblings, 0 replies; 179+ messages in thread
From: Andy Moreton @ 2021-02-05 20:59 UTC (permalink / raw)
  To: 46256

On Fri 05 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>
>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>
>>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>> could you share the values of PATH_DUMPLOADSEARCH and
>>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>>
>>>> Thanks
>>>>
>>>>   Andrea
>>>
>>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>>
>>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>>
>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>>
>> I've bootstrapped again after the recent hash shortening to ensure my build
>> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
>> paths above are unchanged.
>>
>> Running this from the build dir, I see messages like:
>>
>> error in process sentinel: Native elisp load failed: "file does not exists",
>>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>>
>> This suggests that the AOT .eln files are not being found. It should find:
>
> AFAIK you should not get any error but the eln should be recompiled
> automatically.  I guess having updated the hash algorithm the startup is
> failing and you need to clean-up completely the build directory
> (especially native-lisp/), please retry after a git clean -xfd.

All of these experiments are done from a clean tree after "git clean
-xdf", and after removing ~/.emacs.d/eln-cache/*.

>> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>>
>> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?

Also, a further observation following from this:

a) Initially, the AOT bootstrap creates:
   <build-dir>/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln

b) Running <build-dir>/src/emacs complains about:
   <user-emacs-dir>/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln

c) The running emacs then builds:
   ~/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln

The files (a) and (c) have the same filename, but different sizes and
content.

    AndyM








* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-04  1:40     ` Andy Moreton
  2021-02-05 14:42       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-05 23:55       ` Andy Moreton
  2021-02-17 22:39         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-02-05 23:55 UTC (permalink / raw)
  To: 46256

On Thu 04 Feb 2021, Andy Moreton wrote:

> On Thu 04 Feb 2021, Andy Moreton wrote:
>
>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>
>>> Hi Andy,
>>>
>>> could you share the values of PATH_DUMPLOADSEARCH and
>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>
>>> Thanks
>>>
>>>   Andrea
>>
>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>
>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>
>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>
> I've bootstrapped again after the recent hash shortening to ensure my build
> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
> paths above are unchanged.
>
> Running this from the build dir, I see messages like:
>
> error in process sentinel: Native elisp load failed: "file does not exists",
>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>
> This suggests that the AOT .eln files are not being found. It should find:
>
> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>
> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?

After looking at what `comp-el-to-eln-filename' does, I observe that:

(substring (md5 "c:/emacs/git/emacs/native/lisp/hl-line.el") 0 8)
"e67628ec"

(substring (md5 "//hl-line.el") 0 8)
"8fa29c14"

That matches the two middle hashes seen above.

It looks like `comp-el-to-eln-filename` fails to match the filename
prefix against PATH_DUMPLOADSEARCH. It is using case-sensitive matching,
but on Windows filesystems are case-insensitive.

    AndyM
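
The mismatch described above can be reproduced in isolation.  The
following is a minimal C sketch, not the code in src/comp.c:
name_to_hash is a hypothetical helper which assumes that a matched
load-path prefix is replaced by "//" (this is consistent with the two
md5 inputs shown above, but it is an assumption).  The prefix is
compared byte for byte, so a "C:" versus "c:" difference is enough to
make the match fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  If FILE starts with
   PREFIX (compared byte for byte, hence case-sensitively), write "//"
   followed by the rest of FILE into OUT; otherwise copy FILE unchanged.
   OUT receives the string that would then be fed to the hash.  */
static void
name_to_hash (const char *file, const char *prefix,
              char *out, size_t outsz)
{
  assert (file != NULL && prefix != NULL && out != NULL && outsz > 0);
  size_t n = strlen (prefix);
  if (strncmp (file, prefix, n) == 0)
    snprintf (out, outsz, "//%s", file + n);   /* prefix matched */
  else
    snprintf (out, outsz, "%s", file);         /* no match: full path */
}
```

With the prefix "C:/emacs/git/emacs/native/lisp/" the file name
"c:/emacs/git/emacs/native/lisp/hl-line.el" is left untouched, whereas
with a lower-case drive letter in the prefix it is reduced to
"//hl-line.el".  Different inputs to the hash give different file names.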









* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-05 23:55       ` Andy Moreton
@ 2021-02-17 22:39         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-18 20:48           ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-17 22:39 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Thu 04 Feb 2021, Andy Moreton wrote:
>
>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>
>>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>> could you share the values of PATH_DUMPLOADSEARCH and
>>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>>
>>>> Thanks
>>>>
>>>>   Andrea
>>>
>>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>>
>>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>>
>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>>
>> I've bootstrapped again after the recent hash shortening to ensure my build
>> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
>> paths above are unchanged.
>>
>> Running this from the build dir, I see messages like:
>>
>> error in process sentinel: Native elisp load failed: "file does not exists",
>>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>>
>> This suggests that the AOT .eln files are not being found. It should find:
>>
>> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>>
>> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?
>
> After looking at what `comp-el-to-eln-filename' does, I observe that:
>
> (substring (md5 "c:/emacs/git/emacs/native/lisp/hl-line.el") 0 8)
> "e67628ec"
>
> (substring (md5 "//hl-line.el") 0 8)
> "8fa29c14"
>
> That matches the two middle hashes seen above.
>
> It looks like `comp-el-to-eln-filename` fails to match the filename
> prefix against PATH_DUMPLOADSEARCH. It is using case-sensitive matching,
> but on Windows filesystems are case-insensitive.

Hi Andy,

The Windows filesystem is case-insensitive but the case is preserved
correct?  If so it should work no?

Last question: do backslashes '\' appear somewhere in those
filenames?  This was the issue I tried to fix with the blind patch I've
sent.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-17 22:39         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-18 20:48           ` Andy Moreton
  2021-02-18 21:00             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-02-18 20:48 UTC (permalink / raw)
  To: 46256

On Wed 17 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>
>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>
>>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>>
>>>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> could you share the values of PATH_DUMPLOADSEARCH and
>>>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>>>
>>>>> Thanks
>>>>>
>>>>>   Andrea
>>>>
>>>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>>>
>>>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>>>
>>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>>>
>>> I've bootstrapped again after the recent hash shortening to ensure my build
>>> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
>>> paths above are unchanged.
>>>
>>> Running this from the build dir, I see messages like:
>>>
>>> error in process sentinel: Native elisp load failed: "file does not exists",
>>>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>>>
>>> This suggests that the AOT .eln files are not being found. It should find:
>>>
>>> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>>>
>>> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?
>>
>> After looking at what `comp-el-to-eln-filename' does, I observe that:
>>
>> (substring (md5 "c:/emacs/git/emacs/native/lisp/hl-line.el") 0 8)
>> "e67628ec"
>>
>> (substring (md5 "//hl-line.el") 0 8)
>> "8fa29c14"
>>
>> That matches the two middle hashes seen above.
>>
>> It looks like `comp-el-to-eln-filename` fails to match the filename
>> prefix against PATH_DUMPLOADSEARCH. It is using case-sensitive matching,
>> but on Windows filesystems are case-insensitive.
>
> Hi Andy,
>
> The Windows filesystem is case-insensitive but the case is preserved
> correct?  If so it should work no?

Yes, Windows filesystems are case-preserving and do case-insensitive
lookup. The fact that the code complains about the file not existing, and
that the hashes match as described earlier, shows it is clearly not
working. I have conjectured why, but the reason may well be something else.

> Last question: do backslashes '\' appear somewhere in those
> filenames?  This was the issue I tried to fix with the blind patch I've
> sent.

As Eli pointed out, that is not the problem: forward slashes are ok.

      AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-18 20:48           ` Andy Moreton
@ 2021-02-18 21:00             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-19  8:02               ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-18 21:00 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Wed 17 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>
>>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>>
>>>> On Thu 04 Feb 2021, Andy Moreton wrote:
>>>>
>>>>> On Wed 03 Feb 2021, akrl--- via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> could you share the values of PATH_DUMPLOADSEARCH and
>>>>>> PATH_REL_LOADSEARCH from your epaths.h ?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>   Andrea
>>>>>
>>>>> Native branch checkout is in: "c:/emacs/git/emacs/native/"
>>>>>
>>>>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/src/epaths.h" contains:
>>>>>
>>>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>>>> #define PATH_REL_LOADSEARCH "28.0.50/lisp"
>>>>
>>>> I've bootstrapped again after the recent hash shortening to ensure my build
>>>> is up to date, from commit 1f626e9662d8120acd5a937f847123cc2b8c6e31. The
>>>> paths above are unchanged.
>>>>
>>>> Running this from the build dir, I see messages like:
>>>>
>>>> error in process sentinel: Native elisp load failed: "file does not exists",
>>>>  "c:/home/ajm/.emacs.d/eln-cache/28.0.50-e2ae3598/hl-line-e67628ec-664ef650.eln"
>>>>
>>>> This suggests that the AOT .eln files are not being found. It should find:
>>>>
>>>> c:/emacs/git/emacs/native/build/mingw64-x86_64-O2/native-lisp/28.0.50-e2ae3598/hl-line-8fa29c14-664ef650.eln
>>>>
>>>> The middle hash (e67628ec vs. 8fa29c14) is not the same - any idea why ?
>>>
>>> After looking at what `comp-el-to-eln-filename' does, I observe that:
>>>
>>> (substring (md5 "c:/emacs/git/emacs/native/lisp/hl-line.el") 0 8)
>>> "e67628ec"
>>>
>>> (substring (md5 "//hl-line.el") 0 8)
>>> "8fa29c14"
>>>
>>> That matches the two middle hashes seen above.
>>>
>>> It looks like `comp-el-to-eln-filename` fails to match the filename
>>> prefix against PATH_DUMPLOADSEARCH. It is using case-sensitive matching,
>>> but on Windows filesystems are case-insensitive.
>>
>> Hi Andy,
>>
>> The Windows filesystem is case-insensitive but the case is preserved
>> correct?  If so it should work no?
>
> Yes, Windows filesystems are case-preserving and do case-insensitive
> lookup. The fact that the code complains about the file not existing, and
> that the hashes match as described earlier, shows it is clearly not
> working. I have conjectured why, but the reason may well be something else.
>
>> Last question: do backslashes '\' appear somewhere in those
>> filenames?  This was the issue I tried to fix with the blind patch I've
>> sent.
>
> As Eli pointed out, that is not the problem: forward slashes are ok.

I understand they are handled, but here as we do a substitution we must
substitute what's coming in.

As you have the possibility to debug this piece of code on Windows
please have a look at this (or try my blind patch if you haven't).

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-18 21:00             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-19  8:02               ` Eli Zaretskii
  2021-02-19 14:49                 ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-19  8:02 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> Cc: 46256@debbugs.gnu.org
> Date: Thu, 18 Feb 2021 21:00:29 +0000
> From:  Andrea Corallo via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> >> Last question: do backslashes '\' appear somewhere in those
> >> filenames?  This was the issue I tried to fix with the blind patch I've
> >> sent.
> >
> > As Eli pointed out, that is not the problem: forward slashes are ok.
> 
> I understand they are handled, but here as we do a substitution we must
> substitute what's coming in.
> 
> As you have the possibility to debug this piece of code on Windows
> please have a look at this (or try my blind patch if you haven't).

If the problem is with hashing file names, you will have to
canonicalize them first, including resolving the letter-case issue,
the forward/back-slashes issue, and also the issue with those pesky
numerical tails Windows sometimes produces.  We have a function
Fw32_long_file_name for that purpose, I think you should use it (if
you need it for C strings, we could add a wrapper around
w32_get_long_filename to do that instead).  This assumes that you are
talking about existing files; if that assumption is not true, we will
need a slightly different strategy.
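
As a rough illustration of the canonicalization step described above,
here is a minimal C sketch.  It is not the Emacs implementation and does
not use Fw32_long_file_name or w32_get_long_filename; the helper names
normalize_for_hash and same_path_for_hash are hypothetical.  It covers
only the drive-letter case and the slash direction.  The numerical
tails (short 8.3 names) and the letter-case of the remaining components
can only be resolved by asking the filesystem, which this sketch does
not attempt.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: normalize PATH in place so that two spellings
   of the same Windows path are more likely to compare equal.
   Handles only the drive letter and the directory separators.  */
static void
normalize_for_hash (char *path)
{
  assert (path != NULL);
  /* Lower-case a leading drive letter such as "C:".  */
  if (isalpha ((unsigned char) path[0]) && path[1] == ':')
    path[0] = (char) tolower ((unsigned char) path[0]);
  /* Use forward slashes throughout.  */
  for (char *p = path; *p != '\0'; p++)
    if (*p == '\\')
      *p = '/';
}

/* Return nonzero if A and B are byte-identical after normalization.
   Both strings are modified in place.  */
static int
same_path_for_hash (char *a, char *b)
{
  normalize_for_hash (a);
  normalize_for_hash (b);
  return strcmp (a, b) == 0;
}
```

Under these assumptions "C:\emacs\git\emacs\native\lisp" and
"c:/emacs/git/emacs/native/lisp" compare equal, while names that differ
in the case of a later component still compare as different.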







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-19  8:02               ` Eli Zaretskii
@ 2021-02-19 14:49                 ` Andy Moreton
  2021-02-19 15:28                   ` Eli Zaretskii
                                     ` (2 more replies)
  0 siblings, 3 replies; 179+ messages in thread
From: Andy Moreton @ 2021-02-19 14:49 UTC (permalink / raw)
  To: 46256

On Fri 19 Feb 2021, Eli Zaretskii wrote:

>> Cc: 46256@debbugs.gnu.org
>> Date: Thu, 18 Feb 2021 21:00:29 +0000
>> From:  Andrea Corallo via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> >> Last question: do backslashes '\' appear somewhere in those
>> >> filenames?  This was the issue I tried to fix with the blind patch I've
>> >> sent.
>> >
>> > As Eli pointed out, that is not the problem: forward slashes are ok.
>> 
>> I understand they are handled, but here as we do a substitution we must
>> substitute what's coming in.
>> 
>> As you have the possibility to debug this piece of code on Windows
>> please have a look at this (or try my blind patch if you haven't).
>
> If the problem is with hashing file names, you will have to
> canonicalize them first, including resolving the letter-case issue,
> the forward/back-slashes issue, and also the issue with those pesky
> numerical tails Windows sometimes produces.  We have a function
> Fw32_long_file_name for that purpose, I think you should use it (if
> you need it for C strings, we could add a wrapper around
> w32_get_long_filename to do that instead).  This assumes that you are
> talking about existing files; if that assumption is not true, we will
> need a slightly different strategy.

The problem is with the file names used to generate the hashes, where
the comparison of file names is case-sensitive.

As an experiment, I changed epaths.h from:
#define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"

to:
#define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"

and then ran make (to build without regenerating the header).
The resulting emacs did not complain about mismatched filenames.

Thus the fix outlined by Eli above looks like it will solve the problem.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-19 14:49                 ` Andy Moreton
@ 2021-02-19 15:28                   ` Eli Zaretskii
  2021-02-19 16:01                   ` Andrea Corallo
  2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-19 15:28 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Fri, 19 Feb 2021 14:49:25 +0000
> 
> As an experiment, I changed epaths.h from:
> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
> 
> to:
> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
> 
> and then ran make (to build without regenerating the header).
> The resulting emacs did not complain about mismatched filenames.
> 
> Thus the fix outlined by Eli above looks like it will solve the problem.

Btw, there's a similar in principle, but different in details, problem
with macOS: it stores file names in decomposed form, i.e., for
example, ä will be stored as two codepoints: a, followed by U+0308
COMBINING DIAERESIS.  So any hashing that relies on comparing file
names as strings will need to normalize the file names on macOS
filesystems (HFS+) as well.
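
A small sketch of the mismatch, assuming UTF-8 encoded file names: the
precomposed and the decomposed spelling of the same letter are
different byte strings, so a byte-wise comparison or hash treats them
as different names.  same_bytes is just an illustrative wrapper around
strcmp.

```c
#include <assert.h>
#include <string.h>

/* "ä" as one precomposed code point, U+00E4, in UTF-8.  */
static const char precomposed[] = "\xC3\xA4";
/* "ä" decomposed: "a" followed by U+0308 COMBINING DIAERESIS.  */
static const char decomposed[] = "a\xCC\x88";

/* Byte-wise equality, which is what a naive hash or comparison sees.  */
static int
same_bytes (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) == 0;
}
```

Here same_bytes (precomposed, decomposed) is 0, although both strings
display as the same character.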






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-19 14:49                 ` Andy Moreton
  2021-02-19 15:28                   ` Eli Zaretskii
@ 2021-02-19 16:01                   ` Andrea Corallo
  2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo @ 2021-02-19 16:01 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> As an experiment, I changed epaths.h from:
> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>
> to:
> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
>
> and then ran make (to build without regenerating the header).
> The resulting emacs did not complain about mismatched filenames.
>
> Thus the fix outlined by Eli above looks like it will solve the problem.

Sounds great!

I'll write the fix in the following days (in case somebody wants to take
over the task just mention it here, indeed this is very welcome).

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-19 14:49                 ` Andy Moreton
  2021-02-19 15:28                   ` Eli Zaretskii
  2021-02-19 16:01                   ` Andrea Corallo
@ 2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-26 20:45                     ` Eli Zaretskii
  2021-02-27 12:08                     ` Andy Moreton
  2 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-26 20:34 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

Andy Moreton <andrewjmoreton@gmail.com> writes:

[...]

> The problem is with the file names used to generate the hashes, where
> the comparison of file names is case-sensitive.
>
> As an experiment, I changed epaths.h from:
> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>
> to:
> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
>
> and then ran make (to build without regenerating the header).
> The resulting emacs did not complain about mismatched filenames.
>
> Thus the fix outlined by Eli above looks like it will solve the problem.
>
>     AndyM

Hi Andy,

could you give the attached patch a try?  It follows Eli's
suggestion of using 'Fw32_long_file_name'.

Thanks

  Andrea

[-- Attachment #2: 0001-Canonicalize-filenames-on-Windows-before-hashing-bug.patch --]
[-- Type: text/x-diff, Size: 1494 bytes --]

From 312deba5302a8136fa104b054af54572cc64ea5e Mon Sep 17 00:00:00 2001
From: Andrea Corallo <akrl@sdf.org>
Date: Fri, 26 Feb 2021 21:27:02 +0100
Subject: [PATCH] * Canonicalize filenames on Windows before hashing
 (bug#46256)

	* src/comp.c (Fcomp_el_to_eln_filename): On Windows
	canonicalize filenames before hashing.
---
 src/comp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/comp.c b/src/comp.c
index a8b8ef95fa..1a89e4e62a 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -3983,6 +3983,10 @@ DEFUN ("comp-el-to-eln-filename", Fcomp_el_to_eln_filename,
   if (NILP (Ffile_exists_p (filename)))
     xsignal1 (Qfile_missing, filename);
 
+#ifdef WINDOWSNT
+  filename = Fw32_long_file_name (filename);
+#endif
+
   Lisp_Object content_hash = comp_hash_source_file (filename);
 
   if (suffix_p (filename, ".gz"))
@@ -4014,8 +4018,11 @@ DEFUN ("comp-el-to-eln-filename", Fcomp_el_to_eln_filename,
       Lisp_Object sys_re =
 	concat2 (build_string ("\\`[[:ascii:]]+"),
 		 Fregexp_quote (build_string ("/" PATH_REL_LOADSEARCH "/")));
-      loadsearch_re_list =
-	list2 (sys_re, Fregexp_quote (build_string (PATH_DUMPLOADSEARCH "/")));
+      Lisp_Object dump_load_search = build_string (PATH_DUMPLOADSEARCH "/");
+#ifdef WINDOWSNT
+      dump_load_search = Fw32_long_file_name (dump_load_search);
+#endif
+      loadsearch_re_list = list2 (sys_re, Fregexp_quote (dump_load_search));
     }
 
   Lisp_Object lds_re_tail = loadsearch_re_list;
-- 
2.20.1



* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-26 20:45                     ` Eli Zaretskii
  2021-02-26 20:48                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27 12:08                     ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-26 20:45 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> Cc: 46256@debbugs.gnu.org
> Date: Fri, 26 Feb 2021 20:34:10 +0000
> From:  Andrea Corallo via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> --- a/src/comp.c
> +++ b/src/comp.c
> @@ -3983,6 +3983,10 @@ DEFUN ("comp-el-to-eln-filename", Fcomp_el_to_eln_filename,
>    if (NILP (Ffile_exists_p (filename)))
>      xsignal1 (Qfile_missing, filename);
>  
> +#ifdef WINDOWSNT
> +  filename = Fw32_long_file_name (filename);
> +#endif

Is "filename" here a name of an existing file?  If not,
Fw32_long_file_name will return nil.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-26 20:45                     ` Eli Zaretskii
@ 2021-02-26 20:48                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-26 20:52                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-26 20:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 46256@debbugs.gnu.org
>> Date: Fri, 26 Feb 2021 20:34:10 +0000
>> From:  Andrea Corallo via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> --- a/src/comp.c
>> +++ b/src/comp.c
>> @@ -3983,6 +3983,10 @@ DEFUN ("comp-el-to-eln-filename", Fcomp_el_to_eln_filename,
>>    if (NILP (Ffile_exists_p (filename)))
>>      xsignal1 (Qfile_missing, filename);
>>  
>> +#ifdef WINDOWSNT
>> +  filename = Fw32_long_file_name (filename);
>> +#endif
>
> Is "filename" here a name of an existing file?  If not,
> Fw32_long_file_name will return nil.

It should always be as we explicitly check for that.

Quick question: I assumed Fw32_long_file_name works for directories as
well, is this correct?

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-26 20:48                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-26 20:52                         ` Eli Zaretskii
  2021-02-27  6:58                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-26 20:52 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: andrewjmoreton@gmail.com, 46256@debbugs.gnu.org
> Date: Fri, 26 Feb 2021 20:48:52 +0000
> 
> Quick question: I assumed Fw32_long_file_name works for directories as
> well, is this correct?

Yes, it does.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-26 20:52                         ` Eli Zaretskii
@ 2021-02-27  6:58                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27  7:55                             ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-27  6:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: andrewjmoreton@gmail.com, 46256@debbugs.gnu.org
>> Date: Fri, 26 Feb 2021 20:48:52 +0000
>> 
>> Quick question: I assumed Fw32_long_file_name works for directories as
>> well, is this correct?
>
> Yes, it does.

Nice, thinking about it I've got a last question: when normalizing
"c:/foo/", is the trailing '/' kept or removed?  If it is removed, the
patch needs an adjustment.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27  6:58                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-27  7:55                             ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-27  7:55 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: andrewjmoreton@gmail.com, 46256@debbugs.gnu.org
> Date: Sat, 27 Feb 2021 06:58:45 +0000
> 
> Nice, thinking about it I've got a last question: when normalizing
> "c:/foo/", is the trailing '/' kept or removed?  If it is removed, the
> patch needs an adjustment.

A single trailing slash, if any, is kept.  Multiple trailing slashes
are collapsed into a single one.
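
This matters for the patch because the regexp is built from
PATH_DUMPLOADSEARCH "/", so the slash has to survive.  A sketch of the
behavior just described; collapse_trailing_slashes is a hypothetical
helper written for illustration, not the w32 implementation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: collapse a run of trailing '/' into one, in place.
   A single trailing slash is kept; a name without one is unchanged.  */
static void
collapse_trailing_slashes (char *name)
{
  assert (name != NULL);
  size_t len = strlen (name);
  while (len >= 2 && name[len - 1] == '/' && name[len - 2] == '/')
    name[--len] = '\0';
}
```

With this, "c:/foo///" becomes "c:/foo/", while "c:/foo/" and "c:/foo"
are left as they are.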






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-26 20:45                     ` Eli Zaretskii
@ 2021-02-27 12:08                     ` Andy Moreton
  2021-02-27 19:14                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-02-27 12:08 UTC (permalink / raw)
  To: 46256

On Fri 26 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>
> [...]
>
>> The problem is with the file names used to generate the hashes, where
>> the comparison of file names is case-sensitive.
>>
>> As an experiment, I changed epaths.h from:
>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>
>> to:
>> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
>>
>> and then ran make (to build without regenerating the header).
>> The resulting emacs did not complain about mismatched filenames.
>>
>> Thus the fix outlined by Eli above looks like it will solve the problem.
>>
>>     AndyM
>
> Hi Andy,
>
> could you give the attached patch a try?  It follows Eli's
> suggestion of using 'Fw32_long_file_name'.

The patch looks good - please apply it.

I tried building with the patch applied to a clean tree, and the
resulting emacs runs without the filename mismatch messages, and did not
recompile the AOT files into the per-user eln-cache.

There were also a couple of errors in the build:

Backtrace:
00007ff78467a2a2
00007ff78453be26
00007ff7845a98ac
...[snipped]...
00007ff784626548
Eager macro-expansion failure: (file-error "Renaming" "Permission denied"
"c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.eln"
"c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.elnGMMUdn.eln.tmp")
C:/emacs/git/emacs/native/src/alloc.c:3160: Emacs fatal error: assertion failed: cu->handle
make[2]: *** [Makefile:319: progmodes/antlr-mode.elc] Error 3

The backtrace addresses did not give anything useful from addr2line.

There are still some elisp files that did not get native compiled when
the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
compiled when running the built emacs:

  ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
  cl-seq comint comp comp-cstr cus-edit cus-start desktop
  display-fill-column-indicator easy-mmode easymenu edmacro eieio
  eieio-core frameset gv help-mode hl-line image-file info json kmacro
  map minibuf-eldef package paren password-cache pcase pp ring rx seq
  subr-x time-date warnings wid-edit

That may be a result of the error during the build.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 12:08                     ` Andy Moreton
@ 2021-02-27 19:14                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27 19:20                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27 19:46                         ` Andy Moreton
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-27 19:14 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Fri 26 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>
>> [...]
>>
>>> The problem is with the file names used to generate the hashes, where
>>> the comparison of file names is case-sensitive.
>>>
>>> As an experiment, I changed epaths.h from:
>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>>
>>> to:
>>> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
>>>
>>> and then ran make (to build without regenerating the header).
>>> The resulting emacs did not complain about mismatched filenames.
>>>
>>> Thus the fix outlined by Eli above looks like it will solve the problem.
>>>
>>>     AndyM
>>
>> Hi Andy,
>>
>> could you give the attached patch a try?  It follows Eli's
>> suggestion of using 'Fw32_long_file_name'.
>
> The patch looks good - please apply it.

Thanks for verifying it, installed as 312deba530.

> I tried building with the patch applied to a clean tree, and the
> resulting emacs runs without the filename mismatch messages, and did not
> recompile the AOT files into the per-user eln-cache.
>
> There were also a couple of errors in the build:
>
> Backtrace:
> 00007ff78467a2a2
> 00007ff78453be26
> 00007ff7845a98ac
> ...[snipped]...
> 00007ff784626548
> Eager macro-expansion failure: (file-error "Renaming" "Permission denied"
> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.eln"
> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.elnGMMUdn.eln.tmp")
> C:/emacs/git/emacs/native/src/alloc.c:3160: Emacs fatal error: assertion failed: cu->handle
> make[2]: *** [Makefile:319: progmodes/antlr-mode.elc] Error 3
>
> The backtrace addresses did not give anything useful from addr2line.
>
> There are still some elisp files that did not get native compiled when
> the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
> compiled when running the built emacs:
>
>   ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
>   cl-seq comint comp comp-cstr cus-edit cus-start desktop
>   display-fill-column-indicator easy-mmode easymenu edmacro eieio
>   eieio-core frameset gv help-mode hl-line image-file info json kmacro
>   map minibuf-eldef package paren password-cache pcase pp ring rx seq
>   subr-x time-date warnings wid-edit
>
> That may be a result of the error during the build.

Mmmmh, that's strange: some of these are even compiled as COMPILE_FIRST,
therefore they are certainly native compiled.

One thing you could do (before one of these is recompiled) is to use
`comp-el-to-eln-filename' to check what the native compiler is expecting
as eln filename and if this is present in any of the folders in your
`comp-eln-load-path'.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 19:14                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-27 19:20                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27 19:46                         ` Andy Moreton
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-27 19:20 UTC (permalink / raw)
  To: 46256; +Cc: andrewjmoreton

Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>
>> On Fri 26 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>
>>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>>
>>> [...]
>>>
>>>> The problem is with the file names used to generate the hashes, where
>>>> the comparison of file names is case-sensitive.
>>>>
>>>> As an experiment, I changed epaths.h from:
>>>> #define PATH_DUMPLOADSEARCH "C:/emacs/git/emacs/native/lisp"
>>>>
>>>> to:
>>>> #define PATH_DUMPLOADSEARCH "c:/emacs/git/emacs/native/lisp"
>>>>
>>>> and then ran make (to build without regenerating the header).
>>>> The resulting emacs did not complain about mismatched filenames.
>>>>
>>>> Thus the fix outlined by Eli above looks like it will solve the problem.
>>>>
>>>>     AndyM
>>>
>>> Hi Andy,
>>>
>>> could you give the attached patch a try?  It follows Eli's
>>> suggestion of using 'Fw32_long_file_name'.
>>
>> The patch looks good - please apply it.
>
> Thanks for verifying it, installed as 312deba530.
>
>> I tried building with the patch applied to a clean tree, and the
>> resulting emacs runs without the filename mismatch messages, and did not
>> recompile the AOT files into the per-user eln-cache.
>>
>> There were also a couple of errors in the build:
>>
>> Backtrace:
>> 00007ff78467a2a2
>> 00007ff78453be26
>> 00007ff7845a98ac
>> ...[snipped]...
>> 00007ff784626548
>> Eager macro-expansion failure: (file-error "Renaming" "Permission denied"
>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.eln"
>> "c:/emacs/git/emacs/native/build/mingw64-x86_64-O2-native/native-lisp/28.0.50-e09cfb99/cc-bytecomp-4817e810-d16f606e.elnGMMUdn.eln.tmp")
>> C:/emacs/git/emacs/native/src/alloc.c:3160: Emacs fatal error: assertion failed: cu->handle
>> make[2]: *** [Makefile:319: progmodes/antlr-mode.elc] Error 3
>>
>> The backtrace addresses did not give anything useful from addr2line.
>>
>> There are still some elisp files that did not get native compiled when
>> the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
>> compiled when running the built emacs:
>>
>>   ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
>>   cl-seq comint comp comp-cstr cus-edit cus-start desktop
>>   display-fill-column-indicator easy-mmode easymenu edmacro eieio
>>   eieio-core frameset gv help-mode hl-line image-file info json kmacro
>>   map minibuf-eldef package paren password-cache pcase pp ring rx seq
>>   subr-x time-date warnings wid-edit
>>
>> That may be a result of the error during the build.
>
> Mmmmh, that's strange: some of these are even compiled as COMPILE_FIRST,
> therefore they are certainly native compiled.
>
> One thing you could do (before one of these is recompiled) is to use
> `comp-el-to-eln-filename' to check what the native compiler is expecting
> as eln filename and if this is present in any of the folders in your
> `comp-eln-load-path'.

Apologies, to be more precise: if the file is compiled during the build
(as it should be in this case) it should be in the "native-lisp/" dir in
the build tree.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 19:14                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-02-27 19:20                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-02-27 19:46                         ` Andy Moreton
  2021-02-27 21:58                           ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-02-27 19:46 UTC (permalink / raw)
  To: 46256

On Sat 27 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>> There are still some elisp files that did not get native compiled when
>> the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
>> compiled when running the built emacs:
>>
>>   ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
>>   cl-seq comint comp comp-cstr cus-edit cus-start desktop
>>   display-fill-column-indicator easy-mmode easymenu edmacro eieio
>>   eieio-core frameset gv help-mode hl-line image-file info json kmacro
>>   map minibuf-eldef package paren password-cache pcase pp ring rx seq
>>   subr-x time-date warnings wid-edit
>>
>> That may be a result of the error during the build.
>
> Mmmmh, that's strange: some of these are even compiled as COMPILE_FIRST,
> therefore they are certainly native compiled.

I suspect that the issue may be with parallel builds (note the "-j8"
above). Repeating the build with "-j1" appears to be building the
missing .eln files as expected.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 19:46                         ` Andy Moreton
@ 2021-02-27 21:58                           ` Andy Moreton
  2021-02-28 17:35                             ` Eli Zaretskii
  2021-02-28 21:04                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 179+ messages in thread
From: Andy Moreton @ 2021-02-27 21:58 UTC (permalink / raw)
  To: 46256

On Sat 27 Feb 2021, Andy Moreton wrote:

> On Sat 27 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>> There are still some elisp files that did not get native compiled when
>>> the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
>>> compiled when running the built emacs:
>>>
>>>   ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
>>>   cl-seq comint comp comp-cstr cus-edit cus-start desktop
>>>   display-fill-column-indicator easy-mmode easymenu edmacro eieio
>>>   eieio-core frameset gv help-mode hl-line image-file info json kmacro
>>>   map minibuf-eldef package paren password-cache pcase pp ring rx seq
>>>   subr-x time-date warnings wid-edit
>>>
>>> That may be a result of the error during the build.
>>
>> Mmmmh, that's strange: some of these are even compiled as COMPILE_FIRST,
>> therefore they are certainly native compiled.
>
> I suspect that the issue may be with parallel builds (note the "-j8"
> above). Repeating the build with "-j1" appears to be building the
> missing .eln files as expected.

Now that the -j1 build has completed (without error), all of the lisp
files have been compiled AOT as expected, and running the resulting
emacs does not rebuild any of those .eln files.

So I think there are still some other issues with dependencies and
handling parallel builds, but this bug has been fixed.

Thanks,

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 21:58                           ` Andy Moreton
@ 2021-02-28 17:35                             ` Eli Zaretskii
  2021-02-28 21:15                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-01  9:48                               ` Andy Moreton
  2021-02-28 21:04                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-02-28 17:35 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 27 Feb 2021 21:58:25 +0000
> 
> > I suspect that the issue may be with parallel builds (note the "-j8"
> > above). Repeating the build with "-j1" appears to be building the
> > missing .eln files as expected.
> 
> Now that the -j1 build has completed (without error), all of the lisp
> files have been compiled AOT as expected, and running the resulting
> emacs does not rebuild any of those .eln files.
> 
> So I think there are still some other issues with dependencies and
> handling parallel builds, but this bug has been fixed.

Hmm... what would be the reason for parallel builds not working well on
MS-Windows?  File sharing issues?

Does the async native compilation use temporary files, and if so, do
they reside in the same directory when multiple compilations are
running?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-27 21:58                           ` Andy Moreton
  2021-02-28 17:35                             ` Eli Zaretskii
@ 2021-02-28 21:04                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-28 21:04 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256-done

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Sat 27 Feb 2021, Andy Moreton wrote:
>
>> On Sat 27 Feb 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>>
>>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>>> There are still some elisp files that did not get native compiled when
>>>> the build was done with "make -j8 NATIVE_FULL_AOT=1", and so get async
>>>> compiled when running the built emacs:
>>>>
>>>>   ansi-color auth_source byte-opt bytecomp cconv cl-extra cl-lib cl-macs
>>>>   cl-seq comint comp comp-cstr cus-edit cus-start desktop
>>>>   display-fill-column-indicator easy-mmode easymenu edmacro eieio
>>>>   eieio-core frameset gv help-mode hl-line image-file info json kmacro
>>>>   map minibuf-eldef package paren password-cache pcase pp ring rx seq
>>>>   subr-x time-date warnings wid-edit
>>>>
>>>> That may be a result of the error during the build.
>>>
>>> Mmmmh, that's strange: some of these are even compiled as COMPILE_FIRST,
>>> therefore they are certainly native compiled.
>>
>> I suspect that the issue may be with parallel builds (note the "-j8"
>> above). Repeating the build with "-j1" appears to be building the
>> missing .eln files as expected.
>
> Now that the -j1 build has completed (without error), all of the lisp
> files have been compiled AOT as expected, and running the resulting
> emacs does not rebuild any of those .eln files.
>
> So I think there are still some other issues with dependencies and
> handling parallel builds, but this bug has been fixed.

Thanks for checking, I'm closing then.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-28 17:35                             ` Eli Zaretskii
@ 2021-02-28 21:15                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-01  5:36                                 ` Eli Zaretskii
  2021-03-01  9:48                               ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-02-28 21:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, Andy Moreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 27 Feb 2021 21:58:25 +0000
>> 
>> > I suspect that the issue may be with parallel builds (note the "-j8"
>> > above). Repeating the build with "-j1" appears to be building the
>> > missing .eln files as expected.
>> 
>> Now that the -j1 build has completed (without error), all of the lisp
>> files have been compiled AOT as expected, and running the resulting
>> emacs does not rebuild any of those .eln files.
>> 
>> So I think there are still some other issues with dependencies and
>> handling parallel builds, but this bug has been fixed.
>
> Hmm... what would be the reason for parallel builds not working well on
> MS-Windows?  File sharing issues?

I suspect this is not Windows related.

> Does the async native compilation use temporary files, and if so, do
> they reside in the same directory when multiple compilations are
> running?

Yes, we rely on Fmake_temp_file_internal in Fcomp__compile_ctxt_to_file
to decide the output filename to be passed to libgccjit when asking for
compilation.

There should be no conflict unless more than one process is trying to
compile the same file (not sure ATM if this is what we are seeing here
and why this should be happening).
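
For reference, the general write-to-a-temp-file-then-rename pattern
looks roughly like the sketch below.  This is a POSIX-only illustration
using mkstemp, not the code in comp.c; write_via_temp_file and its
temp-name scheme are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write CONTENTS to a uniquely named temporary file next to FINAL_NAME,
   then rename it into place.  Returns 0 on success, -1 on failure.
   POSIX-only sketch (mkstemp); not the Emacs implementation.  */
static int
write_via_temp_file (const char *final_name, const char *contents)
{
  assert (final_name != NULL && contents != NULL);

  char tmp_name[4096];
  int n = snprintf (tmp_name, sizeof tmp_name, "%sXXXXXX", final_name);
  if (n < 0 || (size_t) n >= sizeof tmp_name)
    return -1;

  int fd = mkstemp (tmp_name);  /* Fills in a unique suffix.  */
  if (fd < 0)
    return -1;

  size_t len = strlen (contents);
  int ok = write (fd, contents, len) == (ssize_t) len;
  ok = close (fd) == 0 && ok;

  /* Each writer has its own temp file, but all of them meet here
     if they target the same FINAL_NAME.  */
  if (!ok || rename (tmp_name, final_name) != 0)
    {
      unlink (tmp_name);
      return -1;
    }
  return 0;
}
```

The temp names never collide, so the only shared step is the final
rename.  On MS-Windows a rename involving a file that another process
still has open can fail, which might be related to the "Permission
denied" reported earlier, but that is a guess rather than a diagnosis.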

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-28 21:15                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-01  5:36                                 ` Eli Zaretskii
  2021-03-01  6:34                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-01  5:36 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Andy Moreton <andrewjmoreton@gmail.com>, 46256@debbugs.gnu.org
> Date: Sun, 28 Feb 2021 21:15:03 +0000
> 
> > Does the async native compilation use temporary files, and if so, do
> > they reside in the same directory when multiple compilations are
> > running?
> 
> Yes, we rely on Fmake_temp_file_internal in Fcomp__compile_ctxt_to_file
> to decide the output filename to be passed to libgccjit when asking for
> compilation.

That shouldn't cause a problem, I think.

> There should be no conflict unless more than one process is trying to
> compile the same file

Is there a way to print to some log file the names of the files being
compiled?  Then perhaps we could catch such multiple compilations.

AFAIR, the Emacs build process divides files into several groups, and
no 2 groups include the same file.  So the top-level compilation
process cannot cause multiple compilations of the same file.  But
could it happen that compiling file A indirectly causes file B to be
compiled, because file A requires B or loads B or calls functions
declared to be in B, and there's not yet a .eln file for file B?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-01  5:36                                 ` Eli Zaretskii
@ 2021-03-01  6:34                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-01  6:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Andy Moreton <andrewjmoreton@gmail.com>, 46256@debbugs.gnu.org
>> Date: Sun, 28 Feb 2021 21:15:03 +0000
>> 
>> > Does the async native compilation use temporary files, and if so, do
>> > they reside in the same directory when multiple compilations are
>> > running?
>> 
>> Yes, we rely on Fmake_temp_file_internal in Fcomp__compile_ctxt_to_file
>> to decide the output filename to be passed to libgccjit when asking for
>> compilation.
>
> That shouldn't cause a problem, I think.
>
>> There should be no conflict unless more than one process is trying to
>> compile the same file
>
> Is there a way to print to some log file the names of the files being
> compiled?  Then perhaps we could catch such multiple compilations.

We don't have any log facility ATM for that.

Yesterday evening, while testing a patch, I had the same issue; it
should be sufficient to add some print.  I'll try to look into it.

> AFAIR, the Emacs build process divides files into several groups, and
> no 2 groups include the same file.  So the top-level compilation
> process cannot cause multiple compilations of the same file.  But
> could it happen that compiling file A indirectly causes file B to be
> compiled, because file A requires B or loads B or calls functions
> declared to be in B, and there's not yet a .eln file for file B?

It should not happen.

When 'noninteractive' is true we disable deferred compilation in
'maybe_defer_native_compilation'.  The reason is that we want to
trigger automatic compilations only for reasonably long-lived sessions,
and non-interactive ones very often aren't.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-02-28 17:35                             ` Eli Zaretskii
  2021-02-28 21:15                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-01  9:48                               ` Andy Moreton
  2021-03-03 18:27                                 ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-01  9:48 UTC (permalink / raw)
  To: 46256

On Sun 28 Feb 2021, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 27 Feb 2021 21:58:25 +0000
>> 
>> > I suspect that the issue may be with parallel builds (note the "-j8"
>> > above). Repeating the build with "-j1" appears to be building the
>> > missing .eln files as expected.
>> 
>> Now that the -j1 build has completed (without error), all of the lisp
>> files have been compiled AOT as expected, and running the resulting
>> emacs does not rebuild any of those .eln files.
>> 
>> So I think there are still some other issues with dependencies and
>> handling parallel builds, but this bug has been fixed.
>
> Hmm... what would be the reason for parallel builds not working well on
> MS-Windows? file sharing issues?
>
> Does the async native compilation use temporary files, and if so, do
> they reside in the same directory when multiple compilations are
> running?

I've tried a few builds from a clean tree (after "git clean -xdf") and
have not been able to reproduce this parallel build problem again.

Do let me know if there are any steps I should take to help diagnose it
if it does reproduce again.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-01  9:48                               ` Andy Moreton
@ 2021-03-03 18:27                                 ` Eli Zaretskii
  2021-03-03 18:43                                   ` Eli Zaretskii
  2021-03-03 18:48                                   ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 18:27 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Progress report:

 . I've successfully built a 32-bit Emacs --with-wide-int on
   MS-Windows, for now _without_ NATIVE_FULL_AOT=1.

 . The built Emacs crashes on startup in interactive invocations from
   cmd.exe, if invoked as "emacs -Q" or "src/emacs -Q".  This was
   traced to set_invocation_vars, which calls openp, which calls
   expand-file-name, which on MS-Windows expects the emacs_dir
   variable to be defined in the environment -- but this is false at
   that point, because init_environment was not yet called.

   I fixed this by avoiding the call to openp (MS-Windows executables
   have an easy way of determining their full absolute file name), but
   in general I must say that the call to init_vars_for_load in
   pdumper_load worries me quite a bit: this is a very early stage in
   startup, before we init most of our infrastructure, and so relying
   on file-name functions, memory allocation, etc. is very dangerous,
   especially on Windows, where the infrastructure not yet initialized
   at that point includes the environment.

 . After fixing the above, Emacs starts, but as soon as some simple
   command is invoked, and Emacs starts native-compiling Lisp
   packages, the Emacs subprocesses which run the async compilation
   start crashing.  Not all of them crash, but some do.  I wasn't yet
   able to find where they crash or why; stay tuned.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:27                                 ` Eli Zaretskii
@ 2021-03-03 18:43                                   ` Eli Zaretskii
  2021-03-03 19:46                                     ` Eli Zaretskii
  2021-03-07 17:59                                     ` Eli Zaretskii
  2021-03-03 18:48                                   ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 18:43 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> Date: Wed, 03 Mar 2021 20:27:55 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org
> 
>  . After fixing the above, Emacs starts, but as soon as some simple
>    command is invoked, and Emacs starts native-compiling Lisp
>    packages, the Emacs subprocesses which run the async compilation
>    start crashing.  Not all of them crash, but some do.  I wasn't yet
>    able to find where they crash or why; stay tuned.

Some more info about these crashes: I see this in the *Messages*
buffer when a compilation crashes:

       Warning (comp): comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)^M

I also see a similar picture in emacs_backtrace.txt:

  emacs_abort at src/w32fns.c:10947
  terminate_due_to_signal at src/emacs.c:417
  die at src/alloc.c:7452
  XNATIVE_COMP_UNIT at src/comp.h:70
  load_comp_unit at src/comp.c:4766
  syms_of_comp at src/comp.c:5077
  Fload at src/lread.c:1548

(My Emacs is compiled with --enable-checking=yes.)

Btw, that ^M character after the error message probably means we don't
correctly decode messages from the async compilation subprocesses --
but this is a secondary problem for now.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:27                                 ` Eli Zaretskii
  2021-03-03 18:43                                   ` Eli Zaretskii
@ 2021-03-03 18:48                                   ` Eli Zaretskii
  2021-03-03 19:28                                     ` Eli Zaretskii
  2021-03-03 19:37                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 18:48 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

Also, while compiling I see this warning:

  comp.c: In function 'eln_load_path_final_clean_up':
  comp.c:4514:15: warning: trampoline generated for nested function 'return_nil' [-Wtrampolines]
   4514 |   Lisp_Object return_nil (Lisp_Object arg) { return Qnil; }
	|               ^~~~~~~~~~

Why do we need this nested function on Windows, and what is the story
about the trampoline?  And how to avoid the warning?

Thanks.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:48                                   ` Eli Zaretskii
@ 2021-03-03 19:28                                     ` Eli Zaretskii
  2021-03-03 19:50                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-03 19:37                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 19:28 UTC (permalink / raw)
  To: akrl; +Cc: 46256

I have a question: how do I determine which Emacs binary corresponds
to a particular directory in ~/.emacs.d/eln-cache/ ?

AFAIU, when I make a change in Emacs C sources and rebuild Emacs, the
native-compiled files will be put in a new directory under eln-cache,
right?  Suppose I would later like to remove stale binaries --
how do I know which eln-cache subdirectories I can remove at that
time?

TIA






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:48                                   ` Eli Zaretskii
  2021-03-03 19:28                                     ` Eli Zaretskii
@ 2021-03-03 19:37                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-03 20:13                                       ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 19:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

> Also, while compiling I see this warning:
>
>   comp.c: In function 'eln_load_path_final_clean_up':
>   comp.c:4514:15: warning: trampoline generated for nested function 'return_nil' [-Wtrampolines]
>    4514 |   Lisp_Object return_nil (Lisp_Object arg) { return Qnil; }
> 	|               ^~~~~~~~~~
>
> Why do we need this nested function on Windows, and what is the story
> about the trampoline?  And how to avoid the warning?
>
> Thanks.

This nested function was nested only to save some ifdefs (as it's used
only in Windows ifdefed code).  Didn't know it could cause warnings,
I've made it a regular function with cf37850e2d.
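For illustration, a minimal sketch of the regular-function form.  GCC's
-Wtrampolines warns about trampolines generated for pointers to nested
functions; a file-scope function passed as a callback needs none.
Lisp_Object and Qnil below are stand-ins (the real definitions live in
lisp.h), and `call_with_handler' is a hypothetical caller, not a
function from comp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the Emacs definitions in lisp.h, for illustration
   only.  */
typedef void *Lisp_Object;
#define Qnil ((Lisp_Object) NULL)

/* File-scope replacement for the nested function: it ignores its
   argument and returns Qnil.  Taking its address requires no
   trampoline.  */
static Lisp_Object
return_nil (Lisp_Object arg)
{
  (void) arg;
  return Qnil;
}

/* Hypothetical caller standing in for code that accepts a handler
   callback.  */
static Lisp_Object
call_with_handler (Lisp_Object (*handler) (Lisp_Object), Lisp_Object arg)
{
  assert (handler != NULL);
  return handler (arg);
}
```

In the real source the definition would presumably sit inside the same
Windows-only ifdef as its caller, which is the cost in ifdefs mentioned
above.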

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:43                                   ` Eli Zaretskii
@ 2021-03-03 19:46                                     ` Eli Zaretskii
  2021-03-03 20:04                                       ` Eli Zaretskii
  2021-03-07 17:59                                     ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 19:46 UTC (permalink / raw)
  To: akrl; +Cc: 46256, andrewjmoreton

> Date: Wed, 03 Mar 2021 20:43:01 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> Some more info about these crashes: I see this in the *Messages*
> buffer when a compilation crashes:
> 
>        Warning (comp): comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)^M
> 
> I also see a similar picture in emacs_backtrace.txt:
> 
>   emacs_abort at src/w32fns.c:10947
>   terminate_due_to_signal at src/emacs.c:417
>   die at src/alloc.c:7452
>   XNATIVE_COMP_UNIT at src/comp.h:70
>   load_comp_unit at src/comp.c:4766
>   syms_of_comp at src/comp.c:5077
>   Fload at src/lread.c:1548

It looks like these crashes are when compiling subr-x, because I see
zero-sized subr-x-XXXXX.eln.tmp files in the eln-cache directory.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 19:28                                     ` Eli Zaretskii
@ 2021-03-03 19:50                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-03 20:08                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 19:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

> I have a question: how do I determine which Emacs binary corresponds
> to a particular directory in ~/.emacs.d/eln-cache/ ?
>
> AFAIU, when I make a change in Emacs C sources and rebuild Emacs, the
> native-compiled files will be put in a new directory under eln-cache,
> right?

Essentially only if you add a primitive function.

> Suppose I would later like to remove stale binaries --
> how do I know which eln-cache subdirectories I can remove at that
> time?

ATM I typically just remove all but the least recent one.  But another,
smarter technique might be to look at the subfolder name inside the
'native-lisp' directory of the build tree you are interested in; this is
the same subfolder name that's used inside 'eln-cache'.

Thinking about it, from within Emacs one can find it simply by
inspecting the `comp-abi-hash' variable.
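A minimal sketch of the comparison described above, assuming the name of
the subfolder that the binary of interest uses is already known (for
example, read from the 'native-lisp' directory of its build tree).  The
function name `list_stale_subdirs' is hypothetical; it only reports
entries and never deletes anything.  It relies on the POSIX dirent
interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry of CACHE_DIR whose name
   differs from CURRENT_SUBDIR and return how many were found, or -1
   if CACHE_DIR cannot be opened.  Nothing is removed.  */
static int
list_stale_subdirs (const char *cache_dir, const char *current_subdir)
{
  assert (cache_dir != NULL && current_subdir != NULL);

  DIR *dir = opendir (cache_dir);
  if (dir == NULL)
    return -1;

  int stale = 0;
  struct dirent *entry;
  while ((entry = readdir (dir)) != NULL)
    {
      /* Skip the directory itself and its parent.  */
      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;
      if (strcmp (entry->d_name, current_subdir) != 0)
        {
          printf ("stale: %s/%s\n", cache_dir, entry->d_name);
          stale++;
        }
    }
  closedir (dir);
  return stale;
}
```

If several Emacs binaries share one eln-cache, each of their subfolder
names would have to be excluded before removing anything.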

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 19:46                                     ` Eli Zaretskii
@ 2021-03-03 20:04                                       ` Eli Zaretskii
  2021-03-03 20:21                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 20:04 UTC (permalink / raw)
  To: akrl; +Cc: 46256, andrewjmoreton

> Date: Wed, 03 Mar 2021 21:46:44 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> >   emacs_abort at src/w32fns.c:10947
> >   terminate_due_to_signal at src/emacs.c:417
> >   die at src/alloc.c:7452
> >   XNATIVE_COMP_UNIT at src/comp.h:70
> >   load_comp_unit at src/comp.c:4766
> >   syms_of_comp at src/comp.c:5077
> >   Fload at src/lread.c:1548
> 
> It looks like these crashes are when compiling subr-x, because I see
> zero-sized subr-x-XXXXX.eln.tmp files in the eln-cache directory.

Yes:

  (gdb) r -batch -l comp -f batch-native-compile ../lisp/emacs-lisp/subr-x.el
  Starting program: D:\gnu\git\emacs\native-comp\src\emacs.exe -batch -l comp -f batch-native-compile ../lisp/emacs-lisp/subr-x.el
  warning: Enabling Low Fragmentation Heap failed: error 31
  [New Thread 14244.0x320c]
  [New Thread 14244.0x3540]
  [Thread 14244.0x3540 exited with code 1]
  Debugger entered--Lisp error: (native-compiler-error "../lisp/emacs-lisp/subr-x.el" "\nException 0xc0000005 at this address:\n07cdac3e\n\nB...")
    signal(native-compiler-error ("../lisp/emacs-lisp/subr-x.el" "\nException 0xc0000005 at this address:\n07cdac3e\n\nB..."))
    comp--native-compile("../lisp/emacs-lisp/subr-x.el")
    batch-native-compile()
    command-line-1(("-l" "comp" "-f" "batch-native-compile" "../lisp/emacs-lisp/subr-x.el"))
    command-line()
    normal-top-level()

So the async compilation process crashes with SIGSEGV when compiling
subr-x.el.

Andrea, can you help me figure out the command line with which the
async compilation subprocess is invoked in this case?  I'd like to run
it as a foreground process under a debugger, and see why it crashes.

Thanks.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 19:50                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-03 20:08                                         ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 20:08 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org
> Date: Wed, 03 Mar 2021 19:50:05 +0000
> 
> > AFAIU, when I make a change in Emacs C sources and rebuild Emacs, the
> > native-compiled files will be put in a new directory under eln-cache,
> > right?
> 
> Essentially only if you add a primitive function.

Ah, okay, that's better.

> > Suppose I would later like to remove stale binaries --
> > how do I know which eln-cache subdirectories I can remove at that
> > time?
> 
> ATM I typically just remove all but the least recent one.  But another
> smarter technique might be looking at the subfolder name in the build
> tree you are interested in inside the 'native-lisp' directory, this is
> the same subfolder name that's used inside 'eln-cache'.

OK, but the name of the subfolder doesn't include the full version
number, I see 28.0.50-XXXXX, whereas the Emacs binaries are 28.0.50.1,
28.0.50.2, etc.

> Thinking about it, from within Emacs one can find it simply by
> inspecting the `comp-abi-hash' variable.

OK, so we can ask the binary itself which subdirectory it needs,
thanks.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 19:37                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-03 20:13                                       ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-03 20:13 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: andrewjmoreton@gmail.com, 46256@debbugs.gnu.org
> Date: Wed, 03 Mar 2021 19:37:52 +0000
> 
> >   comp.c: In function 'eln_load_path_final_clean_up':
> >   comp.c:4514:15: warning: trampoline generated for nested function 'return_nil' [-Wtrampolines]
> >    4514 |   Lisp_Object return_nil (Lisp_Object arg) { return Qnil; }
> > 	|               ^~~~~~~~~~
> >
> > Why do we need this nested function on Windows, and what is the story
> > about the trampoline?  And how to avoid the warning?
> >
> > Thanks.
> 
> This nested function was nested only to save some ifdefs (as it's used
> only in Windows ifdefed code).  Didn't know it could cause warnings,
> I've made it a regular function with cf37850e2d.

Thanks, the warning is gone now.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 20:04                                       ` Eli Zaretskii
@ 2021-03-03 20:21                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04  8:30                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 20:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 03 Mar 2021 21:46:44 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> 
>> >   emacs_abort at src/w32fns.c:10947
>> >   terminate_due_to_signal at src/emacs.c:417
>> >   die at src/alloc.c:7452
>> >   XNATIVE_COMP_UNIT at src/comp.h:70
>> >   load_comp_unit at src/comp.c:4766
>> >   syms_of_comp at src/comp.c:5077
>> >   Fload at src/lread.c:1548
>> 
>> It looks like these crashes are when compiling subr-x, because I see
>> zero-sized subr-x-XXXXX.eln.tmp files in the eln-cache directory.
>
> Yes:
>
>   (gdb) r -batch -l comp -f batch-native-compile ../lisp/emacs-lisp/subr-x.el
>   Starting program: D:\gnu\git\emacs\native-comp\src\emacs.exe -batch -l comp -f batch-native-compile ../lisp/emacs-lisp/subr-x.el
>   warning: Enabling Low Fragmentation Heap failed: error 31
>   [New Thread 14244.0x320c]
>   [New Thread 14244.0x3540]
>   [Thread 14244.0x3540 exited with code 1]
>   Debugger entered--Lisp error: (native-compiler-error "../lisp/emacs-lisp/subr-x.el" "\nException 0xc0000005 at this address:\n07cdac3e\n\nB...")
>     signal(native-compiler-error ("../lisp/emacs-lisp/subr-x.el" "\nException 0xc0000005 at this address:\n07cdac3e\n\nB..."))
>     comp--native-compile("../lisp/emacs-lisp/subr-x.el")
>     batch-native-compile()
>     command-line-1(("-l" "comp" "-f" "batch-native-compile" "../lisp/emacs-lisp/subr-x.el"))
>     command-line()
>     normal-top-level()
>
> So the async compilation process crashes with SIGSEGV when compiling
> subr-x.el.
>
> Andrea, can you help me figure out the command line with which the
> async compilation subprocess is invoked in this case?  I'd like to run
> it as a foreground process under a debugger, and see why it crashes.

Yes, each async compilation runs by executing a temporary Elisp file
(used so as not to exceed the max command line length on Windows).  This
file is created by `comp-run-async-workers'.

One can put a print there to have the name of this file (and execute it
regularly with emacs -batch -l ...) to have the reproducer or look into
the temporary directory for the most recent
emacs-async-comp-...something... file.

  Andrea

PS ATM I see a crash too in my 32bit wide-int setup here, this is while
executing a top_level_run function loading a .eln file.  I need to
compile a more recent gdb to look into this further but it looks like
something basic is going wrong there.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 20:21                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04  8:30                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 11:54                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06  0:33                                             ` Andy Moreton
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04  8:30 UTC (permalink / raw)
  To: 46256; +Cc: eliz, andrewjmoreton

Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

[...]

> PS ATM I see a crash too in my 32bit wide-int setup here, this is while
> executing a top_level_run function loading a .eln file.  I need to
> compile a more recent gdb to look into this further but it looks like
> something basic is going wrong there.

Ok, I think this issue was that `comp-abi-hash' was not accounting for
'--with-wide-int' and on my system a wide-int binary was loading a
non-wide-int .eln.  With 6444f69de2 I added
`system-configuration-options' as an input to the hash.

This is a conservative choice.  We may want to look only at
'--with-wide-int', but I'm wondering if that's really the only sensitive
input, so having `system-configuration-options' in the equation looked
safer to me, at least for now.
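A minimal sketch of the idea, under loud assumptions: the FNV-1a hash
below is only a stand-in and is not the hash Emacs computes, the
function names `abi_hash' and `abi_dir_name' are hypothetical, and the
inputs are simplified to three strings.  It shows why adding the
configuration options to the hashed input keeps a wide-int build and a
non-wide-int build in separate .eln directories.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fold the bytes of S into HASH using 32-bit FNV-1a (a stand-in hash,
   chosen only because it is short).  */
static uint32_t
fnv1a_update (uint32_t hash, const char *s)
{
  assert (s != NULL);
  for (; *s != '\0'; s++)
    {
      hash ^= (unsigned char) *s;
      hash *= 16777619u;
    }
  return hash;
}

/* Hypothetical ABI hash: any input that changes the binary layout
   seen by compiled code must be part of the hashed data.  */
static uint32_t
abi_hash (const char *version, const char *primitives,
          const char *config_options)
{
  uint32_t hash = 2166136261u;	/* FNV-1a 32-bit offset basis.  */
  hash = fnv1a_update (hash, version);
  hash = fnv1a_update (hash, "\n");
  hash = fnv1a_update (hash, primitives);
  hash = fnv1a_update (hash, "\n");
  hash = fnv1a_update (hash, config_options);
  return hash;
}

/* Format a directory name of the form VERSION-HASH into BUF.  */
static void
abi_dir_name (char *buf, size_t size, const char *version, uint32_t hash)
{
  assert (buf != NULL && size > 0);
  snprintf (buf, size, "%s-%08lx", version, (unsigned long) hash);
}
```

Hashing all configuration options errs on the side of recompiling too
often; hashing only '--with-wide-int' would reuse more .eln files but
risks missing another option that affects the ABI.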

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04  8:30                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 11:54                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 14:13                                               ` Eli Zaretskii
  2021-03-06  0:33                                             ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04 11:54 UTC (permalink / raw)
  To: 46256; +Cc: andrewjmoreton, eliz

Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
> text editors" <bug-gnu-emacs@gnu.org> writes:
>
> [...]
>
>> PS ATM I see a crash too in my 32bit wide-int setup here, this is while
>> executing a top_level_run function loading a .eln file.  I need to
>> compile a more recent gdb to look into this further but it looks like
>> something basic is going wrong there.
>
> Ok, I think this issue was that `comp-abi-hash' was not accounting for
> '--with-wide-int' and on my system a wide-int binary was loading a
> non-wide-int .eln.  With 6444f69de2 I added
> `system-configuration-options' as an input to the hash.
>
> This is a conservative choice.  We may want to look only at
> '--with-wide-int', but I'm wondering if that's really the only sensitive
> input, so having `system-configuration-options' in the equation looked
> safer to me, at least for now.

Just to report, this morning I've used Emacs 32bit wide-int a bit and,
as of 6444f69de2, it seems to work fine here (some Org and C file
editing); the compiler testsuite is also passing clean.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 11:54                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 14:13                                               ` Eli Zaretskii
  2021-03-04 14:24                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-04 14:13 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, eliz@gnu.org, andrewjmoreton@gmail.com
> Date: Thu, 04 Mar 2021 11:54:23 +0000
> 
> Just to report, this morning I've used Emacs 32bit wide-int a bit and,
> as of 6444f69de2, it seems to work fine here (some Org and C file
> editing); the compiler testsuite is also passing clean.

Did you successfully native-compile subr-x.el?  If you did, the
problem is probably Windows specific.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 14:13                                               ` Eli Zaretskii
@ 2021-03-04 14:24                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 14:49                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04 14:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, eliz@gnu.org, andrewjmoreton@gmail.com
>> Date: Thu, 04 Mar 2021 11:54:23 +0000
>> 
>> Just to report, this morning I've used Emacs 32bit wide-int a bit and,
>> as of 6444f69de2, it seems to work fine here (some Org and C file
>> editing); the compiler testsuite is also passing clean.
>
> Did you successfully native-compile subr-x.el?  If you did, the
> problem is probably Windows specific.

Yes, subr-x.el is compiled and the eln is loaded as well.

I'll keep on using it for what I can and see if something pops up;
that's still possible.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 14:24                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 14:49                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 17:24                                                     ` Eli Zaretskii
  2021-03-05 13:52                                                     ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04 14:49 UTC (permalink / raw)
  To: 46256; +Cc: eliz, andrewjmoreton

Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> I'll keep on using it for what I can and see if something pops up;
> that's still possible.

Exactly...

I have a reproducer that is most likely due to the same issue you are
observing:

emacs -batch -l comp -f batch-native-compile .../emacs/lisp/progmodes/cc-engine.el

GC kicks in and we end up marking #<subr c-string-list-p>; we then try
to mark its compilation unit but segfault (backtrace below).

Will look more into this as soon as I can.

  Andrea

(gdb) bt
#0  0x081ccce3 in symbol_marked_p (s=0x110a02e0) at alloc.c:3982
#1  0x081d1053 in mark_object (arg=XIL(0x8a15f3008a6e8c0)) at alloc.c:6775
#2  0x081d0fe3 in mark_object (arg=XIL(0xa000000008986f10)) at alloc.c:6754
#3  0x081d107f in mark_object (arg=XIL(0xc000000008a78510)) at alloc.c:6781
#4  0x081d1095 in mark_object (arg=XIL(0x351b38)) at alloc.c:6782
#5  0x081d122c in mark_object (arg=XIL(0xc00000000899b4a0)) at alloc.c:6828
#6  0x081d122c in mark_object (arg=XIL(0xc00000000899b470)) at alloc.c:6828
#7  0x081d122c in mark_object (arg=XIL(0xc00000000899b160)) at alloc.c:6828
#8  0x081d10d9 in mark_object (arg=XIL(0x304c78)) at alloc.c:6785
#9  0x081d122c in mark_object (arg=XIL(0xc000000008935960)) at alloc.c:6828
#10 0x081d122c in mark_object (arg=XIL(0xc000000008935420)) at alloc.c:6828
#11 0x081d10d9 in mark_object (arg=XIL(0x273fa0)) at alloc.c:6785
#12 0x081d0d97 in mark_objects (obj=0x89366f8, n=333) at alloc.c:6575
[...]
#979 0x081d1024 in mark_object (arg=XIL(0xa0000000086cc2c0)) at alloc.c:6766
#980 0x081d0fe3 in mark_object (arg=XIL(0xa00000000884a410)) at alloc.c:6754
#981 0x081d107f in mark_object (arg=XIL(0xacd34d78)) at alloc.c:6781
#982 0x081d122c in mark_object (arg=XIL(0xc0000000086d6260)) at alloc.c:6828
#983 0x081d1095 in mark_object (arg=XIL(0x5f78)) at alloc.c:6782
#984 0x081d122c in mark_object (arg=XIL(0xc0000000b5918910)) at alloc.c:6828
#985 0x081d0fcd in mark_object (arg=XIL(0xa0000000b59188d4)) at alloc.c:6753
#986 0x081d107f in mark_object (arg=XIL(0x51b8)) at alloc.c:6781
#987 0x081cf4cf in mark_object_root_visitor (
    root_ptr=0x8629f6c <buffer_defaults+76>, type=GC_ROOT_BUFFER_LOCAL_DEFAULT,
    data=0x0) at alloc.c:5907
#988 0x081cf3dd in visit_vectorlike_root (visitor=...,
    ptr=0x8629f20 <buffer_defaults>, type=GC_ROOT_BUFFER_LOCAL_DEFAULT)
    at alloc.c:5858
#989 0x081cf40a in visit_buffer_root (visitor=...,
    buffer=0x8629f20 <buffer_defaults>, type=GC_ROOT_BUFFER_LOCAL_DEFAULT)
    at alloc.c:5873
#990 0x081cf428 in visit_static_gc_roots (visitor=...) at alloc.c:5885
#991 0x081cfb2d in garbage_collect () at alloc.c:6105
#992 0x081cf8c0 in maybe_garbage_collect () at alloc.c:6018
#993 0x08200031 in maybe_gc () at lisp.h:5124
#994 0x0820825d in Ffuncall (nargs=2, args=0xbfffbfe0) at eval.c:2993
#995 0x082077b7 in call1 (fn=XIL(0xa000000008a11d18), arg1=XIL(0xc000000008b076f0))
    at eval.c:2869
#996 0x08218858 in mapcar1 (leni=352, vals=0xbfffc0d0, fn=XIL(0xa000000008a11d18),
    seq=XIL(0xc000000008b07cf0)) at fns.c:2742
#997 0x08218e34 in Fmapcar (function=XIL(0xa000000008a11d18),
    sequence=XIL(0xc000000008b07cf0)) at fns.c:2798
#998 0xb425f1c5 in F627974652d636f6d70696c652d726563757273652d746f706c6576656c_byte_compile_recurse_toplevel_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/bytecomp-12882072-bfe84587.eln
#999 0x082087c6 in funcall_subr (subr=0x87ee840, numargs=2, args=0xbfffce40)
    at eval.c:3086
#1000 0x08208375 in Ffuncall (nargs=3, args=0xbfffce38) at eval.c:3009
#1001 0xb4270738 in F627974652d636f6d70696c652d746f706c6576656c2d66696c652d666f726d_byte_compile_toplevel_file_form_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/bytecomp-12882072-bfe84587.eln
#1002 0x0820879f in funcall_subr (subr=0x884a010, numargs=1, args=0xbfffd008)
    at eval.c:3084
#1003 0x08208375 in Ffuncall (nargs=2, args=0xbfffd000) at eval.c:3009
#1004 0xb426dfc8 in F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_43 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/bytecomp-12882072-bfe84587.eln
#1005 0x0820879f in funcall_subr (subr=0x86c8840, numargs=1, args=0xbfffd1e8)
    at eval.c:3084
#1006 0x08208375 in Ffuncall (nargs=2, args=0xbfffd1e0) at eval.c:3009
#1007 0xb426eddc in F627974652d636f6d70696c652d66726f6d2d627566666572_byte_compile_from_buffer_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/bytecomp-12882072-bfe84587.eln
#1008 0x0820879f in funcall_subr (subr=0x8849e50, numargs=1, args=0xbfffd438)
    at eval.c:3084
#1009 0x08208375 in Ffuncall (nargs=2, args=0xbfffd430) at eval.c:3009
#1010 0xb426b91a in F627974652d636f6d70696c652d66696c65_byte_compile_file_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/bytecomp-12882072-bfe84587.eln
#1011 0x082087c6 in funcall_subr (subr=0x8849dd0, numargs=1, args=0xbfffd608)
    at eval.c:3086
#1012 0x08208375 in Ffuncall (nargs=2, args=0xbfffd600) at eval.c:3009
#1013 0x0825e5b4 in exec_byte_code (bytestr=XIL(0x8000000008815760),
    vector=XIL(0xa0000000086b1828), maxdepth=make_fixnum(16),
    args_template=make_fixnum(257), nargs=1, args=0xbfffded0) at bytecode.c:632
#1014 0x08208c03 in fetch_and_exec_byte_code (fun=XIL(0xa0000000086b1968),
    syms_left=make_fixnum(257), nargs=1, args=0xbfffdec8) at eval.c:3133
#1015 0x08208fe9 in funcall_lambda (fun=XIL(0xa0000000086b1968), nargs=1,
    arg_vector=0xbfffdec8) at eval.c:3214
#1016 0x082083d7 in Ffuncall (nargs=2, args=0xbfffdec0) at eval.c:3013
#1017 0x08206ca2 in Fapply (nargs=3, args=0xbfffdec0) at eval.c:2592
#1018 0x082086fa in funcall_subr (subr=0x85db400 <Sapply>, numargs=3,
    args=0xbfffdec0) at eval.c:3064
#1019 0x08208375 in Ffuncall (nargs=4, args=0xbfffdeb8) at eval.c:3009
#1020 0x0825e5b4 in exec_byte_code (bytestr=XIL(0x80000000b55aa5f8),
    vector=XIL(0xa0000000089994a0), maxdepth=make_fixnum(14),
    args_template=make_fixnum(385), nargs=1, args=0xbfffe4e0) at bytecode.c:632
#1021 0x08208c03 in fetch_and_exec_byte_code (fun=XIL(0xa0000000089984c8),
    syms_left=make_fixnum(385), nargs=1, args=0xbfffe4d8) at eval.c:3133
#1022 0x08208fe9 in funcall_lambda (fun=XIL(0xa0000000089984c8), nargs=1,
    arg_vector=0xbfffe4d8) at eval.c:3214
#1023 0x082083d7 in Ffuncall (nargs=2, args=0xbfffe4d0) at eval.c:3013
#1024 0xb43033fd in F636f6d702d7370696c6c2d6c6170_comp_spill_lap_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/comp-7672a6ed-2df580e9.eln
#1025 0x0820879f in funcall_subr (subr=0x89984f8, numargs=1, args=0xbfffe6c8)
    at eval.c:3084
#1026 0x08208375 in Ffuncall (nargs=2, args=0xbfffe6c0) at eval.c:3009
#1027 0xb434f53d in F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/comp-7672a6ed-2df580e9.eln
#1028 0x08208803 in funcall_subr (subr=0x89a70a8, numargs=1, args=0xbfffe8b0)
    at eval.c:3089
#1029 0x08208375 in Ffuncall (nargs=2, args=0xbfffe8a8) at eval.c:3009
#1030 0xb4350921 in F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0 ()
   from /home/andcor03/emacs2/native-lisp/28.0.50-92e930fb/comp-7672a6ed-2df580e9.eln
#1031 0x08208785 in funcall_subr (subr=0x89a71a8, numargs=0, args=0xbfffeb18)
    at eval.c:3082
#1032 0x08208375 in Ffuncall (nargs=1, args=0xbfffeb10) at eval.c:3009
#1033 0xb4a2b841 in F636f6d6d616e642d6c696e652d31_command_line_1_0 ()
   from /home/andcor03/emacs2/src/../native-lisp/28.0.50-92e930fb/startup-bbc6ea72-9be7c541.eln
#1034 0x0820879f in funcall_subr (subr=0xb55deb90, numargs=1, args=0xbfffeec8)
    at eval.c:3084
#1035 0x08208375 in Ffuncall (nargs=2, args=0xbfffeec0) at eval.c:3009
#1036 0xb4a2168d in F636f6d6d616e642d6c696e65_command_line_0 ()
   from /home/andcor03/emacs2/src/../native-lisp/28.0.50-92e930fb/startup-bbc6ea72-9be7c541.eln
#1037 0x08208785 in funcall_subr (subr=0xb54eccb0, numargs=0, args=0xbffff0b8)
    at eval.c:3082
#1038 0x08208375 in Ffuncall (nargs=1, args=0xbffff0b0) at eval.c:3009
#1039 0xb4a1c8ce in F6e6f726d616c2d746f702d6c6576656c_normal_top_level_0 ()
   from /home/andcor03/emacs2/src/../native-lisp/28.0.50-92e930fb/startup-bbc6ea72-9be7c541.eln
#1040 0x08206353 in eval_sub (form=XIL(0xc0000000b565b170)) at eval.c:2481
#1041 0x082059ee in Feval (form=XIL(0xc0000000b565b170), lexical=XIL(0))
    at eval.c:2313
#1042 0x081391f5 in top_level_2 () at keyboard.c:1103
#1043 0x08203347 in internal_condition_case (bfun=0x81391cc <top_level_2>,
    handlers=XIL(0x78), hfun=0x8138ab4 <cmd_error>) at eval.c:1448
#1044 0x08139268 in top_level_1 (ignore=XIL(0)) at keyboard.c:1111
#1045 0x082029e7 in internal_catch (tag=XIL(0xa410), func=0x81391fd <top_level_1>,
    arg=XIL(0)) at eval.c:1198
#1046 0x081390d6 in command_loop () at keyboard.c:1072
#1047 0x08138666 in recursive_edit_1 () at keyboard.c:720
#1048 0x08138841 in Frecursive_edit () at keyboard.c:789
#1049 0x08134e4a in main (argc=7, argv=0xbffff5f4) at emacs.c:2095

Lisp Backtrace:
"Automatic GC" (0x0)
0x8a11d18 PVEC_COMPILED
"byte-compile-recurse-toplevel" (0xbfffce40)
"byte-compile-toplevel-file-form" (0xbfffd008)
0x86c8840 PVEC_SUBR
"byte-compile-from-buffer" (0xbfffd438)
"byte-compile-file" (0xbfffd608)
0x86b1968 PVEC_COMPILED
"apply" (0xbfffdec0)
"comp-spill-lap-function" (0xbfffe4d8)
"comp-spill-lap" (0xbfffe6c8)
"comp--native-compile" (0xbfffe8b0)
"batch-native-compile" (0xbfffeb18)
"command-line-1" (0xbfffeec8)
"command-line" (0xbffff0b8)
"normal-top-level" (0xbffff168)
(gdb) 
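
A note for reading the native frames in the C backtrace above: the hex
run between the leading "F" and the first underscore appears to be the
Lisp function name written out as hex byte pairs.  Below is a minimal
Python sketch for decoding such names.  The helper name is made up for
illustration, and the layout is inferred only from the frames shown in
this backtrace, not from the native compiler sources.

```python
import re


def lisp_name_from_symbol(symbol):
    """Return the Lisp name hex-encoded in a native-comp C symbol, or None.

    Assumed layout (inferred from the backtrace, not from the sources):
    'F', then the name as lowercase hex byte pairs, then '_' and a
    human-readable suffix.
    """
    match = re.fullmatch(r"F((?:[0-9a-f]{2})+)_.*", symbol)
    if match is None:
        return None
    return bytes.fromhex(match.group(1)).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for sym in (
        "F627974652d636f6d70696c652d66696c65_byte_compile_file_0",
        "F636f6d702d7370696c6c2d6c6170_comp_spill_lap_0",
    ):
        print(sym.split("_", 1)[0], "->", lisp_name_from_symbol(sym))
```

For the two symbols above this yields byte-compile-file and
comp-spill-lap, which match the entries in the Lisp backtrace.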






^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 14:49                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 17:24                                                     ` Eli Zaretskii
  2021-03-04 18:56                                                       ` Eli Zaretskii
  2021-03-04 20:47                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 13:52                                                     ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-04 17:24 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

When I build on MS-Windows, I see this:

  make -C src VCSWITNESS='$(srcdir)/../.git/logs/HEAD' BIN_DESTDIR='/d/usr/bin/' \
           ELN_DESTDIR='"/d/usr/lib/emacs/28.0.50/"' all
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why is ELN_DESTDIR's value quoted twice?  Is that intentional?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 17:24                                                     ` Eli Zaretskii
@ 2021-03-04 18:56                                                       ` Eli Zaretskii
  2021-03-04 20:11                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 21:30                                                         ` Andy Moreton
  2021-03-04 20:47                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-04 18:56 UTC (permalink / raw)
  To: akrl; +Cc: 46256

I have a question about the build process of the native-comp branch:

Say I bootstrapped a fresh checkout without NATIVE_FULL_AOT=1, and I
now have a subdirectory under the native-lisp/ directory populated
with the *.eln files of the Lisp files we preload.

Now I make some change in Emacs that modifies the ABI hash, and
rebuild.  The previous subdirectory of native-lisp/ is no longer
valid; if I modify some of the preloaded Lisp files, a new .eln file
is produced in a new subdirectory of native-lisp/.  But now that new
subdirectory has only the *.eln files for those Lisp files I modified
_after_ the ABI-changing change.  Which means most of the preloaded
files do not have *.eln files in the native-lisp/ subdirectory that
corresponds to the latest ABI.  Does this mean Emacs now falls back to
using *.elc files when it produces the emacs.pdmp file?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 18:56                                                       ` Eli Zaretskii
@ 2021-03-04 20:11                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 21:33                                                           ` Eli Zaretskii
  2021-03-04 21:30                                                         ` Andy Moreton
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04 20:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

> I have a question about the build process of the native-comp branch:
>
> Say I bootstrapped a fresh checkout without NATIVE_FULL_AOT=1, and I
> now have a subdirectory under the native-lisp/ directory populated
> with the *.eln files of the Lisp files we preload.
>
> Now I make some change in Emacs that modifies the ABI hash, and
> rebuild.  The previous subdirectory of native-lisp/ is no longer
> valid; if I modify some of the preloaded Lisp files, a new .eln file
> is produced in a new subdirectory of native-lisp/.  But now that new
> subdirectory has only the *.eln files for those Lisp files I modified
> _after_ the ABI-changing change.  Which means most of the preloaded
> files do not have *.eln files in the native-lisp/ subdirectory that
> corresponds to the latest ABI.  Does this mean Emacs now falls back to
> using *.elc files when it produces the emacs.pdmp file?

Yes, I think so.  ATM, if the ABI hash is modified, something like 'make
bootstrap' is needed to rebuild all the .eln files.
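
To see how far a tree is from that state, something like the following
Python sketch could list the preloaded files that have no .eln in the
subdirectory for the current ABI.  It is only an illustration: the
function name is hypothetical, the caller has to supply the
subdirectory name and the list of preloaded files, and the
"name-hash-hash.eln" layout is assumed from the file names that appear
in the backtraces in this thread.

```python
from pathlib import Path


def preloaded_without_eln(native_lisp, abi_dir_name, preloaded):
    """List preloaded names with no .eln under native_lisp/abi_dir_name.

    native_lisp   -- path of the native-lisp/ directory
    abi_dir_name  -- subdirectory for the current ABI (caller-supplied)
    preloaded     -- base names of the preloaded Lisp files, no extension
    """
    abi_dir = Path(native_lisp) / abi_dir_name
    compiled = set()
    if abi_dir.is_dir():
        for eln in abi_dir.glob("*.eln"):
            # "bytecomp-12882072-bfe84587" -> "bytecomp": drop the two
            # trailing hash fields (assumed layout, see the lead-in).
            compiled.add(eln.stem.rsplit("-", 2)[0])
    return sorted(name for name in preloaded if name not in compiled)
```

Any name this returns would be loaded from its .elc when the dump is
produced, per the answer above.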

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 17:24                                                     ` Eli Zaretskii
  2021-03-04 18:56                                                       ` Eli Zaretskii
@ 2021-03-04 20:47                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-04 20:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

> When I build on MS-Windows, I see this:
>
>   make -C src VCSWITNESS='$(srcdir)/../.git/logs/HEAD' BIN_DESTDIR='/d/usr/bin/' \
>
>            ELN_DESTDIR='"/d/usr/lib/emacs/28.0.50/"' all
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Why is ELN_DESTDIR's value quoted twice? is that intentional?

I have no memory of that, and in my testing it also works with the
extra quotes removed, so I guess it really was unintentional.

b9ccbac768 removes this double quote.

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 18:56                                                       ` Eli Zaretskii
  2021-03-04 20:11                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 21:30                                                         ` Andy Moreton
  1 sibling, 0 replies; 179+ messages in thread
From: Andy Moreton @ 2021-03-04 21:30 UTC (permalink / raw)
  To: 46256

On Thu 04 Mar 2021, Eli Zaretskii wrote:

> I have a question about the build process of the native-comp branch:
>
> Say I bootstrapped a fresh checkout without NATIVE_FULL_AOT=1, and I
> now have a subdirectory under the native-lisp/ directory populated
> with the *.eln files of the Lisp files we preload.
>
> Now I make some change in Emacs that modifies the ABI hash, and
> rebuild.  The previous subdirectory of native-lisp/ is no longer
> valid; if I modify some of the preloaded Lisp files, a new .eln file
> is produced in a new subdirectory of native-lisp/.  But now that new
> subdirectory has only the *.eln files for those Lisp files I modified
> _after_ the ABI-changing change.  Which means most of the preloaded
> files do not have *.eln files in the native-lisp/ subdirectory that
> corresponds to the latest ABI.  Does this mean Emacs now falls back to
> using *.elc files when it produces the emacs.pdmp file?

Also, if you build out-of-tree for two different targets, the .elc files
are built for the first one, but the second target tree does not have a
native-lisp directory, and no eln files are built.

Both of these problems show that the build does not have the correct
dependencies yet.

    AndyM






^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 20:11                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-04 21:33                                                           ` Eli Zaretskii
  2021-03-05  9:32                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-04 21:33 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org
> Date: Thu, 04 Mar 2021 20:11:27 +0000
> 
> > Now I make some change in Emacs that modifies the ABI hash, and
> > rebuild.  The previous subdirectory of native-lisp/ is no longer
> > valid; if I modify some of the preloaded Lisp files, a new .eln file
> > is produced in a new subdirectory of native-lisp/.  But now that new
> > subdirectory has only the *.eln files for those Lisp files I modified
> > _after_ the ABI-changing change.  Which means most of the preloaded
> > files do not have *.eln files in the native-lisp/ subdirectory that
> > corresponds to the latest ABI.  Does this mean Emacs now falls back to
> > using *.elc files when it produces the emacs.pdmp file?
> 
> Yes, I think so.  ATM if the ABI hash is modified something like 'make
> bootstrap' is needed to re-build all .eln.

Ouch!  We should fix that, because making ABI-breaking changes in the
tree is a frequent case during development, and bootstrap removes all
the previous binaries, which is why I never bootstrap.

So currently the only way to fill up a newly created subdirectory of
native-lisp/ is to manually delete the *.elc files of all the files in
lisp.mk's $shortlisp list, is that sufficient?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 21:33                                                           ` Eli Zaretskii
@ 2021-03-05  9:32                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 10:09                                                               ` Pip Cet
  2021-03-05 11:55                                                               ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05  9:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org
>> Date: Thu, 04 Mar 2021 20:11:27 +0000
>> 
>> > Now I make some change in Emacs that modifies the ABI hash, and
>> > rebuild.  The previous subdirectory of native-lisp/ is no longer
>> > valid; if I modify some of the preloaded Lisp files, a new .eln file
>> > is produced in a new subdirectory of native-lisp/.  But now that new
>> > subdirectory has only the *.eln files for those Lisp files I modified
>> > _after_ the ABI-changing change.  Which means most of the preloaded
>> > files do not have *.eln files in the native-lisp/ subdirectory that
>> > corresponds to the latest ABI.  Does this mean Emacs now falls back to
>> > using *.elc files when it produces the emacs.pdmp file?
>> 
>> Yes, I think so.  ATM if the ABI hash is modified something like 'make
>> bootstrap' is needed to re-build all .eln.
>
> Ouch!  We should fix that, because making ABI-breaking changes in the
> tree is a frequent case during development, and bootstrap removes all
> the previous binaries, which is why I never bootstrap.
>
> So currently the only way to fill up a newly created subdirectory of
> native-lisp/ is to manually delete the *.elc files of all the files in
> lisp.mk's $shortlisp list, is that sufficient?

Yes I think so.

The trouble with using make to build such a system is that make is not
aware of the .eln filename, so it would be necessary to ask the Emacs
binary for it in order to dynamically create the precise
(multiple-target) rule.  Not very practical IMO...

In the past I've experimented with making the .elc files .FORCE targets
and having Emacs decide what to do, but the downside is that for each
file that might need compilation Emacs has to start, and it often
decides that nothing has to be done because the .eln is already
there...  As a consequence, a make invocation that was supposed to do
nothing became considerably slower.

Another option would be to invoke Emacs only once, passing it the list
of the .el files to be compiled and the requested parallelism, and have
Emacs do the job.  I think this might be easier, and we already have in
the codebase all that's needed for that.  The downside is that we'd
drift away from how the vanilla build works.
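
To make the shape of that last option concrete, here is a rough Python
model of a single driver that receives the whole file list and a job
count.  It is a sketch under assumptions, not how Emacs does it: the
function names are invented, the compile step is a caller-supplied
callback, and the "output missing or older than source" test is my
stand-in for whatever check Emacs would really perform.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def needs_compile(src, out):
    """Assumed staleness rule: output missing, or older than its source."""
    return not os.path.exists(out) or os.path.getmtime(out) < os.path.getmtime(src)


def compile_all(pairs, compile_one, jobs):
    """Compile every stale (src, out) pair, running at most `jobs` at once.

    One process examines the whole list, so an up-to-date tree costs a
    few stat calls and no per-file startup.  Returns the compiled sources.
    """
    todo = [(src, out) for src, out in pairs if needs_compile(src, out)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda pair: compile_one(*pair), todo))
    return [src for src, _ in todo]
```

The point of the model is only the control flow: the decision "nothing
to do" is taken once for all files, not once per started process.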

Regards

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05  9:32                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-05 10:09                                                               ` Pip Cet
  2021-03-05 10:19                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 11:55                                                               ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-05 10:09 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

On Fri, Mar 5, 2021 at 9:33 AM Andrea Corallo via Bug reports for GNU
Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> > So currently the only way to fill up a newly created subdirectory of
> > native-lisp/ is to manually delete the *.elc files of all the files in
> > lisp.mk's $shortlisp list, is that sufficient?
>
> Yes I think so.
>
> The trouble of using make for building such a system is that make is not
> aware of the .eln filename, so it should be necessary to ask the Emacs
> binary about that to create dynamically the precise (multiple target)
> rule.  Not very practical IMO...

I do wonder whether the whole filename scheme is really the best option.

IIUC, and that's a big if in this case, the main motivation for using
hashes in the .eln filenames is that dlopen() is broken and may return
the same handle for subsequent dlopen()s of the same name, even if the
underlying file changed in between.

Merely verifying that the ABI is correct could be done at runtime, so
that's no reason to keep a hash in the filename.

So my vague idea is this:

1. implement fixed_dlopen(), which keeps track of filenames that have
been opened and, if necessary, creates a temporary file and loads that
instead of its argument.
2. compile lisp/emacs-lisp/bytecomp.el to lisp/emacs-lisp/bytecomp.elc
and native-lisp/emacs-lisp/bytecomp.eln
3. add extra code in the top level function of each .eln to check that
the ABI is correct.
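
Step 1 could look roughly like the Python model below.  It is only a
sketch of the bookkeeping: FixedLoader and raw_load are hypothetical
names, raw_load stands in for the real dlopen(), and for simplicity it
copies on every repeated open of a name instead of only when the file
has actually changed.

```python
import os
import shutil
import tempfile


class FixedLoader:
    """Never hand the underlying loader the same file name twice."""

    def __init__(self, raw_load):
        self._raw_load = raw_load   # stand-in for dlopen()
        self._seen = set()          # names already passed to raw_load

    def load(self, path):
        real = os.path.realpath(path)
        if real not in self._seen:
            self._seen.add(real)
            return self._raw_load(real)
        # Name seen before: load a fresh copy under a unique name so a
        # cached handle for the old name cannot be returned.
        fd, tmp = tempfile.mkstemp(suffix=os.path.splitext(real)[1])
        os.close(fd)
        shutil.copyfile(real, tmp)
        return self._raw_load(tmp)
```

The temporary copies are never deleted here; a real implementation
would need a policy for that.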

This would allow us to use standard make rules. It would also make
.eln filenames predictable. It might even draw someone's attention to
the fact that dlopen() is broken and make them fix it.

I'm probably missing other good reasons for the hashed filename scheme.

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 10:09                                                               ` Pip Cet
@ 2021-03-05 10:19                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06  1:47                                                                   ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 10:19 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Fri, Mar 5, 2021 at 9:33 AM Andrea Corallo via Bug reports for GNU
> Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
> wrote:
>> Eli Zaretskii <eliz@gnu.org> writes:
>> > So currently the only way to fill up a newly created subdirectory of
>> > native-lisp/ is to manually delete the *.elc files of all the files in
>> > lisp.mk's $shortlisp list, is that sufficient?
>>
>> Yes I think so.
>>
>> The trouble of using make for building such a system is that make is not
>> aware of the .eln filename, so it should be necessary to ask the Emacs
>> binary about that to create dynamically the precise (multiple target)
>> rule.  Not very practical IMO...
>
> I do wonder whether the whole filename scheme is really the best option.
>
> IIUC, and that's a big if in this case, the main motivation for using
> hashes in the .eln filenames is that dlopen() is broken and may return
> the same handle for subsequent dlopen()s of the same name, even if the
> underlying file changed in between.

Unfortunately, this was only an unpleasant discovery along the way...
this design predates that.

> Merely verifying that the ABI is correct could be done at runtime, so
> that's no reason to keep a hash in the filename.
>
> So my vague idea is this:
>
> 1. implement fixed_dlopen(), which keeps track of filenames that have
> been opened and, if necessary, creates a temporary file and loads that
> instead of its argument.
> 2. compile lisp/emacs-lisp/bytecomp.el to lisp/emacs-lisp/bytecomp.elc
> and native-lisp/emacs-lisp/bytecomp.eln

That is how it was at the beginning; I think we moved away from it
before we ran into the odd dlopen behavior.

> 3. add extra code in the top level function of each .eln to check that
> the ABI is correct.
>
> This would allow us to use standard make rules. It would also make
> .eln filenames predictable. It might even draw someone's attention to
> the fact that dlopen() is broken and make them fix it.
>
> I'm probably missing other good reasons for the hashed filename scheme.

Yep, this was discussed at length on emacs-devel, IIRC mainly in a
long-standing thread called "native compilation the bird-eye view" (or
something close).

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05  9:32                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 10:09                                                               ` Pip Cet
@ 2021-03-05 11:55                                                               ` Eli Zaretskii
  2021-03-05 13:56                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 11:55 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org
> Date: Fri, 05 Mar 2021 09:32:35 +0000
> 
> The trouble of using make for building such a system is that make is not
> aware of the .eln filename, so it should be necessary to ask the Emacs
> binary about that to create dynamically the precise (multiple target)
> rule.  Not very practical IMO...

Why can't we have a rule in the Makefile conditioned by
HAVE_NATIVE_COMP?

> Another option would be to invoke Emacs only once passing to it the list
> of the .el files to be compiled and the parallelism requested and have
> Emacs do the job.  I think this might be easier and we have in the
> codebase already the all that's needed for that.  The downside is that
> we'd drift away from how the vanilla build is working.

Each time we add another Emacs invocation to the build process, we
move the goal of supporting cross-builds farther away.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04 14:49                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 17:24                                                     ` Eli Zaretskii
@ 2021-03-05 13:52                                                     ` Eli Zaretskii
  2021-03-05 14:04                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 13:52 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Thu, 04 Mar 2021 14:49:47 +0000
> 
> I've a reproducer that is most likely due to the same issue you are
> observing:
> 
> emacs -batch -l comp -f batch-native-compile .../emacs/lisp/progmodes/cc-engine.el
> 
> GC kicks-in and we end-up marking #<subr c-string-list-p>, we try then
> to mark its compilation unit but we segfault (backtrace below).

AFAICT, the crash I see here, while compiling subr-x.el, is not inside
GC: gc_in_progress is zero when Emacs crashes.

To make debugging easier, I started Emacs like this:

  emacs -batch -l comp -f batch-byte-native-compile-for-bootstrap ../lisp/emacs-lisp/subr-x.el

(AFAIU, using batch-byte-native-compile-for-bootstrap is currently the
only way of invoking the native compilation in the same Emacs process,
not in an async subprocess, is that right?)

It crashes inside comp--compile-ctxt-to-file, and when it does, the C
stack seems to be smashed:

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x06acac3e in ?? ()
  (gdb) bt
  #0  0x06acac3e in ?? ()
  #1  0x00010101 in ?? ()
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)

  Lisp Backtrace:
  "comp--compile-ctxt-to-file" (0x82ca78)
  "comp-compile-ctxt-to-file" (0x82cc88)
  "comp-final1" (0x82cfb0)
  "comp-final" (0x82d238)
  "comp--native-compile" (0x82d468)
  "batch-native-compile" (0x82d6a0)
  "batch-byte-native-compile-for-bootstrap" (0x82d908)
  "command-line-1" (0x82e360)
  "command-line" (0x82ef08)
  "normal-top-level" (0x82f630)

I then put a breakpoint in comp--compile-ctxt-to-file and stepped
through it.  This behaves erratically: if I just step with "next", it
seems to crash inside the call to gcc_jit_context_set_int_option,
here:

  gcc_jit_context_set_int_option (comp.ctxt,
				  GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
				  comp.speed < 0 ? 0
				  : (comp.speed > 3 ? 3 : comp.speed));

But if I "stepi" into gcc_jit_context_set_int_option, it somehow seems
to return and proceeds to this line:

  gcc_jit_context_compile_to_file (comp.ctxt,
				   GCC_JIT_OUTPUT_KIND_DYNAMIC_LIBRARY,
				   SSDATA (tmp_file));

Then something goes wrong inside it: the backtrace shows bogus
addresses (note the "0xbaadf00d" thingies):
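
A hint for reading the listing: frames #1 to #23 are printable when
each value is taken as four little-endian bytes, and #24 starts with a
NUL byte.  The Python sketch below does that decoding; the function
name is made up, and the word list is copied from the frames in the
listing.

```python
import struct


def words_to_text(words):
    """Join 32-bit words as little-endian bytes; stop at the first NUL."""
    raw = b"".join(struct.pack("<I", word) for word in words)
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


# Frames #1 to #24 from the gdb listing, in order.
FRAMES = [
    0x672f756e, 0x652f7469, 0x7363616d, 0x74616e2f, 0x2d657669, 0x706d6f63,
    0x74616e2f, 0x2d657669, 0x7073696c, 0x2e38322f, 0x30352e30, 0x3238312d,
    0x65306335, 0x75732f32, 0x782d7262, 0x6432302d, 0x33666566, 0x37312d32,
    0x62656166, 0x47736431, 0x47305561, 0x6e6c652e, 0x706d742e, 0xbaadf000,
]

if __name__ == "__main__":
    print(words_to_text(FRAMES))
```

This prints
nu/git/emacs/native-comp/native-lisp/28.0.50-1825c0e2/subr-x-02dfef32-17faeb1dsGaU0G.eln.tmp
which looks like the tail of a temporary .eln file name, so gdb seems
to be walking string data here, not real return addresses.  As far as
I know, 0xbaadf00d is the pattern the Windows debug heap uses to fill
freshly allocated memory, which would fit that reading.  The raw gdb
output follows: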

  0x0766f180 in ?? ()
  (gdb) bt
  #0  0x0766f180 in ?? ()
  #1  0x672f756e in ?? ()
  #2  0x652f7469 in ?? ()
  #3  0x7363616d in ?? ()
  #4  0x74616e2f in ?? ()
  #5  0x2d657669 in ?? ()
  #6  0x706d6f63 in ?? ()
  #7  0x74616e2f in ?? ()
  #8  0x2d657669 in ?? ()
  #9  0x7073696c in ?? ()
  #10 0x2e38322f in ?? ()
  #11 0x30352e30 in ?? ()
  #12 0x3238312d in ?? ()
  #13 0x65306335 in ?? ()
  #14 0x75732f32 in ?? ()
  #15 0x782d7262 in ?? ()
  #16 0x6432302d in ?? ()
  #17 0x33666566 in ?? ()
  #18 0x37312d32 in ?? ()
  #19 0x62656166 in ?? ()
  #20 0x47736431 in ?? ()
  #21 0x47305561 in ?? ()
  #22 0x6e6c652e in ?? ()
  #23 0x706d742e in ?? ()
  #24 0xbaadf000 in ?? ()
  #25 0xbaadf00d in ?? ()
  #26 0xbaadf00d in ?? ()
  #27 0xbaadf00d in ?? ()
  #28 0xbaadf00d in ?? ()
  #29 0xbaadf00d in ?? ()
  #30 0xbaadf00d in ?? ()
  #31 0xbaadf00d in ?? ()
  #32 0xbaadf00d in ?? ()
  #33 0xbaadf00d in ?? ()
  #34 0xbaadf00d in ?? ()
  #35 0xbaadf00d in ?? ()
  #36 0xbaadf00d in ?? ()
  #37 0xbaadf00d in ?? ()
  #38 0xbaadf00d in ?? ()
  #39 0xbaadf00d in ?? ()
  #40 0xbaadf00d in ?? ()
  #41 0xbaadf00d in ?? ()
  #42 0xbaadf00d in ?? ()
  #43 0xbaadf00d in ?? ()
  #44 0xbaadf00d in ?? ()
  #45 0xbaadf00d in ?? ()
  #46 0xbaadf00d in ?? ()
  #47 0xbaadf00d in ?? ()
  #48 0xbaadf00d in ?? ()
  #49 0xbaadf00d in ?? ()
  #50 0xbaadf00d in ?? ()
  #51 0xbaadf00d in ?? ()
  #52 0xbaadf00d in ?? ()
  #53 0xbaadf00d in ?? ()
  #54 0xbaadf00d in ?? ()
  #55 0xbaadf00d in ?? ()
  #56 0xbaadf00d in ?? ()
  #57 0xbaadf00d in ?? ()
  #58 0xbaadf00d in ?? ()
  #59 0xbaadf00d in ?? ()
  #60 0xbaadf00d in ?? ()
  #61 0xbaadf00d in ?? ()
  #62 0xbaadf00d in ?? ()
  #63 0xbaadf00d in ?? ()
  #64 0xbaadf00d in ?? ()
  #65 0xbaadf00d in ?? ()
  #66 0xbaadf00d in ?? ()
  #67 0xbaadf00d in ?? ()
  #68 0xbaadf00d in ?? ()
  #69 0xbaadf00d in ?? ()
  #70 0xbaadf00d in ?? ()
  #71 0xbaadf00d in ?? ()
  #72 0xbaadf00d in ?? ()
  #73 0xbaadf00d in ?? ()
  #74 0xbaadf00d in ?? ()
  #75 0xbaadf00d in ?? ()
  #76 0xbaadf00d in ?? ()
  #77 0xbaadf00d in ?? ()
  #78 0xbaadf00d in ?? ()
  #79 0xbaadf00d in ?? ()
  #80 0xbaadf00d in ?? ()
  #81 0xbaadf00d in ?? ()
  #82 0xbaadf00d in ?? ()
  #83 0xbaadf00d in ?? ()
  #84 0xbaadf00d in ?? ()
  #85 0xbaadf00d in ?? ()
  #86 0xbaadf00d in ?? ()
  #87 0xbaadf00d in ?? ()
  #88 0xbaadf00d in ?? ()
  #89 0xbaadf00d in ?? ()
  #90 0xbaadf00d in ?? ()
  #91 0xbaadf00d in ?? ()
  #92 0xbaadf00d in ?? ()
  #93 0xbaadf00d in ?? ()
  #94 0xbaadf00d in ?? ()
  #95 0xbaadf00d in ?? ()
  #96 0xbaadf00d in ?? ()
  #97 0xbaadf00d in ?? ()
  #98 0xbaadf00d in ?? ()
  #99 0xbaadf00d in ?? ()
  #100 0xbaadf00d in ?? ()
  #101 0xbaadf00d in ?? ()
  #102 0xbaadf00d in ?? ()
  #103 0xbaadf00d in ?? ()
  #104 0xbaadf00d in ?? ()
  #105 0xbaadf00d in ?? ()
  #106 0xbaadf00d in ?? ()
  #107 0xbaadf00d in ?? ()
  #108 0xbaadf00d in ?? ()
  #109 0xbaadf00d in ?? ()
  #110 0xbaadf00d in ?? ()
  #111 0xbaadf00d in ?? ()
  #112 0xbaadf00d in ?? ()
  #113 0xbaadf00d in ?? ()
  #114 0xbaadf00d in ?? ()
  #115 0xbaadf00d in ?? ()
  #116 0xbaadf00d in ?? ()
  #117 0xbaadf00d in ?? ()
  #118 0xbaadf00d in ?? ()
  #119 0xbaadf00d in ?? ()
  #120 0xbaadf00d in ?? ()
  #121 0xbaadf00d in ?? ()
  #122 0xbaadf00d in ?? ()
  #123 0xbaadf00d in ?? ()
  #124 0xbaadf00d in ?? ()
  #125 0xbaadf00d in ?? ()
  #126 0xbaadf00d in ?? ()
  #127 0xbaadf00d in ?? ()
  #128 0xbaadf00d in ?? ()
  #129 0xbaadf00d in ?? ()
  #130 0xbaadf00d in ?? ()
  #131 0xbaadf00d in ?? ()
  #132 0xbaadf00d in ?? ()
  #133 0xbaadf00d in ?? ()
  #134 0xbaadf00d in ?? ()
  #135 0xbaadf00d in ?? ()
  #136 0xbaadf00d in ?? ()
  #137 0xbaadf00d in ?? ()
  #138 0xbaadf00d in ?? ()
  #139 0xbaadf00d in ?? ()
  #140 0xbaadf00d in ?? ()
  #141 0xbaadf00d in ?? ()
  #142 0xbaadf00d in ?? ()
  #143 0xbaadf00d in ?? ()
  #144 0xbaadf00d in ?? ()
  #145 0xbaadf00d in ?? ()
  #146 0xbaadf00d in ?? ()
  #147 0xbaadf00d in ?? ()
  #148 0xbaadf00d in ?? ()
  #149 0xbaadf00d in ?? ()
  #150 0xbaadf00d in ?? ()
  #151 0xbaadf00d in ?? ()
  #152 0xbaadf00d in ?? ()
  #153 0xbaadf00d in ?? ()
  #154 0xbaadf00d in ?? ()
  #155 0xbaadf00d in ?? ()
  #156 0xbaadf00d in ?? ()
  #157 0xbaadf00d in ?? ()
  #158 0xbaadf00d in ?? ()
  #159 0xbaadf00d in ?? ()
  #160 0xbaadf00d in ?? ()
  #161 0xbaadf00d in ?? ()
  #162 0xbaadf00d in ?? ()
  #163 0xbaadf00d in ?? ()
  #164 0xbaadf00d in ?? ()
  #165 0xbaadf00d in ?? ()
  #166 0xbaadf00d in ?? ()
  #167 0xbaadf00d in ?? ()
  #168 0xbaadf00d in ?? ()
  #169 0xbaadf00d in ?? ()
  #170 0xbaadf00d in ?? ()
  #171 0xbaadf00d in ?? ()
  #172 0xbaadf00d in ?? ()
  #173 0xbaadf00d in ?? ()
  #174 0xbaadf00d in ?? ()
  #175 0xbaadf00d in ?? ()
  #176 0xbaadf00d in ?? ()
  #177 0xbaadf00d in ?? ()
  #178 0xbaadf00d in ?? ()
  #179 0xbaadf00d in ?? ()
  #180 0xbaadf00d in ?? ()
  #181 0xbaadf00d in ?? ()
  #182 0xbaadf00d in ?? ()
  #183 0xbaadf00d in ?? ()
  #184 0xbaadf00d in ?? ()
  #185 0xbaadf00d in ?? ()
  #186 0xbaadf00d in ?? ()
  #187 0xbaadf00d in ?? ()
  #188 0xbaadf00d in ?? ()
  #189 0xbaadf00d in ?? ()
  #190 0xbaadf00d in ?? ()
  #191 0xbaadf00d in ?? ()
  #192 0xbaadf00d in ?? ()
  #193 0xbaadf00d in ?? ()
  #194 0xbaadf00d in ?? ()
  #195 0xbaadf00d in ?? ()
  #196 0xbaadf00d in ?? ()
  #197 0xbaadf00d in ?? ()
  #198 0xbaadf00d in ?? ()
  #199 0xbaadf00d in ?? ()
  #200 0xbaadf00d in ?? ()
  #201 0xbaadf00d in ?? ()
  #202 0xbaadf00d in ?? ()
  #203 0xbaadf00d in ?? ()
  #204 0xbaadf00d in ?? ()
  #205 0xbaadf00d in ?? ()
  #206 0xbaadf00d in ?? ()
  #207 0xbaadf00d in ?? ()
  #208 0xbaadf00d in ?? ()
  #209 0xbaadf00d in ?? ()
  #210 0xbaadf00d in ?? ()
  #211 0xbaadf00d in ?? ()
  #212 0xbaadf00d in ?? ()
  #213 0xbaadf00d in ?? ()
  #214 0xbaadf00d in ?? ()
  #215 0xbaadf00d in ?? ()
  #216 0xbaadf00d in ?? ()
  #217 0xbaadf00d in ?? ()
  #218 0xbaadf00d in ?? ()
  #219 0xbaadf00d in ?? ()
  #220 0xbaadf00d in ?? ()
  #221 0xbaadf00d in ?? ()
  #222 0xbaadf00d in ?? ()
  #223 0xbaadf00d in ?? ()
  #224 0xbaadf00d in ?? ()
  #225 0xbaadf00d in ?? ()
  #226 0xbaadf00d in ?? ()
  #227 0xbaadf00d in ?? ()
  #228 0xbaadf00d in ?? ()
  #229 0xbaadf00d in ?? ()
  #230 0xbaadf00d in ?? ()
  #231 0xbaadf00d in ?? ()
  #232 0xabababab in ?? ()
  #233 0xabababab in ?? ()
  #234 0xfeeefeee in ?? ()
  #235 0x00000000 in ?? ()

Maybe if I "stepi" inside that last libgccjit function, I will be able
to advance more.  But something is definitely fishy here, and I'm not
sure what that is.  Ideas for further debugging are welcome.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 11:55                                                               ` Eli Zaretskii
@ 2021-03-05 13:56                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 14:54                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 13:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org
>> Date: Fri, 05 Mar 2021 09:32:35 +0000
>> 
>> The trouble with using make for building such a system is that make is not
>> aware of the .eln filename, so it would be necessary to ask the Emacs
>> binary about that to dynamically create the precise (multiple target)
>> rule.  Not very practical IMO...
>
> Why can't we have a rule in the Makefile conditioned by
> HAVE_NATIVE_COMP?

We certainly can; the difficult part is generating the rule, as the .eln
filename is known only to the Emacs binary.  I'm probably missing
something.

>> Another option would be to invoke Emacs only once, passing it the list
>> of the .el files to be compiled and the requested parallelism, and have
>> Emacs do the job.  I think this might be easier, and we already have in
>> the codebase all that's needed for that.  The downside is that
>> we'd drift away from how the vanilla build works.
>
> Each time we add another Emacs invocation in the build process, we
> push the goal of supporting cross-builds farther away.

Point taken.

[ Also to be considered: as of today libgccjit is not meant to work
for cross-compilation. ]

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 13:52                                                     ` Eli Zaretskii
@ 2021-03-05 14:04                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 15:00                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 14:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>>         andrewjmoreton@gmail.com
>> Date: Thu, 04 Mar 2021 14:49:47 +0000
>> 
>> I've a reproducer that is most likely due to the same issue you are
>> observing:
>> 
>> emacs -batch -l comp -f batch-native-compile .../emacs/lisp/progmodes/cc-engine.el
>> 
>> GC kicks in and we end up marking #<subr c-string-list-p>; we then try
>> to mark its compilation unit but we segfault (backtrace below).
>
> AFAICT, the crash I see here, while compiling subr-x.el, is not inside
> GC: gc_in_progress is zero when Emacs crashes.
>
> To make debugging easier, I started Emacs like this:
>
>   emacs -batch -l comp -f batch-byte-native-compile-for-bootstrap ../lisp/emacs-lisp/subr-x.el
>
> (AFAIU, using batch-byte-native-compile-for-bootstrap is currently the
> only way of invoking the native compilation in the same Emacs process,
> not in an async subprocess, is that right?)

Correct

> It crashes inside comp--compile-ctxt-to-file, and when it does, the C
> stack seems to be smashed:
>
>   Thread 1 received signal SIGSEGV, Segmentation fault.
>   0x06acac3e in ?? ()
>   (gdb) bt
>   #0  0x06acac3e in ?? ()
>   #1  0x00010101 in ?? ()
>   Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
>   Lisp Backtrace:
>   "comp--compile-ctxt-to-file" (0x82ca78)
>   "comp-compile-ctxt-to-file" (0x82cc88)
>   "comp-final1" (0x82cfb0)
>   "comp-final" (0x82d238)
>   "comp--native-compile" (0x82d468)
>   "batch-native-compile" (0x82d6a0)
>   "batch-byte-native-compile-for-bootstrap" (0x82d908)
>   "command-line-1" (0x82e360)
>   "command-line" (0x82ef08)
>   "normal-top-level" (0x82f630)
>
> I then put a breakpoint in comp--compile-ctxt-to-file and stepped
> through it.  This behaves erratically: if I just step with "next", it
> seems to crash inside the call to gcc_jit_context_set_int_option,
> here:
>
>   gcc_jit_context_set_int_option (comp.ctxt,
> 				  GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
> 				  comp.speed < 0 ? 0
> 				  : (comp.speed > 3 ? 3 : comp.speed));
>
> But if I "stepi" inside gcc_jit_context_set_int_option, it
> somehow seems to return and proceeds to this line:
>
>   gcc_jit_context_compile_to_file (comp.ctxt,
> 				   GCC_JIT_OUTPUT_KIND_DYNAMIC_LIBRARY,
> 				   SSDATA (tmp_file));
>
> Then something goes wrong inside it: the backtrace shows bogus
> addresses (note the "0xbaadf00d" thingies):
>
>   0x0766f180 in ?? ()
>   (gdb) bt
>   #0  0x0766f180 in ?? ()
>   #1  0x672f756e in ?? ()
>   #2  0x652f7469 in ?? ()
>   #3  0x7363616d in ?? ()
>   #4  0x74616e2f in ?? ()
>   #5  0x2d657669 in ?? ()
>   #6  0x706d6f63 in ?? ()
>   #7  0x74616e2f in ?? ()
>   #8  0x2d657669 in ?? ()
>   #9  0x7073696c in ?? ()
>   #10 0x2e38322f in ?? ()
>   #11 0x30352e30 in ?? ()
>   #12 0x3238312d in ?? ()
>   #13 0x65306335 in ?? ()
>   #14 0x75732f32 in ?? ()
>   #15 0x782d7262 in ?? ()
>   #16 0x6432302d in ?? ()
>   #17 0x33666566 in ?? ()
>   #18 0x37312d32 in ?? ()
>   #19 0x62656166 in ?? ()
>   #20 0x47736431 in ?? ()
>   #21 0x47305561 in ?? ()
>   #22 0x6e6c652e in ?? ()
>   #23 0x706d742e in ?? ()
>   #24 0xbaadf000 in ?? ()
>   [... frames #25 through #231 elided, all 0xbaadf00d ...]
>   #232 0xabababab in ?? ()
>   #233 0xabababab in ?? ()
>   #234 0xfeeefeee in ?? ()
>   #235 0x00000000 in ?? ()
>
> Maybe if I "stepi" inside that last libgccjit function, I will be able
> to advance more.  But something is definitely fishy here, and I'm not
> sure what that is.  Ideas for further debugging are welcome.

Not many here, just two Mr. obvious observations:

Recompiling comp.c at -O0 -g3 might help the broken stepping?  Does SSDATA
(tmp_file) contain something meaningless or maybe suspicious?

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 13:56                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-05 14:54                                                                   ` Eli Zaretskii
  2021-03-05 15:18                                                                     ` Pip Cet
  2021-03-05 15:26                                                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 14:54 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org
> Date: Fri, 05 Mar 2021 13:56:24 +0000
> 
> > Why can't we have a rule in the Makefile conditioned by
> > HAVE_NATIVE_COMP?
> 
> We certainly can; the difficult part is generating the rule, as the .eln
> filename is known only to the Emacs binary.  I'm probably missing
> something.

Oh, you mean because of the ABI hash?  Yes, that'd preclude using Make
to decide when a .eln file needs to be regenerated.

> > Each time we add another Emacs invocation in the build process, we
> > push the goal of supporting cross-builds farther away.
> 
> Point taken.
> 
> [ Also to be considered: as of today libgccjit is not meant to work
> for cross-compilation. ]

Then perhaps we could invoke Emacs only in order to detect when the
ABI has changed.  Because when that happens, we need to regenerate all
the preloaded *.eln files anyway, so there's no need to test
individual files.  Right?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 14:04                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-05 15:00                                                         ` Eli Zaretskii
  2021-03-05 15:56                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 15:00 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Fri, 05 Mar 2021 14:04:58 +0000
> 
> > Maybe if I "stepi" inside that last libgccjit function, I will be able
> > to advance more.  But something is definitely fishy here, and I'm not
> > sure what that is.  Ideas for further debugging are welcome.
> 
> Not many here, just two Mr. obvious observations:
> 
> Recompiling comp.c at -O0 -g3 might help the broken stepping?

comp.c (and all of Emacs) is already compiled with those options, as
this is an unoptimized build.  And anyway, I'm stepping through the
code of libgccjit, not comp.c, when the crash happens.

> Does SSDATA (tmp_file) contain something meaningless or maybe
> suspicious?

The file name is fine.

Thanks, I guess I will keep debugging this, then.

(The garbled callstack hints at some memory-related issue.  Hmmm...)






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 14:54                                                                   ` Eli Zaretskii
@ 2021-03-05 15:18                                                                     ` Pip Cet
  2021-03-05 15:22                                                                       ` Eli Zaretskii
  2021-03-05 15:26                                                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-05 15:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, Andrea Corallo

On Fri, Mar 5, 2021 at 2:55 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Andrea Corallo <akrl@sdf.org>
> > Cc: 46256@debbugs.gnu.org
> > Date: Fri, 05 Mar 2021 13:56:24 +0000
> >
> > > Why can't we have a rule in the Makefile conditioned by
> > > HAVE_NATIVE_COMP?
> >
> > We certainly can; the difficult part is generating the rule, as the .eln
> > filename is known only to the Emacs binary.  I'm probably missing
> > something.
>
> Oh, you mean because of the ABI hash?  Yes, that'd preclude using Make
> to decide when a .eln file needs to be regenerated.

I think storing the ABI hash somewhere accessible in the build tree is
a good idea, anyway, and then we could do it with some make magic.

> > [ Also to be considered: as of today libgccjit is not meant to work
> > for cross-compilation. ]
>
> Then perhaps we could invoke Emacs only in order to detect when the
> ABI has changed.

IIUC, the ABI only changes when DEFUNs do, and then we regenerate most
of the Emacs binaries anyway, so we could make abi-hash depend on
gl-stamp/globals.h?

> Because when that happens, we need to regenerate all
> the preloaded *.eln files anyway, so there's no need to test
> individual files.  Right?

But do we want to keep the old files around in case the ABI changes
back? I don't think we do.

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 15:18                                                                     ` Pip Cet
@ 2021-03-05 15:22                                                                       ` Eli Zaretskii
  2021-03-05 15:54                                                                         ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 15:22 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 5 Mar 2021 15:18:12 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org
> 
> > Then perhaps we could invoke Emacs only in order to detect when the
> > ABI has changed.
> 
> IIUC, the ABI only changes when DEFUNs do, and then we regenerate most
> of the Emacs binaries anyway, so we could make abi-hash depend on
> gl-stamp/globals.h?

Why should we have the knowledge about what determines the ABI hash in
more than one place?

> > Because when that happens, we need to regenerate all
> > the preloaded *.eln files anyway, so there's no need to test
> > individual files.  Right?
> 
> But do we want to keep the old files around in case the ABI changes
> back? I don't think we do.

That's a separate issue.  It depends on whether the build included
non-preloaded files.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 14:54                                                                   ` Eli Zaretskii
  2021-03-05 15:18                                                                     ` Pip Cet
@ 2021-03-05 15:26                                                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 15:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org
>> Date: Fri, 05 Mar 2021 13:56:24 +0000
>> 
>> > Why can't we have a rule in the Makefile conditioned by
>> > HAVE_NATIVE_COMP?
>> 
>> We certainly can; the difficult part is generating the rule, as the .eln
>> filename is known only to the Emacs binary.  I'm probably missing
>> something.
>
> Oh, you mean because of the ABI hash?  Yes, that'd preclude using Make
> to decide when a .eln file needs to be regenerated.

Yep

>> > Each time we add another Emacs invocation in the build process, we
>> > push the goal of supporting cross-builds farther away.
>> 
>> Point taken.
>> 
>> [ Also to be considered: as of today libgccjit is not meant to work
>> for cross-compilation. ]
>
> Then perhaps we could invoke Emacs only in order to detect when the
> ABI has changed.  Because when that happens, we need to regenerate all
> the preloaded *.eln files anyway, so there's no need to test
> individual files.  Right?

Sounds good to me.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 15:22                                                                       ` Eli Zaretskii
@ 2021-03-05 15:54                                                                         ` Pip Cet
  2021-03-05 18:44                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-05 15:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, Andrea Corallo

On Fri, Mar 5, 2021 at 3:22 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 5 Mar 2021 15:18:12 +0000
> > Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org
> >
> > > Then perhaps we could invoke Emacs only in order to detect when the
> > > ABI has changed.
> >
> > IIUC, the ABI only changes when DEFUNs do, and then we regenerate most
> > of the Emacs binaries anyway, so we could make abi-hash depend on
> > gl-stamp/globals.h?
>
> Why should we have the knowledge about what determines the ABI hash in
> more than one place?

At least in my case, I end up building several emacs binaries in a
tree, and then I have to ls -l native-lisp to find out which one is
current, and that's annoying. But Andrea pointed out, entirely
correctly, that I missed the discussion and a consensus has apparently
been reached to do it this way.

I do think it's weird to have to run Emacs to find out which
directories the Emacs binary looks at, but maybe that's just me.

> > > Because when that happens, we need to regenerate all
> > > the preloaded *.eln files anyway, so there's no need to test
> > > individual files.  Right?
> >
> > But do we want to keep the old files around in case the ABI changes
> > back? I don't think we do.
>
> That's a separate issue.

Indeed.

> It depends on whether the build included non-preloaded files.

I'm afraid I don't follow.

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 15:00                                                         ` Eli Zaretskii
@ 2021-03-05 15:56                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 18:46                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 15:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Fri, 05 Mar 2021 14:04:58 +0000
>> 
>> > Maybe if I "stepi" inside that last libgccjit function, I will be able
>> > to advance more.  But something is definitely fishy here, and I'm not
>> > sure what that is.  Ideas for further debugging are welcome.
>> 
>> Not many here, just two Mr. obvious observations:
>> 
>> Recompiling comp.c at -O0 -g3 might help the broken stepping?
>
> comp.c (and all of Emacs) is already compiled with those options, as
> this is an unoptimized build.  And anyway, I'm stepping through the
> code of libgccjit, not comp.c, when the crash happens.

If it's crashing in libgccjit that sounds like a libgccjit bug.  Just to
mention, using `comp-libgccjit-reproducer' might be helpful here to
produce a libgccjit-only reproducer (assuming it manages to create one
before crashing).

If the reproducer is created, I can have a look here to see whether the
crash is reproducible, if you like.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 15:54                                                                         ` Pip Cet
@ 2021-03-05 18:44                                                                           ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 18:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 5 Mar 2021 15:54:07 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org
> 
> > It depends on whether the build included non-preloaded files.
> 
> I'm afraid I don't follow.

Preloaded files are dumped into emacs.pdmp, so we don't need to keep
them to be able to run previously-built executables.  But files that
are not dumped are still valuable if you want to run those
executables.

All this assumes you value previously built binaries (which I do).






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 15:56                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-05 18:46                                                             ` Eli Zaretskii
  2021-03-05 19:22                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 18:46 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Fri, 05 Mar 2021 15:56:54 +0000
> 
> If it's crashing in libgccjit that sounds like a libgccjit bug.

If so, the bug is a glaring one, because the crash smashes the stack.

> Just to mention, using `comp-libgccjit-reproducer' might be helpful
> here to produce a libgccjit only reproducer (assuming it manages to
> create one before crashing).
> 
> If the reproducer is created I can have a look here to see if that's
> reproducible if you like.

Where do I find instructions to create a reproducer?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 18:46                                                             ` Eli Zaretskii
@ 2021-03-05 19:22                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 20:31                                                                 ` Eli Zaretskii
  2021-03-06 14:38                                                                 ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 19:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Fri, 05 Mar 2021 15:56:54 +0000
>> 
>> If it's crashing in libgccjit that sounds like a libgccjit bug.
>
> If so, the bug is a glaring one, because the crash smashes the stack.
>
>> Just to mention, using `comp-libgccjit-reproducer' might be helpful
>> here to produce a libgccjit only reproducer (assuming it manages to
>> create one before crashing).
>> 
>> If the reproducer is created I can have a look here to see if that's
>> reproducible if you like.
>
> Where do I find instructions to create a reproducer?

What we have as documentation is the docstring of
`comp-libgccjit-reproducer'; I guess we could improve it.

Essentially, having it bound to t while compiling produces a C file
deposited in the .eln target directory.

This file, ELNFILENAME_libgccjit_repro.c, can simply be compiled and
linked against libgccjit to obtain the reproducer.

libgccjit should never segfault, so if this crashes it is clearly a bug.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 19:22                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-05 20:31                                                                 ` Eli Zaretskii
  2021-03-05 22:25                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 14:38                                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-05 20:31 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Fri, 05 Mar 2021 19:22:34 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Where do I find instructions to create a reproducer?
> 
> What we have as a doc is directly in the docstring of
> `comp-libgccjit-reproducer', I guess we could improve it.
> 
> Essentially having it bound to t while compiling produces a C file
> deposited in the .eln target directory.
> 
> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
> against libgccjit to obtain the reproducer.
> 
> libgccjit should never segfault, so if this crashes it is clearly a bug.

Thanks, will do.

One more question: does our code arrange for libgccjit to free
heap-allocated buffers that Emacs allocates, or vice versa (libgccjit
allocates memory that Emacs then frees)?  And do we arrange for any
callbacks from libgccjit, i.e. does libgccjit call functions
implemented in Emacs?  If the answer to any of these questions is YES,
could you please point me to the relevant places in the code?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 20:31                                                                 ` Eli Zaretskii
@ 2021-03-05 22:25                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06  7:39                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-05 22:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Fri, 05 Mar 2021 19:22:34 +0000
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Where do I find instructions to create a reproducer?
>> 
>> What we have as a doc is directly in the docstring of
>> `comp-libgccjit-reproducer', I guess we could improve it.
>> 
>> Essentially having it bound to t while compiling produces a C file
>> deposited in the .eln target directory.
>> 
>> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
>> against libgccjit to obtain the reproducer.
>> 
>> libgccjit should never segfault, so if this crashes it is clearly a bug.
>
> Thanks, will do.
>
> One more question: does our code arrange for libgccjit to free
> heap-allocated buffers that Emacs allocates, or vice versa (libgccjit
> allocates memory that Emacs then frees)?

No, in libgccjit we always copy the input buffers as soon as they are
passed, and only these copies are used and handled inside libgccjit
afterwards.

> And do we arrange for any
> callbacks from libgccjit, i.e. does libgccjit call functions
> implemented in Emacs?

No, libgccjit does not offer callbacks in its interface; everything is
simply synchronous.

For these two reasons the reproducer (if produced) is typically a good
way to debug any libgccjit issue in isolation.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-04  8:30                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-04 11:54                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06  0:33                                             ` Andy Moreton
  2021-03-06  7:42                                               ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-06  0:33 UTC (permalink / raw)
  To: 46256

On Thu 04 Mar 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of
> text editors" <bug-gnu-emacs@gnu.org> writes:
>
> [...]
>
>> PS ATM I see a crash too in my 32bit wide-int setup here, this is while
>> executing a top_level_run function loading a .eln file.  I need to
>> compile a more recent gdb to look into this further but it looks like
>> something basic is going wrong there.
>
> Ok, I think this issue was that `comp-abi-hash' was not accounting for
> '--with-wide-int' and on my system a wide-int binary was loading a
> non-wide-int .eln.  With 6444f69de2 I added
> `system-configuration-options' as an input to the hash.
>
> This is a conservative choice.  We may want to look only at
> '--with-wide-int', but I'm wondering if that's really the only sensitive
> input; therefore having `system-configuration-options' in the equation
> looked safer to me, at least for now.

I agree that it is a conservative choice, but that still misses features
that are enabled/disabled by default in the configury.

Thus the ABI hash should also include `system-configuration-features'.

    AndyM








* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 10:19                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06  1:47                                                                   ` Andy Moreton
  2021-03-06  9:54                                                                     ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-06  1:47 UTC (permalink / raw)
  To: 46256

On Fri 05 Mar 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Pip Cet <pipcet@gmail.com> writes:
>
>> On Fri, Mar 5, 2021 at 9:33 AM Andrea Corallo via Bug reports for GNU
>> Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
>> wrote:
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>> > So currently the only way to fill up a newly created subdirectory of
>>> > native-lisp/ is to manually delete the *.elc files of all the files in
>>> > lisp.mk's $shortlisp list, is that sufficient?
>>>
>>> Yes I think so.
>>>
>>> The trouble of using make for building such a system is that make is not
>>> aware of the .eln filename, so it should be necessary to ask the Emacs
>>> binary about that to create dynamically the precise (multiple target)
>>> rule.  Not very practical IMO...
>>
>> I do wonder whether the whole filename scheme is really the best option.
>>
>> IIUC, and that's a big if in this case, the main motivation for using
>> hashes in the .eln filenames is that dlopen() is broken and may return
>> the same handle for subsequent dlopen()s of the same name, even if the
>> underlying file changed in between.
>
> Unfortunately this was only an unfortunate discovery along the road...
> this design predates that.

Can you explain what the problem is with dlopen?  I have not found a
complete and precise description of the problem, such as a reproducer,
in earlier messages.

Is the problem that dlopen resolves to an unlinked file kept alive
by open handles, rather than to a new file with the filename used
by the old file before it was unlinked?
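
To make the question concrete, here is a minimal sketch of the
underlying POSIX file semantics, using plain file descriptors rather
than dlopen itself.  The function name and the file name passed to it
are made up for illustration; whether dlopen behaves analogously is
exactly what is being asked above, so this only shows that an unlinked
file stays readable through a handle that was opened earlier.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write TEXT to a freshly created file at PATH; return 0 on success.  */
static int
write_file (const char *path, const char *text)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  ssize_t len = (ssize_t) strlen (text);
  int ok = write (fd, text, len) == len;
  close (fd);
  return ok ? 0 : -1;
}

/* Return 1 if a descriptor opened before PATH was replaced still reads
   the old contents, 0 if it reads the new contents, -1 on error.  */
int
old_handle_sees_old_file (const char *path)
{
  char buf[16] = { 0 };

  assert (path != NULL);
  if (write_file (path, "old") != 0)
    return -1;
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  /* Replace the file the way a rebuild might: unlink, then recreate.  */
  if (unlink (path) != 0 || write_file (path, "new") != 0)
    {
      close (fd);
      return -1;
    }

  ssize_t n = read (fd, buf, sizeof buf - 1);
  close (fd);
  unlink (path);
  if (n < 0)
    return -1;
  return strcmp (buf, "old") == 0;
}
```

Calling old_handle_sees_old_file ("demo.tmp") from a small main on a
POSIX system should return 1: the old descriptor keeps the old inode
alive even though the name now refers to a new file.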

>> Merely verifying that the ABI is correct could be done at runtime, so
>> that's no reason to keep a hash in the filename.
>>
>> So my vague idea is this:
>>
>> 1. implement fixed_dlopen(), which keeps track of filenames that have
>> been opened and, if necessary, creates a temporary file and loads that
>> instead of its argument.
>> 2. compile lisp/emacs-lisp/bytecomp.el to lisp/emacs-lisp/bytecomp.elc
>> and native-lisp/emacs-lisp/bytecomp.eln
>
> So it was at the beginning, I think we moved away from that before the
> odd dlopen behavior.

As above, this odd dlopen behaviour needs to be fully explained to
ensure that design choices are not driven by possible misunderstandings.

>> 3. add extra code in the top level function of each .eln to check that
>> the ABI is correct.
>>
>> This would allow us to use standard make rules. It would also make
>> .eln filenames predictable. It might even draw someone's attention to
>> the fact that dlopen() is broken and make them fix it.
>>
>> I'm probably missing other good reasons for the hashed filename scheme.
>
> Yep, this was discussed at length on emacs-devel, IIRC mainly on a long
> standing thread called "native compilation the bird-eye view" (or
> something close).

Thread "Native compilation: the bird-eye view" starts here:
https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02186.html

I agree with Pip that using standard make rules eases several development
pains and should be used if at all possible.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 22:25                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06  7:39                                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06  7:39 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Fri, 05 Mar 2021 22:25:17 +0000
> 
> > One more question: does our code arrange for libgccjit to free
> > heap-allocated buffers that Emacs allocates, or vice versa (libgccjit
> > allocates memory that Emacs then frees)?
> 
> No, in libgccjit we always copy the input buffers as soon as they are
> passed, and only these copies are used and handled inside libgccjit
> afterwards.
> 
> > And do we arrange for any
> > callbacks from libgccjit, i.e. does libgccjit call functions
> > implemented in Emacs?
> 
> No, libgccjit does not offer callbacks in its interface; everything is
> simply synchronous.

Thanks, those 2 were places where potential bugs, in particular
Windows-specific ones, could hide.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06  0:33                                             ` Andy Moreton
@ 2021-03-06  7:42                                               ` Eli Zaretskii
  2021-03-06 12:09                                                 ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06  7:42 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 06 Mar 2021 00:33:10 +0000
> 
> > This is a conservative choice.  We may want to look only at
> > '--with-wide-int', but I'm wondering if that's really the only sensitive
> > input; therefore having `system-configuration-options' in the equation
> > looked safer to me, at least for now.
> 
> I agree that it is a conservative choice, but that still misses features
> that are enabled/disabled by default in the configury.
> 
> Thus the ABI hash should also include `system-configuration-features'.

Which of the features do you think might affect the ABI?  I reviewed them
a couple of days ago and didn't find any I could convince myself
should affect that, but maybe I missed something.

Or maybe we should ask Andrea to try to provide a more detailed
definition of what exactly constitutes the "ABI" in this case.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06  1:47                                                                   ` Andy Moreton
@ 2021-03-06  9:54                                                                     ` Pip Cet
  2021-03-06 10:30                                                                       ` Eli Zaretskii
  2021-03-06 12:15                                                                       ` Andy Moreton
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-06  9:54 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

On Sat, Mar 6, 2021 at 1:48 AM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> On Fri 05 Mar 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

[Does anyone know where this via "name" comes from? I believe it is
Google's joke which somehow makes it through into the gmail
interface...]

> Is the problem that dlopen resolves to use an unlinked file kept alive
> by having open handles, rather than a new file with the filename used
> by the old file before it was unlinked ?

I believe so, and that's what I think we can work around.

IIUC, we don't actually call dlclose() until we GC (and might not do
so even then, since GC is conservative).
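
A rough sketch of the bookkeeping half of the fixed_dlopen() idea
quoted below (step 1), under stated assumptions: the helper name
name_for_dlopen, the fixed-size table, and the /tmp/eln-XXXXXX template
are all hypothetical, and the sketch always copies on a repeated name
instead of first checking whether the file actually changed.  It only
decides which file name to load; the dlopen call itself is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_SEEN 256

static char *seen[MAX_SEEN];	/* Names already handed out once.  */
static size_t n_seen;

/* Copy the file SRC to the open descriptor DST_FD; 0 on success.  */
static int
copy_to_fd (const char *src, int dst_fd)
{
  FILE *in = fopen (src, "rb");
  if (!in)
    return -1;
  char buf[4096];
  size_t n;
  int status = 0;
  while ((n = fread (buf, 1, sizeof buf, in)) > 0)
    if (write (dst_fd, buf, n) != (ssize_t) n)
      {
	status = -1;
	break;
      }
  if (ferror (in))
    status = -1;
  fclose (in);
  return status;
}

/* Return a malloc'd file name to load: NAME itself the first time it is
   requested, a fresh temporary copy afterwards.  NULL on failure.  */
char *
name_for_dlopen (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < n_seen; i++)
    if (strcmp (seen[i], name) == 0)
      {
	char tmpl[] = "/tmp/eln-XXXXXX";
	int fd = mkstemp (tmpl);
	if (fd < 0)
	  return NULL;
	int status = copy_to_fd (name, fd);
	close (fd);
	if (status != 0)
	  {
	    unlink (tmpl);
	    return NULL;
	  }
	return strdup (tmpl);
      }
  if (n_seen == MAX_SEEN)
    return NULL;
  seen[n_seen] = strdup (name);
  if (!seen[n_seen])
    return NULL;
  n_seen++;
  return strdup (name);
}
```

The caller would pass the returned name to dlopen and free it
afterwards.  A real version would also need to clean up the temporary
files and would probably compare file identity before copying.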

> >> Merely verifying that the ABI is correct could be done at runtime, so
> >> that's no reason to keep a hash in the filename.
> >>
> >> So my vague idea is this:
> >>
> >> 1. implement fixed_dlopen(), which keeps track of filenames that have
> >> been opened and, if necessary, creates a temporary file and loads that
> >> instead of its argument.
> >> 2. compile lisp/emacs-lisp/bytecomp.el to lisp/emacs-lisp/bytecomp.elc
> >> and native-lisp/emacs-lisp/bytecomp.eln
> >
> > So it was at the beginning, I think we moved away from that before the
> > odd dlopen behavior.
>
> As above, this odd dlopen behaviour needs to be fully explained to
> ensure that design choices are not driven by possible misunderstandings.

I'm unsure what Andrea is saying here; is the dlopen thing relevant to
the decision to use hashes in names, or isn't it?

> >> 3. add extra code in the top level function of each .eln to check that
> >> the ABI is correct.
> >>
> >> This would allow us to use standard make rules. It would also make
> >> .eln filenames predictable. It might even draw someone's attention to
> >> the fact that dlopen() is broken and make them fix it.
> >>
> >> I'm probably missing other good reasons for the hashed filename scheme.
> >
> > Yep, this was discussed at length on emacs-devel, IIRC mainly on a long
> > standing thread called "native compilation the bird-eye view" (or
> > something close).
>
> Thread "Native compilation: the bird-eye view" starts here:
> https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02186.html

Thanks for the link. I do think it would be good to summarize the
reasons for the hash-based naming scheme somewhere, because I've read
the thread and all I've taken away is that the dlopen oddity requires
a workaround (but, really, a different one). I don't think I've read
all of the followup threads, though.

> I agree with Pip that using standard make rules eases several development
> pains and should be used if at all possible.

What I think should be discussed, or should have been discussed, is
whether we really need hashes in the names of files in the Emacs build
tree. Whether we need them for installed files, or for files in the
eln cache, is a separate issue.

A second question is whether it's really worth it to build the elc and
eln files at the same time. Make would be a lot happier not having two
targets in the rule, and so would I.

But if this has been discussed and resolved, we merely need to
document the decision and the reasons for it rather than reopening
discussion because I missed it the first time around. If someone can
provide a link to the relevant messages, I'd be glad to try.

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06  9:54                                                                     ` Pip Cet
@ 2021-03-06 10:30                                                                       ` Eli Zaretskii
  2021-03-06 12:15                                                                       ` Andy Moreton
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 10:30 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 6 Mar 2021 09:54:22 +0000
> Cc: 46256@debbugs.gnu.org
> 
> On Sat, Mar 6, 2021 at 1:48 AM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> > On Fri 05 Mar 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
> 
> [Does anyone know where this via "name" comes from? I believe it is
> Google's joke which somehow makes it through into the gmail
> interface...]

It's the GNU Mailman maintainers' reaction to the DMARC/DKIM lunacy.
See

  https://lists.gnu.org/archive/html/savannah-hackers-public/2019-06/msg00018.html






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06  7:42                                               ` Eli Zaretskii
@ 2021-03-06 12:09                                                 ` Andy Moreton
  2021-03-06 13:05                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-06 12:09 UTC (permalink / raw)
  To: 46256

On Sat 06 Mar 2021, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 06 Mar 2021 00:33:10 +0000
>> 
>> > This is a conservative choice.  We may want to look only at
>> > '--with-wide-int', but I'm wondering if that's really the only sensitive
>> > input; therefore having `system-configuration-options' in the equation
>> > looked safer to me, at least for now.
>> 
>> I agree that it is a conservative choice, but that still misses features
>> that are enabled/disabled by default in the configury.
>> 
>> Thus the ABI hash should also include `system-configuration-features'.
>
> Which of the features do you think might affect the ABI?  I reviewed them
> a couple of days ago and didn't find any I could convince myself
> should affect that, but maybe I missed something.

As a reasonably recent example, adding bignum support by default added
"GMP" to `system-configuration-features' when rebuilding after pulling
upstream changes, without the user changing the configure command line
saved in `system-configuration-options'.

That kind of change may not happen often, but for developers building
commits near the switchover point (e.g. git bisect), it adds a silent
incompatibility that may be hard to diagnose.

> Or maybe we should ask Andrea to try to provide a more detailed
> definition of what exactly constitutes the "ABI" in this case.

Adding some design notes to the repo on this and other aspects of
the design would indeed be useful.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06  9:54                                                                     ` Pip Cet
  2021-03-06 10:30                                                                       ` Eli Zaretskii
@ 2021-03-06 12:15                                                                       ` Andy Moreton
  2021-03-06 13:10                                                                         ` Eli Zaretskii
  2021-03-07  9:22                                                                         ` Pip Cet
  1 sibling, 2 replies; 179+ messages in thread
From: Andy Moreton @ 2021-03-06 12:15 UTC (permalink / raw)
  To: 46256

On Sat 06 Mar 2021, Pip Cet wrote:

> On Sat, Mar 6, 2021 at 1:48 AM Andy Moreton <andrewjmoreton@gmail.com> wrote:
>> On Fri 05 Mar 2021, Andrea Corallo via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
> [Does anyone know where this via "name" comes from? I believe it is
> Google's joke which somehow makes it through into the gmail
> interface...]
>
>> Is the problem that dlopen resolves to use an unlinked file kept alive
>> by having open handles, rather than a new file with the filename used
>> by the old file before it was unlinked ?
>
> I believe so, and that's what I think we can work around.
>
> IIUC, we don't actually call dlclose() until we GC (and might not do
> so even then, since GC is conservative).

In that case keeping the handles open is the real bug here, and it would
be better to focus on how to ensure that resources are released correctly.

Is there a similar issue in the dynamic modules interface?

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 12:09                                                 ` Andy Moreton
@ 2021-03-06 13:05                                                   ` Eli Zaretskii
  2021-03-06 15:46                                                     ` Andy Moreton
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 13:05 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 06 Mar 2021 12:09:30 +0000
> 
> >> Thus the ABI hash should also include `system-configuration-features'.
> >
> > Which of the features do you think might affect the ABI?  I reviewed them
> > a couple of days ago and didn't find any I could convince myself
> > should affect that, but maybe I missed something.
> 
> As a reasonably recent example, adding bignum support by default added
> "GMP" to `system-configuration-features' when rebuilding after pulling
> upstream changes, without the user changing the configure command line
> saved in `system-configuration-options'.

AFAIU, such changes only affect *.eln files if we add new primitives
to Emacs or change existing primitives, and those changes are already
captured by the ABI hash.  Right?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 12:15                                                                       ` Andy Moreton
@ 2021-03-06 13:10                                                                         ` Eli Zaretskii
  2021-03-06 15:18                                                                           ` Andy Moreton
  2021-03-07  9:22                                                                         ` Pip Cet
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 13:10 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 06 Mar 2021 12:15:27 +0000
> 
> > IIUC, we don't actually call dlclose() until we GC (and might not do
> > so even then, since GC is conservative).
> 
> In that case keeping the handles open is the real bug here, and it would
> be better to focus on how to ensure that resources are released correctly.
> 
> Is there a similar issue in the dynamic modules interface ?

Which problem is that?  At least on MS-Windows, a DLL remains open for
as long as the program that loaded it keeps running.  How is the
situation discussed here different?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-05 19:22                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-05 20:31                                                                 ` Eli Zaretskii
@ 2021-03-06 14:38                                                                 ` Eli Zaretskii
  2021-03-06 15:35                                                                   ` Eli Zaretskii
  2021-03-06 18:30                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 14:38 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

[-- Attachment #1: Type: text/plain, Size: 1663 bytes --]

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Fri, 05 Mar 2021 19:22:34 +0000
> 
> What we have as a doc is directly in the docstring of
> `comp-libgccjit-reproducer', I guess we could improve it.
> 
> Essentially having it bound to t while compiling produces a C file
> deposited in the .eln target directory.

The reproducer file is attached.  It is large, so I compressed it.
Let me know if you see there anything that could explain the problem.

> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
> against libgccjit to obtain the reproducer.

"To obtain the reproducer" meaning that the compiled and linked
program should crash in the same way as Emacs does?  I thought we
crash while compiling the file and linking it to produce a shared
library, not while running it.  Right?

> libgccjit should never segfault, so if this crashes it is clearly a bug.

Let's see what you can find in the reproducer file.

Meanwhile, I see that:

 . the file has DOS CRLF end-of-line format, because libgccjit opens
   the reproducer file in the default text mode
 . the file includes control characters: ^A, ^B, ^M, and others --
   what are those for?  I hope we cannot have ^Z there, because AFAIK
   text-mode writes will stop at the first such byte

More generally, what is the relation between the contents of the
reproducer file and the source the native compilation sees when we
call gcc_jit_context_compile_to_file?  Do we submit a similar file to
GCC or something?  IOW, can any weird characters I see in the
reproducer be relevant to the actual compilation (and thus to the
crash)?

Thanks.

[-- Attachment #2: subr-x-02dfef32-17faeb1d_libgccjit_repro.c.xz --]
[-- Type: application/octet-stream, Size: 179472 bytes --]


* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 13:10                                                                         ` Eli Zaretskii
@ 2021-03-06 15:18                                                                           ` Andy Moreton
  2021-03-06 18:37                                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-06 15:18 UTC (permalink / raw)
  To: 46256

On Sat 06 Mar 2021, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 06 Mar 2021 12:15:27 +0000
>> 
>> > IIUC, we don't actually call dlclose() until we GC (and might not do
>> > so even then, since GC is conservative).
>> 
>> In that case keeping the handles open is the real bug here, and it would
>> be better to focus on how to ensure that resources are released correctly.
>> 
>> Is there a similar issue in the dynamic modules interface ?
>
> Which problem is that?  At least on MS-Windows, a DLL remains open for
> as long as the program that loaded it keeps running.  How is the
> situation discussed here different?

Keeping the DLL loaded happens with build-time linking, but this
discussion is about runtime-linking of shared libraries: dlopen, dlsym,
dlclose (or for Windows: LoadLibrary, GetProcAddress, FreeLibrary).

We need to hear from Andrea to be sure of the precise details.

    AndyM







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 14:38                                                                 ` Eli Zaretskii
@ 2021-03-06 15:35                                                                   ` Eli Zaretskii
  2021-03-06 17:47                                                                     ` Eli Zaretskii
  2021-03-06 18:30                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 15:35 UTC (permalink / raw)
  To: akrl; +Cc: 46256, andrewjmoreton

> Date: Sat, 06 Mar 2021 16:38:52 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > What we have as a doc is directly in the docstring of
> > `comp-libgccjit-reproducer', I guess we could improve it.
> > 
> > Essentially having it bound to t while compiling produces a C file
> > deposited in the .eln target directory.
> 
> The reproducer file is attached.  It is large, so I compressed it.
> Let me know if you see there anything that could explain the problem.

More info: if I set comp-speed to 0 or 1, the compilation of subr-x.el
doesn't crash.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 13:05                                                   ` Eli Zaretskii
@ 2021-03-06 15:46                                                     ` Andy Moreton
  2021-03-06 19:31                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Andy Moreton @ 2021-03-06 15:46 UTC (permalink / raw)
  To: 46256

On Sat 06 Mar 2021, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 06 Mar 2021 12:09:30 +0000
>> 
>> >> Thus the ABI hash should also include `system-configuration-features'.
>> >
>> > Which of the features you think might affect the ABI?  I reviewed them
>> > a couple of days ago and didn't find any I could convince myself
>> > should affect that, but maybe I missed something.
>> 
>> As a reasonably recent example, adding bignum support by default added
>> "GMP" to `system-configuration-features' when rebuilding after pulling
>> upstream changes, without the user changing the configure command line
>> saved in `system-configuration-options'.
>
> AFAIU, such changes only affect *.eln files if we add new primitives
> to Emacs or change existing primitives, and those changes are already
> captured by the ABI hash.  Right?

Perhaps you are right, but we need some input from Andrea on whether
these changes affect the ABI.

    AndyM






^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 15:35                                                                   ` Eli Zaretskii
@ 2021-03-06 17:47                                                                     ` Eli Zaretskii
  2021-03-06 18:31                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 17:47 UTC (permalink / raw)
  To: akrl; +Cc: 46256, andrewjmoreton

> Date: Sat, 06 Mar 2021 17:35:15 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > The reproducer file is attached.  It is large, so I compressed it.
> > Let me know if you see there anything that could explain the problem.
> 
> More info: if I set comp-speed to 0 or 1, the compilation of subr-x.el
> doesn't crash.

Here's the smallest part of subr-x which causes the crash:

;;; subr-x.el --- extra Lisp functions  -*- lexical-binding:t -*-

(defun internal--build-bindings (bindings)
  "Check and build conditional value forms for BINDINGS."
  (let ((prev-var t))
    (mapcar (lambda (binding)
              (let ((binding (internal--build-binding binding prev-var)))
                (setq prev-var (car binding))
                binding))
            bindings)))

Interestingly, if I remove the first line, there's no crash.  So
lexical-binding has something to do with this.

I cannot see what could trigger the crash.  The fact that 'binding' is
used both as an argument and as the variable which is bound to the
return value, perhaps?

Let me know if you want the C reproducer for this minimal file.

Thanks.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 14:38                                                                 ` Eli Zaretskii
  2021-03-06 15:35                                                                   ` Eli Zaretskii
@ 2021-03-06 18:30                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 18:44                                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 18:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Fri, 05 Mar 2021 19:22:34 +0000
>> 
>> What we have as a doc is directly in the docstring of
>> `comp-libgccjit-reproducer', I guess we could improve it.
>> 
>> Essentially having it bound to t while compiling produces a C file
>> deposited in the .eln target directory.
>
> The reproducer file is attached.  It is large, so I compressed it.
> Let me know if you see there anything that could explain the problem.
>
>> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
>> against libgccjit to obtain the reproducer.
>
> "To obtain the reproducer" meaning that the compiled and linked
> program should crash in the same way as Emacs does?  I thought we
> crash while compiling the file and linking it to produce a shared
> library, not while running it.  Right?

Yes, the compiled program, when executed, will replay the same compilation
attempted by Emacs, and therefore if it is a libgccjit fault it should
crash.

Does the reproducer crash when executed on your system?

>> libgccjit should never segfault, so if this crashes it is clearly a bug.
>
> Let's see what you can find in the reproducer file.

Thanks, I'm going to have a look if I can reproduce here.

> Meanwhile, I see that:
>
>  . the file has DOS CRLF end-of-line format, because libgccjit opens
>    the reproducer file in the default text mode
>  . the file includes control characters: ^A, ^B, ^M, and others --
>    what are those for?  I hope we cannot have ^Z there, because AFAIK
>    text-mode writes will stop at the first such byte
>
> More generally, what is the relation between the contents of the
> reproducer file and the source the native compilation sees when we
> call gcc_jit_context_compile_to_file?  Do we submit a similar file to
> GCC or something?

The reproducer file is meant to recreate the same libgccjit IR and
attempt a compilation with it.  In practice it calls the same functions
we call at the interface with libgccjit to describe the code we want to
compile, and then attempts to perform the compilation.

> IOW, can any weird characters I see in the
> reproducer be relevant to the actual compilation (and thus to the
> crash)?

I'm not sure why these characters are there but I don't think they are
relevant to the crash.
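
One way to check the concerns raised above (CRLF line endings, stray
control characters, and in particular a ^Z byte) is to scan the raw
bytes of the reproducer.  The following is only a minimal sketch: the
helper name `scan_control_bytes' and the `ctrl_scan' struct are made up
for illustration and are not part of Emacs or libgccjit.

```c
#include <assert.h>
#include <stddef.h>

/* Counts of suspicious bytes found in a buffer.  */
struct ctrl_scan {
    size_t crlf;    /* number of "\r\n" pairs */
    size_t ctrl_z;  /* number of 0x1A (^Z) bytes */
    size_t other;   /* other control bytes, excluding \t, \n and \r */
};

/* Scan LEN bytes at BUF and count CRLF pairs and control bytes.
   Hypothetical helper, for illustration only.  */
static struct ctrl_scan
scan_control_bytes (const unsigned char *buf, size_t len)
{
    struct ctrl_scan s = { 0, 0, 0 };

    assert (buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
            s.crlf++;
        else if (c == 0x1A)
            s.ctrl_z++;
        else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            s.other++;
    }
    return s;
}
```

The file would have to be read in binary mode before being handed to
such a scan, so that the C runtime does not translate any of these
bytes first.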

Thanks!

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 17:47                                                                     ` Eli Zaretskii
@ 2021-03-06 18:31                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 18:48                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 18:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 06 Mar 2021 17:35:15 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> 
>> > The reproducer file is attached.  It is large, so I compressed it.
>> > Let me know if you see there anything that could explain the problem.
>> 
>> More info: if I set comp-speed to 0 or 1, the compilation of subr-x.el
>> doesn't crash.
>
> Here's the smallest part of subr-x which causes the crash:
>
> ;;; subr-x.el --- extra Lisp functions  -*- lexical-binding:t -*-
>
> (defun internal--build-bindings (bindings)
>   "Check and build conditional value forms for BINDINGS."
>   (let ((prev-var t))
>     (mapcar (lambda (binding)
>               (let ((binding (internal--build-binding binding prev-var)))
>                 (setq prev-var (car binding))
>                 binding))
>             bindings)))
>
> Interestingly, if I remove the first line, there's no crash.  So
> lexical-binding has something to do with this.
>
> I cannot see what could trigger the crash.  The fact that 'binding' is
> used both as an argument and as the variable which is bound to the
> return value, perhaps?
>
> Let me know if you want the C reproducer for this minimal file.

Yes please.

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 15:18                                                                           ` Andy Moreton
@ 2021-03-06 18:37                                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 18:37 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Sat 06 Mar 2021, Eli Zaretskii wrote:
>
>>> From: Andy Moreton <andrewjmoreton@gmail.com>
>>> Date: Sat, 06 Mar 2021 12:15:27 +0000
>>> 
>>> > IIUC, we don't actually call dlclose() until we GC (and might not do
>>> > so even then, since GC is conservative).
>>> 
>>> In that case keeping the handles open is the real bug here, and it would
>>> be better to focus on how to ensure that resources are released correctly.
>>> 
>>> Is there a similar issue in the dynamic modules interface ?
>>
>> Which problem is that?  At least on MS-Windows, a DLL remains open for
>> as long as the program that loaded it keeps running.  How is the
>> situation discussed here different?
>
> Keeping the DLL loaded happens with build-time linking, but this
> discussion is about runtime-linking of shared libraries: dlopen, dlsym,
> dlclose (or for Windows: LoadLibrary, GetProcAddress, FreeLibrary).
>
> We need to hear from Andrea to be sure of the precise details.
>
>     AndyM

Hi Andy,

Each eln file is 'dlclosed' if/when the compilation unit (CU) is garbage
collected.

The CU is a Lisp object and every native-compiled Lisp function
holds a reference to it; as a consequence the CU is GC'ed only when no
native-compiled Lisp function belonging to it is still live.
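
The lifetime rule above can be modelled with a small sketch.  This is
not the Emacs implementation: Emacs uses a tracing garbage collector,
whereas the sketch assumes a plain counter of live functions, and all
names (`comp_unit', `native_func', `cu_release_handle', ...) are
hypothetical.  The point is only that the handle is released when the
last function referring to the CU goes away, and not before.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a compilation unit: one per loaded .eln.  */
struct comp_unit {
    void *handle;       /* what dlopen would have returned */
    size_t live_funcs;  /* native-compiled functions still referring to us */
    int closed;         /* nonzero once the handle has been released */
};

/* A native-compiled function keeps its CU alive.  */
struct native_func {
    struct comp_unit *cu;
};

/* Stand-in for dlclose; real code would call dlclose (cu->handle).  */
static void
cu_release_handle (struct comp_unit *cu)
{
    cu->handle = NULL;
    cu->closed = 1;
}

/* Create a function belonging to CU and record the reference.  */
static struct native_func
func_create (struct comp_unit *cu)
{
    assert (cu != NULL && !cu->closed);
    cu->live_funcs++;
    struct native_func f = { cu };
    return f;
}

/* Called when F becomes unreachable.  The CU's handle is released
   only when its last function is collected.  */
static void
func_collect (struct native_func *f)
{
    assert (f->cu != NULL && f->cu->live_funcs > 0);
    struct comp_unit *cu = f->cu;

    f->cu = NULL;
    if (--cu->live_funcs == 0)
        cu_release_handle (cu);
}
```

With two functions created from one CU, collecting the first leaves the
handle open; collecting the second releases it.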

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 18:30                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 18:44                                                                     ` Eli Zaretskii
  2021-03-06 19:21                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 18:44 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 18:30:05 +0000
> 
> >> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
> >> against libgccjit to obtain the reproducer.
> >
> > "To obtain the reproducer" meaning that the compiled and linked
> > program should crash in the same way as Emacs does?  I thought we
> > crash while compiling the file and linking it to produce a shared
> > library, not while running it.  Right?
> 
> Yes, the compiled program when executed will replay the same compilation
> attempted by Emacs and therefore if it is a libgccjit fault it should
> crash.
> 
> Does the reproducer crash when executed on your system?

Yes, it does.

> > More generally, what is the relation between the contents of the
> > reproducer file and the source the native compilation sees when we
> > call gcc_jit_context_compile_to_file?  Do we submit a similar file to
> > GCC or something?
> 
> The reproducer file is a file that is meant to recreate the same
> libgccjit IR and attempt a compilation with that.  In practice it calls
> the same functions we call at the interface with libgccjit describing
> the code we want to compile and attempt to perform a compilation.

But any issues caused by actually writing the reproducer to a disk
file, like text-mode conversions and quirks, aren't relevant, right?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 18:31                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 18:48                                                                         ` Eli Zaretskii
  2021-03-06 19:19                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 18:48 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

[-- Attachment #1: Type: text/plain, Size: 2868 bytes --]

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 18:31:05 +0000
> 
> > ;;; subr-x.el --- extra Lisp functions  -*- lexical-binding:t -*-
> >
> > (defun internal--build-bindings (bindings)
> >   "Check and build conditional value forms for BINDINGS."
> >   (let ((prev-var t))
> >     (mapcar (lambda (binding)
> >               (let ((binding (internal--build-binding binding prev-var)))
> >                 (setq prev-var (car binding))
> >                 binding))
> >             bindings)))
> >
> > Interestingly, if I remove the first line, there's no crash.  So
> > lexical-binding has something to do with this.
> >
> > I cannot see what could trigger the crash.  The fact that 'binding' is
> > used both as an argument and as the variable which is bound to the
> > return value, perhaps?
> >
> > Let me know if you want the C reproducer for this minimal file.
> 
> Yes please.

Attached below.

When compiled with -O2 and linked against libgccjit, it crashes with
the following backtrace:

  Program received signal SIGSEGV, Segmentation fault.
  0x70f5ac3e in libgccjit-0!_Z17gimple_build_callP9tree_nodejz ()
     from D:\usr\bin\libgccjit-0.dll
  (gdb) bt
  #0  0x70f5ac3e in libgccjit-0!_Z17gimple_build_callP9tree_nodejz ()
     from D:\usr\bin\libgccjit-0.dll
  #1  0x7190fa7b in libgccjit-0!_ZN19evrp_range_analyzer5leaveEP15basic_block_def () from D:\usr\bin\libgccjit-0.dll
  #2  0x71910eef in libgccjit-0!_Z36stmt_uses_0_or_null_in_undefined_wayP6gimple () from D:\usr\bin\libgccjit-0.dll
  #3  0x710fba2c in libgccjit-0!_Z16execute_one_passP8opt_pass ()
     from D:\usr\bin\libgccjit-0.dll
  #4  0x710fc171 in libgccjit-0!_Z16execute_one_passP8opt_pass ()
     from D:\usr\bin\libgccjit-0.dll
  #5  0x710fc181 in libgccjit-0!_Z16execute_one_passP8opt_pass ()
     from D:\usr\bin\libgccjit-0.dll
  #6  0x710fc1ad in libgccjit-0!_Z17execute_pass_listP8functionP8opt_pass ()
     from D:\usr\bin\libgccjit-0.dll
  #7  0x70e3770d in libgccjit-0!_ZN11cgraph_node6expandEv ()
     from D:\usr\bin\libgccjit-0.dll
  #8  0x70e386c9 in libgccjit-0!_ZN12symbol_table15output_weakrefsEv ()
     from D:\usr\bin\libgccjit-0.dll
  #9  0x70e3a6f1 in libgccjit-0!_ZN12symbol_table25finalize_compilation_unitEv
      () from D:\usr\bin\libgccjit-0.dll
  #10 0x711bc551 in libgccjit-0!_ZN5timer3popE12timevar_id_t ()
     from D:\usr\bin\libgccjit-0.dll
  #11 0x71b29e4c in libgccjit-0!_ZN6toplev4mainEiPPc ()
     from D:\usr\bin\libgccjit-0.dll
  #12 0x70da78ca in libgccjit-0!_ZN3gcc3jit8playback7context7compileEv ()
     from D:\usr\bin\libgccjit-0.dll
  #13 0x70d9b836 in libgccjit-0!_ZN3gcc3jit9recording7context7compileEv ()
     from D:\usr\bin\libgccjit-0.dll
  #14 0x70d8e193 in libgccjit-0!gcc_jit_context_compile ()
     from D:\usr\bin\libgccjit-0.dll
  #15 0x00445f16 in main ()


[-- Attachment #2: subr-x-4ecfe746-fd9c72a9_libgccjit_repro.c.xz --]
[-- Type: application/octet-stream, Size: 103936 bytes --]

^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 18:48                                                                         ` Eli Zaretskii
@ 2021-03-06 19:19                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 19:40                                                                             ` Pip Cet
  2021-03-06 20:08                                                                             ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 19:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 18:31:05 +0000
>> 
>> > ;;; subr-x.el --- extra Lisp functions  -*- lexical-binding:t -*-
>> >
>> > (defun internal--build-bindings (bindings)
>> >   "Check and build conditional value forms for BINDINGS."
>> >   (let ((prev-var t))
>> >     (mapcar (lambda (binding)
>> >               (let ((binding (internal--build-binding binding prev-var)))
>> >                 (setq prev-var (car binding))
>> >                 binding))
>> >             bindings)))
>> >
>> > Interestingly, if I remove the first line, there's no crash.  So
>> > lexical-binding has something to do with this.
>> >
>> > I cannot see what could trigger the crash.  The fact that 'binding' is
>> > used both as an argument and as the variable which is bound to the
>> > return value, perhaps?
>> >
>> > Let me know if you want the C reproducer for this minimal file.
>> 
>> Yes please.
>
> Attached below.
>
> When compiled with -O2 and linked against libgccjit, it crashes with
> the following backtrace:

Okay I believe this is clearly a libgccjit bug.  The attached reproducer
is not crashing on my 32bit setup based on a recent GCC trunk.

Could you remind me exactly which libgccjit version you are using?

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 18:44                                                                     ` Eli Zaretskii
@ 2021-03-06 19:21                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 20:10                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 19:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 18:30:05 +0000
>> 
>> >> This file ELNFILENAME_libgccjit_repro.c can be just compiled linking
>> >> against libgccjit to obtain the reproducer.
>> >
>> > "To obtain the reproducer" meaning that the compiled and linked
>> > program should crash in the same way as Emacs does?  I thought we
>> > crash while compiling the file and linking it to produce a shared
>> > library, not while running it.  Right?
>> 
>> Yes, the compiled program when executed will replay the same compilation
>> attempted by Emacs and therefore if it is a libgccjit fault it should
>> crash.
>> 
>> Does the reproducer crash when executed on your system?
>
> Yes, it does.
>
>> > More generally, what is the relation between the contents of the
>> > reproducer file and the source the native compilation sees when we
>> > call gcc_jit_context_compile_to_file?  Do we submit a similar file to
>> > GCC or something?
>> 
>> The reproducer file is a file that is meant to recreate the same
>> libgccjit IR and attempt a compilation with that.  In practice it calls
>> the same functions we call at the interface with libgccjit describing
>> the code we want to compile and attempt to perform a compilation.
>
> But any issues caused by actually writing the reproducer to a disk
> file, like text-mode conversions and quirks, aren't relevant, right?

Yes, that's correct: unless the C compiler can't parse this file
correctly, it should work.  Do you think this is the case?

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 15:46                                                     ` Andy Moreton
@ 2021-03-06 19:31                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 19:31 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Sat 06 Mar 2021, Eli Zaretskii wrote:
>
>>> From: Andy Moreton <andrewjmoreton@gmail.com>
>>> Date: Sat, 06 Mar 2021 12:09:30 +0000
>>> 
>>> >> Thus the ABI hash should also include `system-configuration-features'.
>>> >
>>> > Which of the features you think might affect the ABI?  I reviewed them
>>> > a couple of days ago and didn't find any I could convince myself
>>> > should affect that, but maybe I missed something.
>>> 
>>> As a reasonably recent example, adding bignum support by default added
>>> "GMP" to `system-configuration-features' when rebuilding after pulling
>>> upstream changes, without the user changing the configure command line
>>> saved in `system-configuration-options'.
>>
>> AFAIU, such changes only affect *.eln files if we add new primitives
>> to Emacs or change existing primitives, and those changes are already
>> captured by the ABI hash.  Right?
>
> Perhaps you are right, but we need some input from Andrea on whether
> these changes affect the ABI.

Eli is correct.  I think we can define the ABI as:

1- the list of all primitives and their signatures.

2- the eln load mechanism we implement.

3- low-level details of how Lisp objects are represented, in case these
   are directly manipulated by open-coded code (ATM integers and conses).

Item 1 is accounted for automatically in the `comp-abi-hash' computation.
For 2 and 3 we are responsible for manually bumping to a new
`comp-abi-hash' (leveraging ABI_VERSION).  So yeah, I think we should be
fine ATM.
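
As a rough illustration of the scheme above: a hash that folds in a
manually bumped version string plus the list of primitive signatures
changes whenever either of them changes.  The sketch below is an
assumption-laden toy, not the actual `comp-abi-hash' computation, which
is not shown in this thread.  It uses 64-bit FNV-1a as a stand-in hash,
and the function names and the "name:min:max" signature format are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold one NUL-terminated string into a running 64-bit FNV-1a hash.
   A trailing 0xFF separator keeps ("ab", "c") distinct from
   ("a", "bc").  */
static uint64_t
fnv1a_add (uint64_t h, const char *s)
{
    assert (s != NULL);
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= UINT64_C (1099511628211);   /* FNV-1a 64-bit prime */
    }
    h ^= 0xFF;
    h *= UINT64_C (1099511628211);
    return h;
}

/* Toy ABI hash: ABI_VERSION stands for the manually bumped part
   (items 2 and 3), SIGS for the automatically collected primitive
   signatures (item 1).  Hypothetical, for illustration only.  */
static uint64_t
toy_abi_hash (const char *abi_version, const char *const *sigs, size_t n)
{
    uint64_t h = UINT64_C (14695981039346656037);  /* FNV offset basis */

    h = fnv1a_add (h, abi_version);
    for (size_t i = 0; i < n; i++)
        h = fnv1a_add (h, sigs[i]);
    return h;
}
```

In this model, changing the arity of a single primitive or bumping the
version string yields a different hash, so .eln files built against the
old ABI would no longer match.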

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 19:19                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 19:40                                                                             ` Pip Cet
  2021-03-06 19:48                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 20:08                                                                             ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-06 19:40 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Sat, Mar 6, 2021 at 7:20 PM Andrea Corallo via Bug reports for GNU
Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
wrote:
> Okay I believe this is clearly a libgccjit bug.  The attached reproducer
> is not crashing on my 32bit setup based on a recent GCC trunk.

It's crashing for me with a Feb 15 build of libgccjit from gcc trunk,
but not with a newer build from gcc trunk, so it looks like a gcc bug
that has since been fixed.

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 19:40                                                                             ` Pip Cet
@ 2021-03-06 19:48                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 20:24                                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 19:48 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Sat, Mar 6, 2021 at 7:20 PM Andrea Corallo via Bug reports for GNU
> Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
> wrote:
>> Okay I believe this is clearly a libgccjit bug.  The attached reproducer
>> is not crashing on my 32bit setup based on a recent GCC trunk.
>
> It's crashing for me with a Feb 15 build of libgccjit from gcc trunk,
> but not with a newer build from gcc trunk, so it looks like a gcc bug
> that has since been fixed.

I think what you see should be PR99126 [1] that was fixed recently on
trunk.

IIRC (might be wrong) Eli is on GCC 9, where at the time I could not
reproduce this bug.  If we discover PR99126 shows up in versions other
than 10 we might want to relax the version check around the workaround
we have in place.

  Andrea

[1] <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99126>





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 19:19                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 19:40                                                                             ` Pip Cet
@ 2021-03-06 20:08                                                                             ` Eli Zaretskii
  2021-03-06 20:19                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 20:08 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 19:19:16 +0000
> 
> > When compiled with -O2 and linked against libgccjit, it crashes with
> > the following backtrace:
> 
> Okay I believe this is clearly a libgccjit bug.  The attached reproducer
> is not crashing on my 32bit setup based on a recent GCC trunk.
> 
> Could you remind me exactly which libgccjit version you are using?

It's from GCC 9.2.0 (yours is probably GCC 10 or even more recent?).
The GCC 9.2.0 tree, and in particular libgccjit itself, was patched to
build successfully on Windows, but looking at the patches I cannot see
anything that could explain crashes in just one or a small number of
Lisp files.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 19:21                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 20:10                                                                         ` Eli Zaretskii
  2021-03-06 20:26                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 20:10 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 19:21:26 +0000
> 
> >> The reproducer file is a file that is meant to recreate the same
> >> libgccjit IR and attempt a compilation with that.  In practice it calls
> >> the same functions we call at the interface with libgccjit describing
> >> the code we want to compile and attempt to perform a compilation.
> >
> > But any issues caused by actually writing the reproducer to a disk
> > file, like text-mode conversions and quirks, aren't relevant, right?
> 
> > Yes, that's correct: unless the C compiler can't parse this file
> > correctly, it should work.  Do you think this is the case?

No, I don't.  But I could try removing those funny characters, if you
think it would help.  Can I remove any characters from those strings
without introducing unrelated problems?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:08                                                                             ` Eli Zaretskii
@ 2021-03-06 20:19                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 20:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 19:19:16 +0000
>> 
>> > When compiled with -O2 and linked against libgccjit, it crashes with
>> > the following backtrace:
>> 
>> Okay I believe this is clearly a libgccjit bug.  The attached reproducer
>> is not crashing on my 32bit setup based on a recent GCC trunk.
>> 
>> Could you remind me exactly which libgccjit version you are using?
>
> It's from GCC 9.2.0 (yours is probably GCC 10 or even more recent?).
> The GCC 9.2.0 tree, and in particular libgccjit itself, was patched to
> build successfully on Windows, but looking at the patches I cannot see
> anything that could explain crashes in just one or a small number of
> Lisp files.

I can build 9.2.0 and try, but it will take a while on my 32bit env; I
will report back (mine on this setup was a very recent trunk).

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 19:48                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 20:24                                                                                 ` Eli Zaretskii
  2021-03-06 20:31                                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 20:24 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 19:48:53 +0000
> 
> > It's crashing for me with a Feb 15 build of libgccjit from gcc trunk,
> > but not with a newer build from gcc trunk, so it looks like a gcc bug
> > that has since been fixed.
> 
> I think what you see should be PR99126 [1] that was fixed recently on
> trunk.
> 
> IIRC (might be wrong) Eli is on a GCC 9, where at the time I could not
> reproduce this bug.

Yes, I'm using GCC 9.2.0.

> If we discover PR99126 shows-up in versions other than 10 we might
> want to relax the version check around the workaround we have in
> place.

It looks like that: I patched comp.c to take the workaround regardless
of the libgccjit version, and the result of running under GDB is:

  libgccjit.so: note: disable pass tree-isolate-paths for functions in the range of [0, 4294967295]
  [Inferior 1 (process 5640) exited normally]

It also completed much faster than the buggy version, which probably
means the buggy code has some kind of infinite recursion or other
issue that causes the run to be much more expensive.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:10                                                                         ` Eli Zaretskii
@ 2021-03-06 20:26                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 20:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 19:21:26 +0000
>> 
>> >> The reproducer file is a file that is meant to recreate the same
>> >> libgccjit IR and attempt a compilation with that.  In practice it calls
>> >> the same functions we call at the interface with libgccjit describing
>> >> the code we want to compile and attempt to perform a compilation.
>> >
>> > But any issues caused by actually writing the reproducer to a disk
>> > file, like text-mode conversions and quirks, aren't relevant, right?
>> 
>> Yes that's correct, but unless the C compiler can't parse correctly this
>> file it should work.  Do you think this is the case?
>
> No, I don't.  But I could try removing those funny characters, if you
> think it would help.  Can I remove any characters from those strings
> without introducing unrelated problems?

I think it should not introduce problems, but I don't expect it to help.
Sorry, I don't have a certain answer for this; you might want to try it
if you like and if it's quick.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:24                                                                                 ` Eli Zaretskii
@ 2021-03-06 20:31                                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 20:53                                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 20:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>>         andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 19:48:53 +0000
>> 
>> > It's crashing for me with a Feb 15 build of libgccjit from gcc trunk,
>> > but not with a newer build from gcc trunk, so it looks like a gcc bug
>> > that has since been fixed.
>> 
>> I think what you see should be PR99126 [1] that was fixed recently on
>> trunk.
>> 
>> IIRC (might be wrong) Eli is on a GCC 9, where at the time I could not
>> reproduce this bug.
>
> Yes, I'm using GCC 9.2.0.
>
>> If we discover PR99126 shows-up in versions other than 10 we might
>> want to relax the version check around the workaround we have in
>> place.
>
> It looks like that: I patched comp.c to take the workaround regardless
> of the libgccjit version, and the result of running under GDB is:
>
>   libgccjit.so: note: disable pass tree-isolate-paths for functions in the range of [0, 4294967295]
>   [Inferior 1 (process 5640) exited normally]
>
> It also completed much faster than the buggy version, which probably
> means the buggy code has some kind of infinite recursion or other
> issue that causes the run to be much more expensive.

Nice, very interesting.  I know the bug is there in 9 too (the builtin
trap is not initialized), but I don't know why I could not exercise it on
my setup with the first reproducer I found.

I guess we'll have to extend the workaround to all pre 11 GCC when
configuring Emacs with wide int...
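
The rule proposed here can be written down as a small predicate.  This is
only a sketch of the condition as stated in this thread, not the code in
comp.c: the function name and its parameters are hypothetical, and how
the libgccjit major version and the wide-int setting are obtained is left
to the caller.

```c
#include <stdbool.h>

/* Hypothetical helper (not the actual comp.c code): decide whether the
   tree-isolate-paths pass should be disabled.  Assumes the rule
   suggested in this thread: apply the workaround for every libgccjit
   older than GCC 11 when Emacs is configured with wide int.  */
bool
needs_isolate_paths_workaround (int libgccjit_major, bool wide_int)
{
  return wide_int && libgccjit_major < 11;
}
```

With this rule a GCC 9.2.0 libgccjit on a wide-int build would take the
workaround, while GCC 11 or a build without wide int would not.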

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:31                                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 20:53                                                                                     ` Eli Zaretskii
  2021-03-06 21:02                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 18:56                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-06 20:53 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 20:31:02 +0000
> 
> I guess we'll have to extend the workaround to all pre 11 GCC when
> configuring Emacs with wide int...

I guess so.

Btw, do we have a way to force non-default compilation conditions for
a particular .el file, via file-local variables?  I'm thinking about
setting comp-speed and comp-native-driver-options.  That would help
users who cannot change the C code and/or don't want to customize
these variables globally for all compilations.

Btw2, why are the *.eln files so big? do they include debug info, and
if so, how to request a compilation that doesn't emit debug info?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:53                                                                                     ` Eli Zaretskii
@ 2021-03-06 21:02                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07  5:55                                                                                         ` Eli Zaretskii
  2021-03-07 18:56                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-06 21:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 20:31:02 +0000
>> 
>> I guess we'll have to extend the workaround to all pre 11 GCC when
>> configuring Emacs with wide int...
>
> I guess so.
>
> Btw, do we have a way to force non-default compilation conditions for
> a particular .el file, via file-local variables?  I'm thinking about
> setting comp-speed and comp-native-driver-options.  That would help
> users who cannot change the C code and/or don't want to customize
> these variables globally for all compilations.

Not ATM.  I guess it should be easy to implement once we have a draft of
the interface we would like to expose.  Perhaps we should have a feature
bug for this?

> Btw2, why are the *.eln files so big? do they include debug info, and
> if so, how to request a compilation that doesn't emit debug info?

If they were compiled with comp-debug 0 they should have no debug symbols
(this should be easy to verify with objdump, though).

IME, although it can vary, they are often about 2-3x the size of a .elc,
but on a different architecture and with wide int this might change
measurably.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 21:02                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07  5:55                                                                                         ` Eli Zaretskii
  2021-03-07  6:57                                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-07  5:55 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sat, 06 Mar 2021 21:02:17 +0000
> 
> > Btw, do we have a way to force non-default compilation conditions for
> > a particular .el file, via file-local variables?  I'm thinking about
> > setting comp-speed and comp-native-driver-options.  That would help
> > users who cannot change the C code and/or don't want to customize
> > these variables globally for all compilations.
> 
> > Not ATM.  I guess it should be easy to implement once we have a draft
> > of the interface we would like to expose.  Perhaps we should have a
> > feature bug for this?

Done.

Another idea I had related to this: since there seem to be stability
issues with even the recent versions of libgccjit, we should perhaps
automatically add a .el file whose native-compilation failed to the
list in comp-deferred-compilation-deny-list, so that the same Emacs
session won't try native-compilation of the same .el file again.
WDYT?

(Btw, the "deferred" part in the name of the variable sounds redundant
to me, since we always compile asynchronously, except during
bootstrap, which has a separate variable anyway.)

> > Btw2, why are the *.eln files so big? do they include debug info, and
> > if so, how to request a compilation that doesn't emit debug info?
> 
> > If they were compiled with comp-debug 0 they should have no debug symbols
> > (this should be easy to verify with objdump, though).

I didn't change comp-debug from its default, so it should be 0.

> > IME, although it can vary, they are often about 2-3x the size of a .elc,
> > but on a different architecture and with wide int this might change
> > measurably.

A data point: subr-x.elc is 16247 bytes, whereas the corresponding
.eln file (for 32-bit wide-int architecture) is 90631 bytes, a 5.5
factor.

If I look at the file with 'size', I get the following numbers:

   text    data     bss     dec     hex filename
  63951     788   24784   89523   15db3 subr-x-02dfef32-17faeb1d.eln
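
As a sanity check on these numbers, the arithmetic can be spelled out in
a few lines of C.  This is just a sketch; the helper names are made up
for illustration and are not part of any tool.

```c
/* Sum of the section sizes printed by 'size'; this should equal the
   "dec" column of the same row.  */
long
section_total (long text, long data, long bss)
{
  return text + data + bss;
}

/* Size of the .eln file relative to the .elc file, as a plain ratio.  */
double
eln_to_elc_ratio (long eln_bytes, long elc_bytes)
{
  return (double) eln_bytes / (double) elc_bytes;
}
```

Calling section_total (63951, 788, 24784) reproduces the 89523 shown in
the "dec" column, and eln_to_elc_ratio (90631, 16247) gives a ratio
between 5.5 and 5.6.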






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07  5:55                                                                                         ` Eli Zaretskii
@ 2021-03-07  6:57                                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07  7:40                                                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07  6:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 21:02:17 +0000
>>
>> > Btw, do we have a way to force non-default compilation conditions for
>> > a particular .el file, via file-local variables?  I'm thinking about
>> > setting comp-speed and comp-native-driver-options.  That would help
>> > users who cannot change the C code and/or don't want to customize
>> > these variables globally for all compilations.
>>
>> Not ATM.  I guess it should be easy to implement once we have a draft
>> of the interface we would like to expose.  Perhaps we should have a
>> feature bug for this?
>
> Done.
>
> Another idea I had related to this: since there seem to be stability
> issues with even the recent versions of libgccjit, we should perhaps
> automatically add a .el file whose native-compilation failed to the
> list in comp-deferred-compilation-deny-list, so that the same Emacs
> session won't try native-compilation of the same .el file again.
> WDYT?

Sounds good, will do.

> (Btw, the "deferred" part in the name of the variable sounds redundant
> to me, since we always compile asynchronously, except during
> bootstrap, which has a separate variable anyway.)

Well we expose also `native-compile' to compile synchronously (IIRC
that's also what package.el does if asked).

>> > Btw2, why are the *.eln files so big? do they include debug info, and
>> > if so, how to request a compilation that doesn't emit debug info?
>>
>> If they were compiled with comp-debug 0 they should have no debug symbols
>> (this should be easy to verify with objdump, though).
>
> I didn't change comp-debug from its default, so it should be 0.
>
>> IME, although it can vary, they are often about 2-3x the size of a .elc,
>> but on a different architecture and with wide int this might change
>> measurably.
>
> A data point: subr-x.elc is 16247 bytes, whereas the corresponding
> .eln file (for 32-bit wide-int architecture) is 90631 bytes, a 5.5
> factor.
>
> If I look at the file with 'size', I get the following numbers:
>
>    text    data     bss     dec     hex filename
>   63951     788   24784   89523   15db3 subr-x-02dfef32-17faeb1d.eln

What's the size of the corresponding .elc ?

Thanks

  Andrea







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07  6:57                                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07  7:40                                                                                             ` Eli Zaretskii
  2021-03-07 19:05                                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-07  7:40 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sun, 07 Mar 2021 06:57:24 +0000
> 
> > Another idea I had related to this: since there seem to be stability
> > issues with even the recent versions of libgccjit, we should perhaps
> > automatically add a .el file whose native-compilation failed to the
> > list in comp-deferred-compilation-deny-list, so that the same Emacs
> > session won't try native-compilation of the same .el file again.
> > WDYT?
> 
> Sounds good, will do.

Thanks.

> > (Btw, the "deferred" part in the name of the variable sounds redundant
> > to me, since we always compile asynchronously, except during
> > bootstrap, which has a separate variable anyway.)
> 
> Well we expose also `native-compile' to compile synchronously (IIRC
> that's also what package.el does if asked).

I guess I'm confused: what's the difference between native-compile and
the deferred compilation?  I thought the deferred compilation just uses
native-compile.  Or is there a different command for that?

> > A data point: subr-x.elc is 16247 bytes, whereas the corresponding
> > .eln file (for 32-bit wide-int architecture) is 90631 bytes, a 5.5
> > factor.
> >
> > If I look at the file with 'size', I get the following numbers:
> >
> >    text    data     bss     dec     hex filename
> >   63951     788   24784   89523   15db3 subr-x-02dfef32-17faeb1d.eln
> 
> What's the size of the corresponding .elc ?

See above: 16247 bytes.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 12:15                                                                       ` Andy Moreton
  2021-03-06 13:10                                                                         ` Eli Zaretskii
@ 2021-03-07  9:22                                                                         ` Pip Cet
  1 sibling, 0 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-07  9:22 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 46256

On Sat, Mar 6, 2021 at 12:16 PM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> On Sat 06 Mar 2021, Pip Cet wrote:
> > On Sat, Mar 6, 2021 at 1:48 AM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> >> Is the problem that dlopen resolves to use an unlinked file kept alive
> >> by having open handles, rather than a new file with the filename used
> >> by the old file before it was unlinked ?
> >
> > I believe so, and that's what I think we can work around.
> >
> > IIUC, we don't actually call dlclose() until we GC (and might not do
> > so even then, since GC is conservative).
>
> In that case keeping the handles open is the real bug here, and it would
> be better to focus on how to ensure that resources are released correctly.

I'm not sure I follow that argument. If I load subr.eln, hack on
subr.el, recompile subr.eln, and want to reload it, we can't dlclose()
the old subr.eln until long after we've dlopen()ed the new one. I
guess we could load subr.elc, then dlclose(), then dlopen() subr.eln?
Are you saying that's something we should do?
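
The "unlinked file kept alive by an open handle" behaviour can be seen
with plain file descriptors.  The sketch below is only an analogy: it
assumes POSIX semantics, does not call dlopen() at all, and the function
name is made up for illustration.  On MS-Windows the rules for deleting
open files are different.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create PATH with contents "old" and keep a descriptor open on it,
   then unlink PATH and create a new file with the same name and
   contents "new".  Return 1 if the old descriptor still reads "old",
   0 if it does not, -1 on error.  Hypothetical demo helper.  */
int
old_handle_reads_old_contents (const char *path)
{
  char buf[16] = { 0 };
  int result = -1;

  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  if (write (fd, "old", 3) != 3)
    {
      close (fd);
      return -1;
    }
  close (fd);

  /* This descriptor plays the role of the handle that is still open.  */
  int old_fd = open (path, O_RDONLY);
  if (old_fd < 0)
    return -1;

  /* Remove the name, then reuse it for a brand-new file.  */
  if (unlink (path) != 0)
    {
      close (old_fd);
      return -1;
    }
  int new_fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (new_fd < 0)
    {
      close (old_fd);
      return -1;
    }
  if (write (new_fd, "new", 3) == 3)
    {
      /* The old descriptor still refers to the unlinked file.  */
      ssize_t n = read (old_fd, buf, sizeof buf - 1);
      if (n >= 0)
        result = (n == 3 && memcmp (buf, "old", 3) == 0);
    }
  close (new_fd);
  close (old_fd);
  unlink (path);
  return result;
}
```

On a POSIX system the old descriptor keeps reading the old contents even
though the name now refers to the new file, which is the situation
described above for a handle that has not been closed yet.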

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-03 18:43                                   ` Eli Zaretskii
  2021-03-03 19:46                                     ` Eli Zaretskii
@ 2021-03-07 17:59                                     ` Eli Zaretskii
  2021-03-07 18:53                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-07 17:59 UTC (permalink / raw)
  To: akrl; +Cc: 46256, andrewjmoreton

> Date: Wed, 03 Mar 2021 20:43:01 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
>        Warning (comp): comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)^M
> 
> I also see similar picture in emacs_backtrace.txt:
> 
>   emacs_abort at src/w32fns.c:10947
>   terminate_due_to_signal at src/emacs.c:417
>   die at src/alloc.c:7452
>   XNATIVE_COMP_UNIT at src/comp.h:70
>   load_comp_unit at src/comp.c:4766
>   syms_of_comp at src/comp.c:5077
>   Fload at src/lread.c:1548
> 
> (My Emacs is compiled with --enable-checking=yes.)

I still keep seeing this from time to time, even though I have a local
patch to disable the tree-isolate-paths pass.  Suggestions for
debugging this are welcome.

(I must say that the way the async compilations are run makes it hard
to track down fatal errors, because I don't even have an easy way of
knowing which .el file was being compiled when the crash happened.
Any enhancements of the logging and the diagnostic messages to help in
these matters will be very welcome.  E.g., how about introducing an
intermediate log level that would just show the currently compiled .el
file's name?)






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 17:59                                     ` Eli Zaretskii
@ 2021-03-07 18:53                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 19:15                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 18:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 03 Mar 2021 20:43:01 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> 
>>        Warning (comp): comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)^M
>> 
>> I also see similar picture in emacs_backtrace.txt:
>> 
>>   emacs_abort at src/w32fns.c:10947
>>   terminate_due_to_signal at src/emacs.c:417
>>   die at src/alloc.c:7452
>>   XNATIVE_COMP_UNIT at src/comp.h:70
>>   load_comp_unit at src/comp.c:4766
>>   syms_of_comp at src/comp.c:5077
>>   Fload at src/lread.c:1548
>> 
>> (My Emacs is compiled with --enable-checking=yes.)

[ It has been a while since I last ran with --enable-checking=yes; I will
  configure the next build with it. ]

> I still keep seeing this from time to time, even though I have a local
> patch to disable the tree-isolate-paths pass.  Suggestions for
> debugging this are welcome.
>
> (I must say that the way the async compilations are run makes it hard
> to track down fatal errors, because I don't even have an easy way of
> knowing which .el file was being compiled when the crash happened.
> Any enhancements of the logging and the diagnostic messages to help in
> these matters will be very welcome.  E.g., how about introducing an
> intermediate log level that would just show the currently compiled .el
> file's name?)

Is setting `comp-async-jobs-number' to 1 and looking into the
*Async-native-compile-log* what we are looking for in this case?

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-06 20:53                                                                                     ` Eli Zaretskii
  2021-03-06 21:02                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 18:56                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 19:08                                                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 18:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sat, 06 Mar 2021 20:31:02 +0000
>> 
>> I guess we'll have to extend the workaround to all pre 11 GCC when
>> configuring Emacs with wide int...
>
> I guess so.

38b4ac3e6b should do this.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07  7:40                                                                                             ` Eli Zaretskii
@ 2021-03-07 19:05                                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 19:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sun, 07 Mar 2021 06:57:24 +0000
>> 
>> > Another idea I had related to this: since there seem to be stability
>> > issues with even the recent versions of libgccjit, we should perhaps
>> > automatically add a .el file whose native-compilation failed to the
>> > list in comp-deferred-compilation-deny-list, so that the same Emacs
>> > session won't try native-compilation of the same .el file again.
>> > WDYT?
>> 
>> Sounds good, will do.
>
> Thanks.
>
>> > (Btw, the "deferred" part in the name of the variable sounds redundant
>> > to me, since we always compile asynchronously, except during
>> > bootstrap, which has a separate variable anyway.)
>> 
>> Well we expose also `native-compile' to compile synchronously (IIRC
>> that's also what package.el does if asked).
>
> I guess I'm confused: what's the difference between native-compile and
> the deferred compilation?  I thought the deferred compilation just uses
> native-compile.  Or is there a different command for that?

We have two API entries for native compilation: `native-compile'
(synchronous) and `native-compile-async' (indeed asynchronous).

Deferred compilation is the mechanism that automatically triggers an
async native compilation (through `native-compile-async') for bytecode
being loaded when the native code is not present, and performs the
hotswap once native compilation has finished.

>> > A data point: subr-x.elc is 16247 bytes, whereas the corresponding
>> > .eln file (for 32-bit wide-int architecture) is 90631 bytes, a 5.5
>> > factor.
>> >
>> > If I look at the file with 'size', I get the following numbers:
>> >
>> >    text    data     bss     dec     hex filename
>> >   63951     788   24784   89523   15db3 subr-x-02dfef32-17faeb1d.eln
>> 
>> What's the size of the corresponding .elc ?
>
> See above: 16247 bytes.

Sorry for the sloppy answer this morning, I was rushing :/  Looking at
it, my subr-x.eln is pretty much the same size (the factor depends on
the individual file).  I believe in general bytecode is a more compact
representation than native code.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 18:56                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 19:08                                                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-07 19:08 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sun, 07 Mar 2021 18:56:22 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Andrea Corallo <akrl@sdf.org>
> >> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >> Date: Sat, 06 Mar 2021 20:31:02 +0000
> >> 
> >> I guess we'll have to extend the workaround to all pre 11 GCC when
> >> configuring Emacs with wide int...
> >
> > I guess so.
> 
> 38b4ac3e6b should do this.

Thanks.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 18:53                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 19:15                                         ` Eli Zaretskii
  2021-03-07 20:16                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-07 19:15 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sun, 07 Mar 2021 18:53:50 +0000
> 
> > (I must say that the way the async compilations are run makes it hard
> > to track down fatal errors, because I don't even have an easy way of
> > knowing which .el file was being compiled when the crash happened.
> > Any enhancements of the logging and the diagnostic messages to help in
> > these matters will be very welcome.  E.g., how about introducing an
> > intermediate log level that would just show the currently compiled .el
> > file's name?)
> 
> Is setting `comp-async-jobs-number' to 1 and looking into the
> *Async-native-compile-log* what we are looking for in this case?

Will try that next time.  But meanwhile, I got the same problem while
rebuilding after your comp.c change.  This time I clearly saw that
cc-mode.el is being compiled:

  Dumping under the name emacs.pdmp
  Dumping fingerprint: 3869e2b5d74557015002c58022bce8f2e19ba06f1636f4182d7837703d622982
  Dump complete
  Byte counts: header=100 hot=7960440 discardable=158256 cold=4873352
  Reloc counts: hot=501593 discardable=5154
  Adding name emacs-28.0.50.22.exe
  Adding name emacs-28.0.50.22.pdmp
  cp -f emacs.pdmp bootstrap-emacs.pdmp
  make[1]: Leaving directory `/d/gnu/git/emacs/native-comp/src'
  make -C lisp all
  make[1]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
  make -C ../leim all EMACS="../src/emacs.exe"
  make -C ../admin/grammars all EMACS="../../src/emacs.exe"
  make[2]: Entering directory `/d/gnu/git/emacs/native-comp/admin/grammars'
  make[2]: Nothing to be done for `all'.
  make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/admin/grammars'
  make[2]: Entering directory `/d/gnu/git/emacs/native-comp/leim'
  make[2]: Nothing to be done for `all'.
  make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/leim'
  make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
  make[2]: Nothing to be done for `compile-targets'.
  make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/lisp'
  make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
  make[2]: Nothing to be done for `compile-targets'.
  make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/lisp'
  make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
    ELC      progmodes/cc-mode.elc

  comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)

Attaching a debugger produces the backtrace shown below.  I will leave
this process captured in the debugger, in case you want me to look at
some variables and report their values.

  Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
  [Switching to Thread 5500.0x114c]
  0x7c90120f in ntdll!DbgBreakPoint () from C:\WINDOWS\system32\ntdll.dll
  (gdb) bt
  #0  0x7c90120f in ntdll!DbgBreakPoint () from C:\WINDOWS\system32\ntdll.dll
  #1  0x0135a63b in emacs_abort () at w32fns.c:10914
  #2  0x0115c637 in terminate_due_to_signal (sig=22, backtrace_limit=2147483647)
      at emacs.c:417
  #3  0x0121c026 in die (msg=0x1782af2 <targets+1266> "NATIVE_COMP_UNITP (a)",
      file=0x1782aeb <targets+1259> "comp.h", line=70) at alloc.c:7452
  #4  0x012cf582 in XNATIVE_COMP_UNIT (a=XIL(0x6f04860091b9000)) at comp.h:70
  #5  0x012df324 in load_comp_unit (comp_u=0x6f33918, loading_dump=false,
      late_load=false) at comp.c:4821
  #6  0x012e0c55 in Fnative_elisp_load (filename=XIL(0x80000000092db190),
      late_load=XIL(0)) at comp.c:5122
  #7  0x012ab823 in Fload (file=XIL(0x800000000929e140), noerror=XIL(0),
      nomessage=XIL(0x30), nosuffix=XIL(0), must_suffix=XIL(0)) at lread.c:1548
  #8  0x0125dea0 in eval_sub (form=XIL(0xc000000006f72cb0)) at eval.c:2498
  #9  0x01255e60 in Fprogn (body=XIL(0xc000000006f72d30)) at eval.c:471
  #10 0x01255b71 in Fif (args=XIL(0xc000000006f733c0)) at eval.c:427
  #11 0x0125d89b in eval_sub (form=XIL(0xc000000006f73380)) at eval.c:2437
  #12 0x01255e60 in Fprogn (body=XIL(0)) at eval.c:471
  #13 0x01258b68 in Flet (args=XIL(0xc000000006f73370)) at eval.c:1057
  #14 0x0125d89b in eval_sub (form=XIL(0xc000000006f73280)) at eval.c:2437
  #15 0x01255e60 in Fprogn (body=XIL(0xc000000006f72d70)) at eval.c:471
  #16 0x0125d89b in eval_sub (form=XIL(0xc000000006f73170)) at eval.c:2437
  #17 0x01255b0d in Fif (args=XIL(0xc000000006f73160)) at eval.c:426
  #18 0x0125d89b in eval_sub (form=XIL(0xc000000006f730c0)) at eval.c:2437
  #19 0x01255e60 in Fprogn (body=XIL(0)) at eval.c:471
  #20 0x0126184b in funcall_lambda (fun=XIL(0xc000000006f72e20), nargs=1,
      arg_vector=0x82bd40) at eval.c:3286
  #21 0x01260eb0 in apply_lambda (fun=XIL(0xc000000006f72e20),
      args=XIL(0xc000000006dce870), count=74) at eval.c:3158
  #22 0x0125e5d7 in eval_sub (form=XIL(0xc000000006dce860)) at eval.c:2561
  #23 0x0125d117 in Feval (form=XIL(0xc000000006dce860), lexical=XIL(0))
      at eval.c:2313
  #24 0x7058ea18 in F627974652d636f6d70696c652d6576616c_byte_compile_eval_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #25 0x01260731 in funcall_subr (subr=0x6781548, numargs=1, args=0x82c1c0)
      at eval.c:3084
  #26 0x012601a5 in Ffuncall (nargs=2, args=0x82c1b8) at eval.c:3009
  #27 0x012cbaa4 in exec_byte_code (bytestr=XIL(0x8000000006978310),
      vector=XIL(0xa000000006f338b0), maxdepth=make_fixnum(6),
      args_template=make_fixnum(257), nargs=1, args=0x82c788) at bytecode.c:632
  #28 0x01260cea in fetch_and_exec_byte_code (fun=XIL(0xa000000006f338e8),
      syms_left=make_fixnum(257), nargs=1, args=0x82c780) at eval.c:3133
  #29 0x01261267 in funcall_lambda (fun=XIL(0xa000000006f338e8), nargs=1,
      arg_vector=0x82c780) at eval.c:3214
  #30 0x01260215 in Ffuncall (nargs=2, args=0x82c778) at eval.c:3013
  #31 0x7058a138 in F627974652d636f6d70696c652d726563757273652d746f706c6576656c_byte_compile_recurse_toplevel_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #32 0x01260760 in funcall_subr (subr=0x676d838, numargs=2, args=0x82c980)
      at eval.c:3086
  #33 0x012601a5 in Ffuncall (nargs=3, args=0x82c978) at eval.c:3009
  #34 0x012cbaa4 in exec_byte_code (bytestr=XIL(0x8000000006978320),
      vector=XIL(0xa000000006bad3f8), maxdepth=make_fixnum(7),
      args_template=make_fixnum(128), nargs=1, args=0x82cfb8) at bytecode.c:632
  #35 0x01260cea in fetch_and_exec_byte_code (fun=XIL(0xa000000006bad430),
      syms_left=make_fixnum(128), nargs=1, args=0x82cfb8) at eval.c:3133
  #36 0x01261267 in funcall_lambda (fun=XIL(0xa000000006bad430), nargs=1,
      arg_vector=0x82cfb8) at eval.c:3214
  #37 0x01260215 in Ffuncall (nargs=2, args=0x82cfb0) at eval.c:3013
  #38 0x0125e79e in Fapply (nargs=2, args=0x82cfb0) at eval.c:2596
  #39 0x0125f441 in apply1 (fn=XIL(0xa000000006bad430),
      arg=XIL(0xc000000006dc7930)) at eval.c:2855
  #40 0x0125904c in Fmacroexpand (form=XIL(0xc000000006dc7920),
      environment=XIL(0xc000000006f6de10)) at eval.c:1142
  #41 0x01260760 in funcall_subr (subr=0x172f780 <Smacroexpand>, numargs=2,
      args=0x82d270) at eval.c:3086
  #42 0x012601a5 in Ffuncall (nargs=3, args=0x82d268) at eval.c:3009
  #43 0x06a53c20 in F6d6163726f6578702d6d6163726f657870616e64_macroexp_macroexpand_0 ()
     from d:\gnu\git\emacs\native-comp\native-lisp\28.0.50-19fa14f1\macroexp-2c3e1495-db4ee70f.eln
  #44 0x01260760 in funcall_subr (subr=0x5f12cb4, numargs=2, args=0x82d4b0)
      at eval.c:3086
  #45 0x012601a5 in Ffuncall (nargs=3, args=0x82d4a8) at eval.c:3009
  #46 0x7058a0c5 in F627974652d636f6d70696c652d726563757273652d746f706c6576656c_byte_compile_recurse_toplevel_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #47 0x01260760 in funcall_subr (subr=0x676d838, numargs=2, args=0x82d698)
      at eval.c:3086
  #48 0x012601a5 in Ffuncall (nargs=3, args=0x82d690) at eval.c:3009
  #49 0x012cbaa4 in exec_byte_code (bytestr=XIL(0x8000000006970bc0),
      vector=XIL(0xa000000006f1ab58), maxdepth=make_fixnum(4),
      args_template=make_fixnum(257), nargs=1, args=0x82dc40) at bytecode.c:632
  #50 0x01260cea in fetch_and_exec_byte_code (fun=XIL(0xa000000006f33880),
      syms_left=make_fixnum(257), nargs=1, args=0x82dc38) at eval.c:3133
  #51 0x01261267 in funcall_lambda (fun=XIL(0xa000000006f33880), nargs=1,
      arg_vector=0x82dc38) at eval.c:3214
  #52 0x01260215 in Ffuncall (nargs=2, args=0x82dc30) at eval.c:3013
  #53 0x0125f4b8 in call1 (fn=XIL(0xa000000006f33880),
      arg1=XIL(0xc000000006dc7920)) at eval.c:2869
  #54 0x01274ff2 in mapcar1 (leni=2, vals=0x82dd20, fn=XIL(0xa000000006f33880),
      seq=XIL(0xc000000006dc78e0)) at fns.c:2742
  #55 0x0127566b in Fmapcar (function=XIL(0xa000000006f33880),
      sequence=XIL(0xc000000006dc78e0)) at fns.c:2798
  #56 0x7058a1a6 in F627974652d636f6d70696c652d726563757273652d746f706c6576656c_byte_compile_recurse_toplevel_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #57 0x01260760 in funcall_subr (subr=0x676d838, numargs=2, args=0x82dfd0)
      at eval.c:3086
  #58 0x012601a5 in Ffuncall (nargs=3, args=0x82dfc8) at eval.c:3009
  #59 0x7059c5f3 in F627974652d636f6d70696c652d746f706c6576656c2d66696c652d666f726d_byte_compile_toplevel_file_form_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #60 0x01260731 in funcall_subr (subr=0x6783f18, numargs=1, args=0x82e1e8)
      at eval.c:3084
  #61 0x012601a5 in Ffuncall (nargs=2, args=0x82e1e0) at eval.c:3009
  #62 0x7059999c in F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_43 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #63 0x01260731 in funcall_subr (subr=0x686e450, numargs=1, args=0x82e418)
      at eval.c:3084
  #64 0x012601a5 in Ffuncall (nargs=2, args=0x82e410) at eval.c:3009
  #65 0x7059a9f1 in F627974652d636f6d70696c652d66726f6d2d627566666572_byte_compile_from_buffer_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #66 0x01260731 in funcall_subr (subr=0x6783d58, numargs=1, args=0x82e6b8)
      at eval.c:3084
  #67 0x012601a5 in Ffuncall (nargs=2, args=0x82e6b0) at eval.c:3009
  #68 0x70597650 in F627974652d636f6d70696c652d66696c65_byte_compile_file_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #69 0x01260760 in funcall_subr (subr=0x6783cd8, numargs=1, args=0x82e8d0)
      at eval.c:3086
  #70 0x012601a5 in Ffuncall (nargs=2, args=0x82e8c8) at eval.c:3009
  #71 0x705bb19f in F62617463682d627974652d636f6d70696c652d66696c65_batch_byte_compile_file_0 ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #72 0x01260731 in funcall_subr (subr=0x68d04b8, numargs=1, args=0x82eb08)
      at eval.c:3084
  #73 0x012601a5 in Ffuncall (nargs=2, args=0x82eb00) at eval.c:3009
  #74 0x705bac80 in F62617463682d627974652d636f6d70696c65_batch_byte_compile_0
      ()
     from d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\bytecomp-12882072-bfe84587.eln
  #75 0x01260731 in funcall_subr (subr=0x68d0478, numargs=0, args=0x82ed68)
      at eval.c:3084
  #76 0x012601a5 in Ffuncall (nargs=1, args=0x82ed60) at eval.c:3009
  #77 0x6ec8a3c8 in F62617463682d627974652d6e61746976652d636f6d70696c652d666f722d626f6f747374726170_batch_byte_native_compile_for_bootstrap_0 ()
     from d:\gnu\git\emacs\native-comp\native-lisp\28.0.50-19fa14f1\comp-7672a6ed-f22bd338.eln
  #78 0x01260715 in funcall_subr (subr=0x6dabb48, numargs=0, args=0x82eff8)
      at eval.c:3082
  #79 0x012601a5 in Ffuncall (nargs=1, args=0x82eff0) at eval.c:3009
  #80 0x6cb55791 in F636f6d6d616e642d6c696e652d31_command_line_1_0 ()
     from d:\gnu\git\emacs\native-comp\native-lisp\28.0.50-19fa14f1\startup-bbc6ea72-9be7c541.eln
  #81 0x01260731 in funcall_subr (subr=0x5db726c, numargs=1, args=0x82f3e8)
      at eval.c:3084
  #82 0x012601a5 in Ffuncall (nargs=2, args=0x82f3e0) at eval.c:3009
  #83 0x6cb4b0a9 in F636f6d6d616e642d6c696e65_command_line_0 ()
     from d:\gnu\git\emacs\native-comp\native-lisp\28.0.50-19fa14f1\startup-bbc6ea72-9be7c541.eln
  #84 0x01260715 in funcall_subr (subr=0x5dc92bc, numargs=0, args=0x82f638)
      at eval.c:3082
  #85 0x012601a5 in Ffuncall (nargs=1, args=0x82f630) at eval.c:3009
  #86 0x6cb45d8b in F6e6f726d616c2d746f702d6c6576656c_normal_top_level_0 ()
     from d:\gnu\git\emacs\native-comp\native-lisp\28.0.50-19fa14f1\startup-bbc6ea72-9be7c541.eln
  #87 0x0125dc63 in eval_sub (form=XIL(0xc000000005db0d7c)) at eval.c:2481
  #88 0x0125d117 in Feval (form=XIL(0xc000000005db0d7c), lexical=XIL(0))
      at eval.c:2313
  #89 0x01164353 in top_level_2 () at keyboard.c:1103
  #90 0x0125a135 in internal_condition_case (bfun=0x1164320 <top_level_2>,
      handlers=XIL(0x90), hfun=0x1163ad1 <cmd_error>) at eval.c:1448
  #91 0x011643cd in top_level_1 (ignore=XIL(0)) at keyboard.c:1111
  #92 0x01259218 in internal_catch (tag=XIL(0xedf0),
      func=0x1164359 <top_level_1>, arg=XIL(0)) at eval.c:1198
  #93 0x01164225 in command_loop () at keyboard.c:1072
  #94 0x01163561 in recursive_edit_1 () at keyboard.c:720
  #95 0x011637cf in Frecursive_edit () at keyboard.c:789
  #96 0x0115ee6e in main (argc=11, argv=0xa4ea88) at emacs.c:2095

  Lisp Backtrace:
  "load" (0x82b378)
  "if" (0x82b5a8)
  "let" (0x82b838)
  "progn" (0x82b9e8)
  "if" (0x82bb98)
  "cc-bytecomp-load" (0x82bd40)
  "byte-compile-eval" (0x82c1c0)
  0x6f338e8 PVEC_COMPILED
  "byte-compile-recurse-toplevel" (0x82c980)
  0x6bad430 PVEC_COMPILED
  "macroexpand" (0x82d270)
  "macroexp-macroexpand" (0x82d4b0)
  "byte-compile-recurse-toplevel" (0x82d698)
  0x6f33880 PVEC_COMPILED
  "byte-compile-recurse-toplevel" (0x82dfd0)
  "byte-compile-toplevel-file-form" (0x82e1e8)
  0x686e450 PVEC_SUBR
  "byte-compile-from-buffer" (0x82e6b8)
  "byte-compile-file" (0x82e8d0)
  "batch-byte-compile-file" (0x82eb08)
  "batch-byte-compile" (0x82ed68)
  "batch-byte-native-compile-for-bootstrap" (0x82eff8)
  "command-line-1" (0x82f3e8)
  "command-line" (0x82f638)
  "normal-top-level" (0x82f728)







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 19:15                                         ` Eli Zaretskii
@ 2021-03-07 20:16                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 21:27                                             ` Pip Cet
  2021-03-08 15:07                                             ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 20:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Sun, 07 Mar 2021 18:53:50 +0000
>> 
>> > (I must say that the way the async compilations are run makes it hard
>> > to track down fatal errors, because I don't even have an easy way of
>> > knowing which .el file was being compiled when the crash happened.
>> > Any enhancements of the logging and the diagnostic messages to help in
>> > these matters will be very welcome.  E.g., how about introducing an
>> > intermediate log level that would just show the currently compiled .el
>> > file's name?)
>> 
>> Setting `comp-async-jobs-number' to 1 and looking into the
>> *Async-native-compile-log* what we are looking for in this case?
>
> Will try that next time.  But meanwhile, I got the same problem while
> rebuilding after your comp.c change.  This time I clearly saw that
> cc-mode.el is being compiled:
>
>   Dumping under the name emacs.pdmp
>   Dumping fingerprint: 3869e2b5d74557015002c58022bce8f2e19ba06f1636f4182d7837703d622982
>   Dump complete
>   Byte counts: header=100 hot=7960440 discardable=158256 cold=4873352
>   Reloc counts: hot=501593 discardable=5154
>   Adding name emacs-28.0.50.22.exe
>   Adding name emacs-28.0.50.22.pdmp
>   cp -f emacs.pdmp bootstrap-emacs.pdmp
>   make[1]: Leaving directory `/d/gnu/git/emacs/native-comp/src'
>   make -C lisp all
>   make[1]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
>   make -C ../leim all EMACS="../src/emacs.exe"
>   make -C ../admin/grammars all EMACS="../../src/emacs.exe"
>   make[2]: Entering directory `/d/gnu/git/emacs/native-comp/admin/grammars'
>   make[2]: Nothing to be done for `all'.
>   make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/admin/grammars'
>   make[2]: Entering directory `/d/gnu/git/emacs/native-comp/leim'
>   make[2]: Nothing to be done for `all'.
>   make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/leim'
>   make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
>   make[2]: Nothing to be done for `compile-targets'.
>   make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/lisp'
>   make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
>   make[2]: Nothing to be done for `compile-targets'.
>   make[2]: Leaving directory `/d/gnu/git/emacs/native-comp/lisp'
>   make[2]: Entering directory `/d/gnu/git/emacs/native-comp/lisp'
>     ELC      progmodes/cc-mode.elc
>
>   comp.h:70: Emacs fatal error: assertion failed: NATIVE_COMP_UNITP (a)
>
> Attaching a debugger produces the backtrace shown below.  I will leave
> this process captured in the debugger, in case you want me to look at
> some variables and report their values.
>
>   Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
>   [Switching to Thread 5500.0x114c]
>   0x7c90120f in ntdll!DbgBreakPoint () from C:\WINDOWS\system32\ntdll.dll
>   (gdb) bt
>   #0  0x7c90120f in ntdll!DbgBreakPoint () from C:\WINDOWS\system32\ntdll.dll
>   #1  0x0135a63b in emacs_abort () at w32fns.c:10914
>   #2  0x0115c637 in terminate_due_to_signal (sig=22, backtrace_limit=2147483647)
>       at emacs.c:417
>   #3  0x0121c026 in die (msg=0x1782af2 <targets+1266> "NATIVE_COMP_UNITP (a)",
>       file=0x1782aeb <targets+1259> "comp.h", line=70) at alloc.c:7452
>   #4  0x012cf582 in XNATIVE_COMP_UNIT (a=XIL(0x6f04860091b9000)) at comp.h:70
>   #5  0x012df324 in load_comp_unit (comp_u=0x6f33918, loading_dump=false,
>       late_load=false) at comp.c:4821
>   #6  0x012e0c55 in Fnative_elisp_load (filename=XIL(0x80000000092db190),
>       late_load=XIL(0)) at comp.c:5122

What I think is going on here:

The same .eln file is loaded twice; we detect that and try to reuse
the existing compilation unit (the Lisp object) instead of creating a
new one.

Each .eln keeps a pointer to the compilation unit that represents it.
Here we read that pointer into 'saved_cu' and try to extract the CU
from it with XNATIVE_COMP_UNIT, but something goes wrong.

This object might have been GC'ed for some reason and we might be
looking at the same GC issue I've seen on 32bit wide-int (my guess).
*If* this is the case the question is: why is the CU GC'ed?
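
To make the suspected failure mode concrete, here is a minimal toy model
in C.  It is only a sketch under stated assumptions: toy_cu, toy_eln and
toy_reuse are hypothetical names, not the real comp.c code, and "GC" is
simulated by retagging the object rather than freeing it.

```c
/* Toy model of the reuse path described above.  All names are
   hypothetical; this is not the actual comp.c implementation.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tag { TOY_FREE, TOY_COMP_UNIT };

/* Stands in for the compilation unit Lisp object.  */
struct toy_cu { enum toy_tag tag; };

/* Stands in for a loaded .eln: it remembers the unit that first
   loaded it, like the pointer read into 'saved_cu'.  */
struct toy_eln { struct toy_cu *saved_cu; };

/* Models the NATIVE_COMP_UNITP check on the saved pointer.  */
static bool
toy_cu_p (const struct toy_cu *cu)
{
  return cu != NULL && cu->tag == TOY_COMP_UNIT;
}

/* Load ELN on behalf of FRESH.  Return the unit to use, or NULL when
   the saved pointer is stale (where the real code hits the assertion).  */
static struct toy_cu *
toy_reuse (struct toy_eln *eln, struct toy_cu *fresh)
{
  assert (eln != NULL && fresh != NULL);
  if (eln->saved_cu == NULL)
    {
      eln->saved_cu = fresh;	/* First load: remember this unit.  */
      return fresh;
    }
  if (!toy_cu_p (eln->saved_cu))
    return NULL;		/* Saved pointer no longer names a CU.  */
  return eln->saved_cu;		/* Second load: reuse the first unit.  */
}
```

In this model a second load returns the first unit as long as it is
still tagged TOY_COMP_UNIT; once it is retagged TOY_FREE, the same call
reports the stale pointer instead.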

Thanks

  Andrea








* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 20:16                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 21:27                                             ` Pip Cet
  2021-03-07 21:47                                               ` Pip Cet
  2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 15:07                                             ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-07 21:27 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]

On Sun, Mar 7, 2021 at 8:17 PM Andrea Corallo via Bug reports for GNU
Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> >> From: Andrea Corallo <akrl@sdf.org>
> >> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >> Date: Sun, 07 Mar 2021 18:53:50 +0000
> What I think is going on here:
>
> The same .eln file is loaded two times, we detect that and try to reuse
> the same compilation unit (the Lisp object) instead of a new one.
>
> We keep a pointer to the compilation unit representing the .eln file in
> each .eln.  Here we read it and we have it into 'saved_cu', we try to
> dereference it and extract the CU with XNATIVE_COMP_UNIT but something
> goes wrong.
>
> This object might have been GC'ed for some reason and we might be
> looking at the same GC issue I've seen on 32bit wide-int (my guess).
> *If* this is the case the question is: why is the CU GC'ed?

Why wouldn't it be? I'm trying to follow along here :-)

What I'm thinking is the CU got GC'ed, which is perfectly okay, but we
never set its COMP_UNIT_SYM pointer to Qnil. Then we dlopen() the same
file again, get the old handle, read the stale COMP_UNIT_SYM pointer,
and dereference it.

We should verify that the CU is indeed a different PVEC type now
(ideally, PVEC_FREE), and then do something like the attached patch,
shouldn't we?

Pip

[-- Attachment #2: 0001-Fix-a-GC-issue-bug-46256.patch --]
[-- Type: text/x-patch, Size: 964 bytes --]

From 23496d8793c475b451a0d8fc35b2781e9c4746a9 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sun, 7 Mar 2021 21:26:29 +0000
Subject: [PATCH] Fix a GC issue (bug#46256)

* src/alloc.c (cleanup_vector): Clear the comp unit pointer in the
dynamically-loaded object.
---
 src/alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/alloc.c b/src/alloc.c
index af08336177070..e039441ea826c 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3155,9 +3155,12 @@ cleanup_vector (struct Lisp_Vector *vector)
   else if (NATIVE_COMP_FLAG
 	   && PSEUDOVECTOR_TYPEP (&vector->header, PVEC_NATIVE_COMP_UNIT))
     {
+#define COMP_UNIT_SYM "comp_unit"
       struct Lisp_Native_Comp_Unit *cu =
 	PSEUDOVEC_STRUCT (vector, Lisp_Native_Comp_Unit);
       eassert (cu->handle);
+      Lisp_Object *saved_cu = dynlib_sym (cu->handle, COMP_UNIT_SYM);
+      *saved_cu = Qnil;
       dynlib_close (cu->handle);
     }
   else if (NATIVE_COMP_FLAG
-- 
2.30.1



* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 21:27                                             ` Pip Cet
@ 2021-03-07 21:47                                               ` Pip Cet
  2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-07 21:47 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Sun, Mar 7, 2021 at 9:27 PM Pip Cet <pipcet@gmail.com> wrote:
> What I'm thinking is the CU got GC'ed, which is perfectly okay, but we
> never set its COMP_UNIT_SYM pointer to Qnil. Then we dlopen() the same
> file again, get the old handle, read the stale COMP_UNIT_SYM pointer,
> and dereference it.
>
> We should verify that the CU is indeed a different PVEC type now
> (ideally, PVEC_FREE), and then do something like the attached patch,
> shouldn't we?

I can reproduce this issue by replacing the single call of dlopen() in
dynlib_open with two calls, and I have it open in a debugger if any
further information is required.

I'll prepare a proper patch next, but until then, can someone help me
out and explain why dynlib_close() returns 0 for success on Windows,
but 1 for success on POSIX systems?

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 21:27                                             ` Pip Cet
  2021-03-07 21:47                                               ` Pip Cet
@ 2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 22:16                                                 ` Pip Cet
  2021-03-08  3:31                                                 ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 21:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Sun, Mar 7, 2021 at 8:17 PM Andrea Corallo via Bug reports for GNU
> Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
> wrote:
>> Eli Zaretskii <eliz@gnu.org> writes:
>> >> From: Andrea Corallo <akrl@sdf.org>
>> >> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> >> Date: Sun, 07 Mar 2021 18:53:50 +0000
>> What I think is going on here:
>>
>> The same .eln file is loaded two times, we detect that and try to reuse
>> the same compilation unit (the Lisp object) instead of a new one.
>>
>> We keep a pointer to the compilation unit representing the .eln file in
>> each .eln.  Here we read it and we have it into 'saved_cu', we try to
>> dereference it and extract the CU with XNATIVE_COMP_UNIT but something
>> goes wrong.
>>
>> This object might have been GC'ed for some reason and we might be
>> looking at the same GC issue I've seen on 32bit wide-int (my guess).
>> *If* this is the case the question is: why is the CU GC'ed?
>
> Why wouldn't it be? I'm trying to follow along here :-)

If the CU was GC'ed, the eln should have been dlclosed.  If that's the
case, at the next load we should get a fresh handle and 'saved_cu'
should be NULL (oops!  Qnil... :/) because it is statically allocated.

What we see here is that we are loading twice without dlclosing, and
the object pointed to by 'saved_cu' has some issue.

So, thinking about it: the fact that the eln was never dlclosed should
prove that the CU was not GC'ed, and so I was wrong.  This also suggests
that before saying anything more stupid I'd better say I'm done for the
day :)

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 22:16                                                 ` Pip Cet
  2021-03-08 13:26                                                   ` Eli Zaretskii
  2021-03-08  3:31                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-07 22:16 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Sun, Mar 7, 2021 at 9:51 PM Andrea Corallo <akrl@sdf.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > On Sun, Mar 7, 2021 at 8:17 PM Andrea Corallo via Bug reports for GNU
> > Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org>
> > wrote:
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> >> From: Andrea Corallo <akrl@sdf.org>
> >> >> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >> >> Date: Sun, 07 Mar 2021 18:53:50 +0000
> >> What I think is going on here:
> >>
> >> The same .eln file is loaded two times, we detect that and try to reuse
> >> the same compilation unit (the Lisp object) instead of a new one.
> >>
> >> We keep a pointer to the compilation unit representing the .eln file in
> >> each .eln.  Here we read it and we have it into 'saved_cu', we try to
> >> dereference it and extract the CU with XNATIVE_COMP_UNIT but something
> >> goes wrong.
> >>
> >> This object might have been GC'ed for some reason and we might be
> >> looking at the same GC issue I've seen on 32bit wide-int (my guess).
> >> *If* this is the case the question is: why is the CU GC'ed?
> >
> > Why wouldn't it be? I'm trying to follow along here :-)
>
> If the CU was GC'ed the eln should have been dlclosed.

Wait, I thought this was on Windows?

> If that's the
> case at the next load we should get a fresh handle

You're assuming
1. FreeLibrary() succeeded
2. The module's refcount was 1
3. The module wasn't pinned.

If any of these assumptions is violated, the behavior would be
precisely as observed.

It's easy enough to test this: we can put a printf in dynlib_open
which tells us whether we see the same handle more than once.
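
A rough sketch of such a check, in plain C.  The helper seen_note_handle
and its fixed-size table are hypothetical debugging aids, not existing
Emacs code; the idea is only to count how often each handle value comes
back from the loader.

```c
/* Hypothetical diagnostic helper, not part of Emacs: remember every
   handle value and count how often each one has been returned.  */
#include <assert.h>
#include <stddef.h>

#define SEEN_MAX_HANDLES 256

static void *seen_handle[SEEN_MAX_HANDLES];
static int seen_count[SEEN_MAX_HANDLES];
static size_t seen_used;

/* Record HANDLE and return how many times it has been seen so far,
   including this call.  Return 0 if the table is full.  */
static int
seen_note_handle (void *handle)
{
  assert (handle != NULL);
  for (size_t i = 0; i < seen_used; i++)
    if (seen_handle[i] == handle)
      return ++seen_count[i];
  if (seen_used == SEEN_MAX_HANDLES)
    return 0;
  seen_handle[seen_used] = handle;
  seen_count[seen_used] = 1;
  seen_used++;
  return 1;
}
```

In a debugging build one could pass the freshly obtained handle to this
helper inside dynlib_open and print a message whenever the result is
greater than 1.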

> and 'saved_cu' should
> be NULL (ops!  Qnil... :/) because static allocated.

Well, for one reason or another, it wasn't reset to Qnil.

> Here what we see is that we are loading two times without dlclosing and
> the object pointed by 'cu_saved' has some issue.

I don't think so. I think we called dynlib_close(), it didn't actually
unmap the library, and everything else follows.

> So thinking about: the fact that the eln was never dlclosed should be
> prove that the CU was not GC'ed and so I was wrong.

I don't think you were wrong.

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 22:16                                                 ` Pip Cet
@ 2021-03-08  3:31                                                 ` Eli Zaretskii
  2021-03-08  5:54                                                   ` Pip Cet
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08  3:31 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Sun, 07 Mar 2021 21:51:15 +0000
> 
> > Why wouldn't it be? I'm trying to follow along here :-)
> 
> If the CU was GC'ed the eln should have been dlclosed.  If that's the
> case at the next load we should get a fresh handle and 'saved_cu' should
> be NULL (ops!  Qnil... :/) because static allocated.
> 
> Here what we see is that we are loading two times without dlclosing and
> the object pointed by 'cu_saved' has some issue.
> 
> So thinking about: the fact that the eln was never dlclosed should be
> prove that the CU was not GC'ed and so I was wrong.  This suggests also
> that before further talking stupid I'd better say I'm done for the day
> :)

Thanks.  Please tell me if you need me to provide some further data
from this crashed session.  If not, I will end the debugging session
and will try to find a recipe for reproducing the crash, so we could
see which of the guesses are true.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08  3:31                                                 ` Eli Zaretskii
@ 2021-03-08  5:54                                                   ` Pip Cet
  2021-03-08  6:48                                                     ` Pip Cet
  2021-03-08 14:11                                                     ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08  5:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

On Mon, Mar 8, 2021 at 3:31 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Andrea Corallo <akrl@sdf.org>
> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
> >         andrewjmoreton@gmail.com
> > Date: Sun, 07 Mar 2021 21:51:15 +0000
> >
> > > Why wouldn't it be? I'm trying to follow along here :-)
> >
> > If the CU was GC'ed the eln should have been dlclosed.  If that's the
> > case at the next load we should get a fresh handle and 'saved_cu' should
> > be NULL (ops!  Qnil... :/) because static allocated.
> >
> > Here what we see is that we are loading two times without dlclosing and
> > the object pointed by 'cu_saved' has some issue.
> >
> > So thinking about: the fact that the eln was never dlclosed should be
> > prove that the CU was not GC'ed and so I was wrong.  This suggests also
> > that before further talking stupid I'd better say I'm done for the day
> > :)
>
> Thanks.  Please tell me if you need me to provide some further data
> from this crashed session.  If not, I will end the debugging session

Do you have to end the crashed session to start a new one? I think we
should keep it open for a while longer (or create a core dump, if that
works?) and still try to test whether it's the
dynlib_close()-might-not-close bug.

> and will try to find a recipe for reproducing the crash, so we could
> see which of the guesses are true.

Here's what I did to reproduce the crash:
1. create a file "test.el":

;; -*- lexical-binding: t; -*-
(defun testfun (x)
  (1+ x))

2. Evaluate:

(require 'comp)
(native-elisp-load (native-compile "test.el"))
(testfun 3)
(fmakunbound 'testfun)
(garbage-collect)
(native-elisp-load (native-compile "test.el"))

Note that this might not always work because of conservative GC.

Pip






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08  5:54                                                   ` Pip Cet
@ 2021-03-08  6:48                                                     ` Pip Cet
  2021-03-08 10:14                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09  8:32                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 14:11                                                     ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08  6:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

On Mon, Mar 8, 2021 at 5:54 AM Pip Cet <pipcet@gmail.com> wrote:
> Note that this might not always work because of conservative GC.

If it doesn't work, can you simply retry a few times? Eventually there
shouldn't be references to the stale native_comp_unit on the stack.

However, I think I've worked out why dynlib_close doesn't do its job:

Fnative_elisp_load creates a comp unit, but, if the shared library has
already been initialized, it doesn't set that library's comp_unit
variable to point to the new comp unit; instead, the variable continues
pointing to the first comp unit, which still has the library open.

Then, the original comp unit is unloaded but not the new one created
by Fnative_elisp_load. We call dynlib_close() once, but we called
dynlib_open() twice before, leaving the shared library open and initialized.

Then, we try to load the comp unit again, and follow the stale
comp_unit variable pointing to the original comp unit.

Fix should be as attached. Note the fix is, at worst, harmless (unless
I messed up), so we should apply it anyway just because it's good not
to leave stale pointers lying around even if we hope that the OS
unmaps them at some point.

Pip

[-- Attachment #2: 0001-Fix-stale-pointers-in-comp-units-causing-crashes-bug.patch --]
[-- Type: text/x-patch, Size: 2001 bytes --]

From a2ab9701e48e5443809664d50c924b9d83062b4e Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sun, 7 Mar 2021 21:26:29 +0000
Subject: [PATCH] Fix stale pointers in comp units causing crashes  (bug#46256)

* src/alloc.c (cleanup_vector): Call unload_comp_unit.
* src/comp.c (unload_comp_unit): New function.
---
 src/alloc.c |  3 +--
 src/comp.c  | 14 ++++++++++++++
 src/comp.h  |  2 ++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index af08336177070..fee8cc08aa483 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3157,8 +3157,7 @@ cleanup_vector (struct Lisp_Vector *vector)
     {
       struct Lisp_Native_Comp_Unit *cu =
 	PSEUDOVEC_STRUCT (vector, Lisp_Native_Comp_Unit);
-      eassert (cu->handle);
-      dynlib_close (cu->handle);
+      unload_comp_unit (cu);
     }
   else if (NATIVE_COMP_FLAG
 	   && PSEUDOVECTOR_TYPEP (&vector->header, PVEC_SUBR))
diff --git a/src/comp.c b/src/comp.c
index b68adf31d68bd..c9e068b90aa2c 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -4936,6 +4936,20 @@ load_comp_unit (struct Lisp_Native_Comp_Unit *comp_u, bool loading_dump,
   return res;
 }
 
+void
+unload_comp_unit (struct Lisp_Native_Comp_Unit *cu)
+{
+  if (cu->handle == NULL)
+    return;
+
+  Lisp_Object *saved_cu = dynlib_sym (cu->handle, COMP_UNIT_SYM);
+  Lisp_Object this_cu;
+  XSETNATIVE_COMP_UNIT (this_cu, cu);
+  if (EQ (this_cu, *saved_cu))
+    *saved_cu = Qnil;
+  dynlib_close (cu->handle);
+}
+
 Lisp_Object
 native_function_doc (Lisp_Object function)
 {
diff --git a/src/comp.h b/src/comp.h
index f7d17f398c75d..d01bc17565d7d 100644
--- a/src/comp.h
+++ b/src/comp.h
@@ -78,6 +78,8 @@ XNATIVE_COMP_UNIT (Lisp_Object a)
 extern Lisp_Object load_comp_unit (struct Lisp_Native_Comp_Unit *comp_u,
 				   bool loading_dump, bool late_load);
 
+extern void unload_comp_unit (struct Lisp_Native_Comp_Unit *);
+
 extern Lisp_Object native_function_doc (Lisp_Object function);
 
 extern void syms_of_comp (void);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08  6:48                                                     ` Pip Cet
@ 2021-03-08 10:14                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 10:45                                                         ` Pip Cet
  2021-03-09  8:32                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-08 10:14 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

Pip Cet <pipcet@gmail.com> writes:

> On Mon, Mar 8, 2021 at 5:54 AM Pip Cet <pipcet@gmail.com> wrote:
>> Note that this might not always work because of conservative GC.
>
> If it doesn't work, can you simply retry a few times? Eventually there
> shouldn't be references to the stale native_comp_unit on the stack.
>
> However, I think I've worked out why dynlib_close doesn't do its job:
>
> Fnative_elisp_load creates a comp unit, but, if the shared library has
> already been initialized, it doesn't set that library's comp_unit
> variable to point to the new comp unit; instead, the variable continues
> pointing to the first comp unit, which still has the library open.
>
> Then, the original comp unit is unloaded but not the new one created
> by Fnative_elisp_load. We call dynlib_close() once, but we called
> dynlib_open() twice before, leaving the shared library open and initialized.
>
> Then, we try to load the comp unit again, and follow the stale
> comp_unit variable pointing to the original comp unit.
>
> Fix should be as attached. Note the fix is, at worst, harmless (unless
> I messed up), so we should apply it anyway just because it's good not
> to leave stale pointers lying around even if we hope that the OS
> unmaps them at some point.
>
> Pip

Hi Pip,

thanks for the analysis.  I'm not sure I followed 100%, so I'll repeat it
to make sure we are on the same page; please correct me if I'm wrong.

IIUC (and it makes sense to me) the issue is that we are leaving two
pointers pointing to the same handle: one is in the CU_2 allocated by
'Fnative_elisp_load' and later discarded by 'load_comp_unit' when
reloading the same filename.  The other is in the original CU_1 created
the first time this filename was loaded.

When CU_2 is GC'ed after being discarded, we'll hit the problem because
we'll dlclose the handle.  Is this correct?

If so, doesn't the attached patch cure the issue as well?

Thanks

  Andrea

PS I couldn't reproduce using the Lisp reproducer on either my 64-bit or
my 32-bit system (I left it looping for a while); is that reproducer
working for you?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: dlclose.patch --]
[-- Type: text/x-diff, Size: 1066 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index af08336177..0bea10610f 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3157,8 +3157,8 @@ cleanup_vector (struct Lisp_Vector *vector)
     {
       struct Lisp_Native_Comp_Unit *cu =
 	PSEUDOVEC_STRUCT (vector, Lisp_Native_Comp_Unit);
-      eassert (cu->handle);
-      dynlib_close (cu->handle);
+      if (cu->handle)
+	dynlib_close (cu->handle);
     }
   else if (NATIVE_COMP_FLAG
 	   && PSEUDOVECTOR_TYPEP (&vector->header, PVEC_SUBR))
diff --git a/src/comp.c b/src/comp.c
index e6f672de25..abc3535dc6 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -4832,6 +4832,10 @@ load_comp_unit (struct Lisp_Native_Comp_Unit *comp_u, bool loading_dump,
        We must *never* mess with static pointers in an already loaded
        eln.  */
     {
+      /* Invalidate the handle for the CU we are leaving for garbage
+	 collection.  */
+      comp_u->handle = NULL;
+      /* Swap CU for the old one.  */
       comp_u_lisp_obj = *saved_cu;
       comp_u = XNATIVE_COMP_UNIT (comp_u_lisp_obj);
       comp_u->loaded_once = true;

^ permalink raw reply related	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 10:14                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08 10:45                                                         ` Pip Cet
  2021-03-08 15:02                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 12:36                                                           ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08 10:45 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Mon, Mar 8, 2021 at 10:14 AM Andrea Corallo <akrl@sdf.org> wrote:
> Hi Pip,
>
> thanks for the analysis.  I'm not sure I followed 100%, so I'll repeat it
> to make sure we are on the same page; please correct me if I'm wrong.

Thanks for that!

> IIUC (and it makes sense to me) the issue is that we are leaving two
> pointers pointing to the same handle: one is in the CU_2 allocated by
> 'Fnative_elisp_load' and later discarded by 'load_comp_unit' when
> reloading the same filename.  The other is in the original CU_1 created
> the first time this filename was loaded.
>
> When CU_2 is GC'ed after being discarded, we'll hit the problem because
> we'll dlclose the handle.  Is this correct?

CU_1 is GC'ed first. CU_2, for whatever reason, isn't GC'ed in the same cycle.

> If so, doesn't the attached patch cure the issue as well?

I don't think so. The problem is that we have an invalid Lisp_Object
in the shared library, not that we're calling dlclose() too often.

Again, there's no real cost to fixing this: at best, we avoid a
catastrophic use-after-free. At worst, we nulled out a word of memory
only for it to be unmapped a moment later, no harm done.

> PS I couldn't reproduce using the Lisp reproducer on either my 64-bit or
> my 32-bit system (I left it looping for a while); is that reproducer
> working for you?

Have you modified dynlib_open() to leak the shared object? That's what
I think might be happening for Eli, so it makes sense to test with a
double dlopen() call, as I did.

FWIW, I suspect the reproducer should crash with your patch applied,
but I can't test right now :-)

Thanks

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 22:16                                                 ` Pip Cet
@ 2021-03-08 13:26                                                   ` Eli Zaretskii
  2021-03-08 13:52                                                     ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 13:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 7 Mar 2021 22:16:58 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > >> This object might have been GC'ed for some reason and we might be
> > >> looking at the same GC issue I've seen on 32bit wide-int (my guess).
> > >> *If* this is the case the question is: why is the CU GC'ed?
> > >
> > > Why wouldn't it be? I'm trying to follow along here :-)
> >
> > If the CU was GC'ed the eln should have been dlclosed.
> 
> Wait, I thought this was on Windows?

Yes, and...?

> > If that's the
> > case at the next load we should get a fresh handle
> 
> You're assuming
> 1. FreeLibrary() succeeded
> 2. The module's refcount was 1
> 3. The module wasn't pinned.
> 
> If any of these assumptions is violated, the behavior would be
> precisely as observed.

Why would any of the above assumptions be violated?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 13:26                                                   ` Eli Zaretskii
@ 2021-03-08 13:52                                                     ` Pip Cet
  2021-03-08 14:39                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-08 13:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

On Mon, Mar 8, 2021 at 1:26 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 7 Mar 2021 22:16:58 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> > > If the CU was GC'ed the eln should have been dlclosed.
> >
> > Wait, I thought this was on Windows?
>
> Yes, and...?

No dlclose() on Windows. FreeLibrary() is documented to behave
differently from dlclose(), and I don't have a Windows system to test
whether it's actually different in practice.

> > > If that's the
> > > case at the next load we should get a fresh handle
> >
> > You're assuming
> > 1. FreeLibrary() succeeded
> > 2. The module's refcount was 1
> > 3. The module wasn't pinned.
> >
> > If any of these assumptions is violated, the behavior would be
> > precisely as observed.
>
> Why would any of the above assumptions be violated?

I have several suspicions, including "because the second compilation
unit referring to the same handle hasn't been collected". Because that
is definitely a bug, and one we should fix, and then we can debug this
issue more if and when it reappears.

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08  5:54                                                   ` Pip Cet
  2021-03-08  6:48                                                     ` Pip Cet
@ 2021-03-08 14:11                                                     ` Eli Zaretskii
  2021-03-08 14:27                                                       ` Pip Cet
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 14:11 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 05:54:28 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > Thanks.  Please tell me if you need me to provide some further data
> > from this crashed session.  If not, I will end the debugging session
> 
> Do you have to end the crashed session to start a new one?

I can start a new one, but I cannot (easily) build a modified Emacs as
long as the crashed session runs.  And even if I do build a new
version, as soon as I "git pull", the sources will not match the
binary in the debugging session, and debugging becomes ... interesting.

> I think we should keep it open for a while longer (or create a core
> dump, if that works?) and still try to test whether it's the
> dynlib_close()-might-not-close bug.

Core dumps aren't supported on Windows.  As for testing the dynlib
hypothesis: how can this session help?  If this is the problem, it
already happened, and the Emacs process is already all but dead: it
hit a fatal assertion violation.  I cannot run the debuggee anymore,
all I can do is examine existing variables.  If there are some
variables you want me to examine, please tell, and I will report their
values.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:11                                                     ` Eli Zaretskii
@ 2021-03-08 14:27                                                       ` Pip Cet
  2021-03-08 18:06                                                         ` Eli Zaretskii
  2021-03-08 18:13                                                         ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08 14:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

On Mon, Mar 8, 2021 at 2:12 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > I think we should keep it open for a while longer (or create a core
> > dump, if that works?) and still try to test whether it's the
> > dynlib_close()-might-not-close bug.
>
> Core dumps aren't supported on Windows.

Thanks, I did not know that.

> As for testing the dynlib
> hypothesis: how can this session help?  If this is the problem, it
> already happened, and the Emacs process is already all but dead: it
> hit a fatal assertion violation.  I cannot run the debuggee anymore,
> all I can do is examine existing variables.  If there are some
> variables you want me to examine, please tell, and I will report their
> values.

I would be interested in the pseudovector type of the variable that is
supposed to be a comp_unit, but isn't. I think that's all the
information of value that debuggee still has...

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 13:52                                                     ` Pip Cet
@ 2021-03-08 14:39                                                       ` Eli Zaretskii
  2021-03-08 14:50                                                         ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 14:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 13:52:49 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > > Wait, I thought this was on Windows?
> >
> > Yes, and...?
> 
> No dlclose() on Windows.

Why does this matter in this case?  (And I do have dlclose in a
standard library that comes with MinGW, btw.  Not that it's relevant.)

> FreeLibrary() is documented to behave differently from dlclose()

It is?  In what way?

> and I don't have a Windows system to test whether it's actually
> different in practice.

Well, how about explaining the details in terms that are simple enough
that I could understand and do the testing?  Until now, you and Andrea
have been talking Chinese as far as I'm concerned.  Please be aware
that I don't know half the details you two do about native-comp
internals, and will never be able to know that: too many other things
on my plate and too little time.  Can you perhaps explain the issue
without alluding to CU_1, CU_2, Fnative_elisp_load etc., and without
assuming that their interactions are common knowledge?

> > Why would any of the above assumptions be violated?
> 
> I have several suspicions, including "because the second compilation
> unit referring to the same handle hasn't been collected". Because that
> is definitely a bug, and one we should fix, and then we can debug this
> issue more if and when it reappears.

More Chinese.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:39                                                       ` Eli Zaretskii
@ 2021-03-08 14:50                                                         ` Pip Cet
  2021-03-08 15:14                                                           ` Eli Zaretskii
  2021-03-08 17:40                                                           ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08 14:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

On Mon, Mar 8, 2021 at 2:39 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Mon, 8 Mar 2021 13:52:49 +0000
> > Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >
> > > > Wait, I thought this was on Windows?
> > >
> > > Yes, and...?
> >
> > No dlclose() on Windows.
>
> Why does this matter in this case?  (And I do have dlclose in a
> standard library that comes with MinGW, btw.  Not that it's relevant.)

We don't use dlclose() on Windows. FreeLibrary() is documented not to
unload the library in certain cases, and to return a failure code.

> > FreeLibrary() is documented to behave differently from dlclose()
>
> It is?  In what way?

Libraries can be pinned, and it can fail without a clear list of
potential failure reasons in the documentation. Not that dlclose() is
better according to the documentation, but there, the source is
available...

> > and I don't have a Windows system to test whether it's actually
> > different in practice.
>
> Well, how about explaining the details in terms that are simple enough
> that I could understand and do the testing?

Excellent idea. I'll try!

> Until now, you and Andrea
> have been talking Chinese as far as I'm concerned.  Please be aware
> that I don't know half the details you two do about native-comp
> internals, and will never be able to know that: too many other things
> on my plate and too little time.  Can you perhaps explain the issue
> without alluding to CU_1, CU_2, Fnative_elisp_load etc., and without
> assuming that their interactions are common knowledge?

Native-comp uses a pseudo-vector type representing a dlopen()ed handle.

In addition to the handle being stored in the pseudo-vector, a pointer
to the pseudo-vector is stored in the data space belonging to the
handle. I'll refer to that as the "reverse pointer" because I can't
think of a better term right now.

When we clean up the pseudo-vector, we don't reset the reverse pointer
to NULL, or Qnil.

That is because we assume that the dlclose() we perform on cleanup
will unmap the data space belonging to the handle, anyway.

That assumption is wrong in certain specific circumstances.

In those circumstances, the reverse pointer is dereferenced after the
vector has been deallocated. It points to a random different vector
now.

> > > Why would any of the above assumptions be violated?
> >
> > I have several suspicions, including "because the second compilation
> > unit referring to the same handle hasn't been collected". Because that
> > is definitely a bug, and one we should fix, and then we can debug this
> > issue more if and when it reappears.
>
> More Chinese.

(Upon rereading, I agree. My bad.)

One of the circumstances in which the assumption (that the reverse
pointer won't be used) becomes invalid is when two pseudo-vectors
share a handle (and, thus, a reverse pointer). But the reverse pointer
can only point to one of them, and it might be the wrong one.

My patch, thus, resets the reverse pointer to Qnil when cleanup is
performed. In addition, it does so only if the reverse pointer
actually pointed to the pseudo-vector being cleaned up, rather than to
a different one, to handle a corner case in the code.
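
To make that concrete, here is a minimal, self-contained sketch of the
scenario. It is only a model: 'struct library', 'struct unit' and all
function names are hypothetical stand-ins, not the real Emacs types or
the code in the patch, and NULL plays the role of Qnil. It shows two
pseudo-vectors sharing one handle, and what the reverse pointer looks
like after the first one is cleaned up, with and without the guarded
reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the Emacs data structures.  */
struct unit;

struct library              /* models the data space of one loaded .eln */
{
  int refcount;             /* open count kept by the dynamic loader */
  struct unit *reverse;     /* the "reverse pointer"; NULL models Qnil */
};

struct unit                 /* models the pseudo-vector */
{
  struct library *handle;
};

/* Open LIB on behalf of U.  Only the first opener installs itself as
   the reverse pointer; later openers merely share the handle.  */
static void
unit_open (struct unit *u, struct library *lib)
{
  assert (lib != NULL);
  lib->refcount++;
  u->handle = lib;
  if (lib->reverse == NULL)
    lib->reverse = u;
}

/* Cleanup without the fix: drop one reference and leave the reverse
   pointer alone, trusting the loader to unmap the data space.  */
static void
unit_cleanup_unguarded (struct unit *u)
{
  u->handle->refcount--;
  u->handle = NULL;
}

/* Cleanup with the fix: reset the reverse pointer first, but only if
   it points to the unit being cleaned up.  */
static void
unit_cleanup_guarded (struct unit *u)
{
  if (u->handle == NULL)
    return;
  if (u->handle->reverse == u)
    u->handle->reverse = NULL;
  u->handle->refcount--;
  u->handle = NULL;
}

/* Two units share one library and the first is cleaned up.  Return 1
   if the library is still loaded while its reverse pointer still
   names the cleaned-up unit (the stale-pointer situation).  */
static int
stale_after_cleanup (int guarded)
{
  struct library lib = { 0, NULL };
  struct unit first = { NULL }, second = { NULL };

  unit_open (&first, &lib);
  unit_open (&second, &lib);
  if (guarded)
    unit_cleanup_guarded (&first);
  else
    unit_cleanup_unguarded (&first);
  return lib.refcount > 0 && lib.reverse == &first;
}
```

In this model, stale_after_cleanup (0) reports the stale pointer and
stale_after_cleanup (1) does not; the remaining unit keeps the library
loaded in both cases.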

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 10:45                                                         ` Pip Cet
@ 2021-03-08 15:02                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 15:09                                                             ` Pip Cet
  2021-03-09 12:36                                                           ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-08 15:02 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Mon, Mar 8, 2021 at 10:14 AM Andrea Corallo <akrl@sdf.org> wrote:
>> Hi Pip,
>>
>> thanks for the analysis.  I'm not sure I followed 100%, so I'll repeat it
>> to make sure we are on the same page; please correct me if I'm wrong.
>
> Thanks for that!
>
>> IIUC (and it makes sense to me) the issue is that we are leaving two
>> pointers pointing to the same handle: one is in the CU_2 allocated by
>> 'Fnative_elisp_load' and later discarded by 'load_comp_unit' when
>> reloading the same filename.  The other is in the original CU_1 created
>> the first time this filename was loaded.
>>
>> When CU_2 is GC'ed after being discarded, we'll hit the problem because
>> we'll dlclose the handle.  Is this correct?
>
> CU_1 is GC'ed first. CU_2, for whatever reason, isn't GC'ed in the same cycle.
>
>> If so, doesn't the attached patch cure the issue as well?
>
> I don't think so. The problem is that we have an invalid Lisp_Object
> in the shared library, not that we're calling dlclose() too often.
>
> Again, there's no real cost to fixing this: at best, we avoid a
> catastrophic use-after-free. At worst, we nulled out a word of memory
> only for it to be unmapped a moment later, no harm done.
>
>> PS I couldn't reproduce using the Lisp reproducer on either my 64-bit or
>> my 32-bit system (I left it looping for a while); is that reproducer
>> working for you?
>
> Have you modified dynlib_open() to leak the shared object? That's what
> I think might be happening for Eli, so it makes sense to test with a
> double dlopen() call, as I did.

No, because I failed to understand why calling 'dlopen' two times in a
row on the same filename should make any difference as I expect the
second call to just return the same handle as the first.

I'm sure I'm missing something here or I misunderstood your suggestion:

> I can reproduce this issue by replacing the single call of dlopen() in
> dynlib_open with two calls


Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-07 20:16                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 21:27                                             ` Pip Cet
@ 2021-03-08 15:07                                             ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 15:07 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton

> From: Andrea Corallo <akrl@sdf.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Sun, 07 Mar 2021 20:16:40 +0000
> 
> >   #1  0x0135a63b in emacs_abort () at w32fns.c:10914
> >   #2  0x0115c637 in terminate_due_to_signal (sig=22, backtrace_limit=2147483647)
> >       at emacs.c:417
> >   #3  0x0121c026 in die (msg=0x1782af2 <targets+1266> "NATIVE_COMP_UNITP (a)",
> >       file=0x1782aeb <targets+1259> "comp.h", line=70) at alloc.c:7452
> >   #4  0x012cf582 in XNATIVE_COMP_UNIT (a=XIL(0x6f04860091b9000)) at comp.h:70
> >   #5  0x012df324 in load_comp_unit (comp_u=0x6f33918, loading_dump=false,
> >       late_load=false) at comp.c:4821
> >   #6  0x012e0c55 in Fnative_elisp_load (filename=XIL(0x80000000092db190),
> >       late_load=XIL(0)) at comp.c:5122
> 
> What I think is going on here:
> 
> The same .eln file is loaded two times, we detect that and try to reuse
> the same compilation unit (the Lisp object) instead of a new one.
> 
> We keep a pointer to the compilation unit representing the .eln file in
> each .eln.  Here we read it and we have it into 'saved_cu', we try to
> dereference it and extract the CU with XNATIVE_COMP_UNIT but something
> goes wrong.
> 
> This object might have been GC'ed for some reason and we might be
> looking at the same GC issue I've seen on 32bit wide-int (my guess).
> *If* this is the case the question is: why is the CU GC'ed?

Can you please step back a notch for a moment and help me understand
how this machinery works?  Because I'm looking at the code, and I'm
confused.

For example, I see this:

  Lisp_Object *saved_cu = dynlib_sym (handle, COMP_UNIT_SYM);
  comp_u->loaded_once = !NILP (*saved_cu);

But dynlib_sym doesn't return a pointer to a Lisp_Object, it returns a
pointer to a function or a variable inside the .eln shared library.
So how is this TRT?

A few lines later we do this:

      comp_u_lisp_obj = *saved_cu;
      comp_u = XNATIVE_COMP_UNIT (comp_u_lisp_obj);

But if saved_cu is NOT a pointer to a Lisp_Object, then how do we
expect XNATIVE_COMP_UNIT not to crash?

What am I missing?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 15:02                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08 15:09                                                             ` Pip Cet
  2021-03-08 15:38                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-08 15:09 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

On Mon, Mar 8, 2021 at 3:03 PM Andrea Corallo <akrl@sdf.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > On Mon, Mar 8, 2021 at 10:14 AM Andrea Corallo <akrl@sdf.org> wrote:
> > Have you modified dynlib_open() to leak the shared object? That's what
> > I think might be happening for Eli, so it makes sense to test with a
> > double dlopen() call, as I did.
>
> No, because I failed to understand why calling 'dlopen' two times in a
> row on the same filename should make any difference as I expect the
> second call to just return the same handle as the first.

It does.

What changes is that the next time we load the library, the first
(leaky) dlopen() will have kept it in memory, so the third and fourth
calls to dlopen() would also return the same handle as the first and
second calls did.
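
As a rough illustration of that reasoning, here is a toy model of the
loader's bookkeeping. It assumes a plain reference count per shared
object and static data that is reinitialized only on a fresh mapping;
'struct module' and the model_* functions are made-up names for this
sketch, not the dynlib.c or libc API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a loader's state for one shared object.  */
struct module
{
  int refcount;       /* number of outstanding opens */
  long static_word;   /* models a static variable in the library */
};

/* Model of dlopen: a fresh mapping starts with initial static data,
   an already-mapped module is returned as-is with refcount + 1.  */
static struct module *
model_dlopen (struct module *m)
{
  assert (m->refcount >= 0);
  if (m->refcount == 0)
    m->static_word = 0;
  m->refcount++;
  return m;
}

/* Model of dlclose: drop one reference; the module counts as
   unmapped once the count reaches zero.  */
static void
model_dlclose (struct module *m)
{
  if (m->refcount > 0)
    m->refcount--;
}

/* Model of dynlib_open; with LEAKY set it opens twice, mimicking the
   attached test patch.  */
static struct module *
model_dynlib_open (struct module *m, bool leaky)
{
  if (leaky)
    model_dlopen (m);
  return model_dlopen (m);
}

/* Load, write a static word, close once, load again.  Return true if
   the word written during the first load is still there.  */
static bool
static_data_survives_reload (bool leaky)
{
  struct module m = { 0, 0 };
  struct module *h = model_dynlib_open (&m, leaky);

  h->static_word = 42;
  model_dlclose (h);
  h = model_dynlib_open (&m, leaky);
  return h->static_word == 42;
}
```

Under these assumptions the static word survives the reload only in
the leaky case, which is the situation where a stale saved comp unit
pointer would still be visible on the next load.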

> I'm sure I'm missing something here or I misunderstood your suggestion:

I don't know whether you are, it's possible I am confused. What I do
know is if I apply the attached patch and run the reproducer, it
crashes rapidly, usually on the first run.

Pip

[-- Attachment #2: dup-dlopen.diff --]
[-- Type: text/x-patch, Size: 300 bytes --]

diff --git a/src/dynlib.c b/src/dynlib.c
index 1338e9109c91a..d29bdb1e86d0a 100644
--- a/src/dynlib.c
+++ b/src/dynlib.c
@@ -270,6 +270,7 @@ dynlib_close (dynlib_handle_ptr h)
 dynlib_handle_ptr
 dynlib_open (const char *path)
 {
+  dlopen (path, RTLD_LAZY);
   return dlopen (path, RTLD_LAZY);
 }
 

^ permalink raw reply related	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:50                                                         ` Pip Cet
@ 2021-03-08 15:14                                                           ` Eli Zaretskii
  2021-03-08 17:40                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 15:14 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 14:50:50 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > > No dlclose() on Windows.
> >
> > Why does this matter in this case?  (And I do have dlclose in a
> > standard library that comes with MinGW, btw.  Not that it's relevant.)
> 
> We don't use dlclose() on Windows. FreeLibrary() is documented not to
> unload the library in certain cases, and to return a failure code.

You will need to show how those cases could happen in our scenario,
given the way we call FreeLibrary and other related APIs.  Otherwise,
I don't see how these subtleties are relevant.

> > > FreeLibrary() is documented to behave differently from dlclose()
> >
> > It is?  In what way?
> 
> Libraries can be pinned

We never call the API that could result in a library being pinned,
certainly not in the scenario we are talking about.  At least that's
my reading of the code.  Again, if you can describe the situation
where such pinning could happen, please do.  If that happens, it's
probably a bug, because we have no reason to pin a DLL.

> > Well, how about explaining the details in terms that are simple enough
> > that I could understand and do the testing?
> 
> Excellent idea. I'll try!

Thanks, I will study this later.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 15:09                                                             ` Pip Cet
@ 2021-03-08 15:38                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-08 15:38 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Mon, Mar 8, 2021 at 3:03 PM Andrea Corallo <akrl@sdf.org> wrote:
>> Pip Cet <pipcet@gmail.com> writes:
>> > On Mon, Mar 8, 2021 at 10:14 AM Andrea Corallo <akrl@sdf.org> wrote:
>> > Have you modified dynlib_open() to leak the shared object? That's what
>> > I think might be happening for Eli, so it makes sense to test with a
>> > double dlopen() call, as I did.
>>
>> No, because I failed to understand why calling 'dlopen' two times in a
>> row on the same filename should make any difference as I expect the
>> second call to just return the same handle as the first.
>
> It does.
>
> What changes is that the next time we load the library, the first
> (leaky) dlopen() will have kept it in memory, so the third and fourth
> calls to dlopen() would also return the same handle as the first and
> second calls did.

Ah okay, IIUC the intent is to change the number of allocations so the
internal reference counter of glibc doesn't go to zero?

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:50                                                         ` Pip Cet
  2021-03-08 15:14                                                           ` Eli Zaretskii
@ 2021-03-08 17:40                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 17:40 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 14:50:50 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > Well, how about explaining the details in terms that are simple enough
> > that I could understand and do the testing?
> 
> Excellent idea. I'll try!

Thanks.  Somewhat clearer now, but I'm still not out of the woods
yet.  Bear with me.

> Native-comp uses a pseudo-vector type representing a dlopen()ed handle.

You mean, the Lisp_Native_Comp_Unit structure?  If so, it doesn't
really represent a handle, AFAIU, it represents a .eln file we
loaded.  Right?

> In addition to the handle being stored in the pseudo-vector, a pointer
> to the pseudo-vector is stored in the data space belonging to the
> handle. I'll refer to that as the "reverse pointer" because I can't
> think of a better term right now.

Why do we need this reverse pointer, and how do we use it?

> When we cleanup the pseudo-vector, we don't reset the reverse pointer
> to NULL, or Qnil.

What do you mean by "cleanup" here?  Under what circumstances does it
happen?

And no, Qnil won't do, because a Lisp_Object can be wider than a
pointer (it is in the 32-bit build --with-wide-int).  NULL is fine.

> That is because we assume that the dlclose() we perform on cleanup
> will unmap the data space belonging to the handle, anyway.

But the call to dlclose doesn't happen immediately, it only happens in
GC.  Right?

(A nit: please don't use "foo()" to refer to a function, because that
looks like a call to 'foo' with no arguments, which is not what you
mean.)

> That assumption is wrong in certain specific circumstances.
> 
> In those circumstances, the reverse pointer is dereferenced after the
> vector has been deallocated. It points to a random different vector
> now.

I need to understand the circumstances under which this could happen.
If the vector has been deallocated, it means it was GC'ed, right?  And
if it was GC'ed, how come the .eln was not unloaded?

> One of the circumstances in which the assumption (that the reverse
> pointer won't be used) becomes invalid is when two pseudo-vectors
> share a handle (and, thus, a reverse pointer).

How can that happen?  Can you describe a series of events that makes
this possible?

> My patch, thus, resets the reverse pointer to Qnil when cleanup is
> performed.

You can't use Qnil, for the reasons described above.

P.S. The stuff described in this sub-thread should eventually find its
way into comments in comp.c.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:27                                                       ` Pip Cet
@ 2021-03-08 18:06                                                         ` Eli Zaretskii
  2021-03-08 18:15                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 18:13                                                         ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 18:06 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 14:27:19 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> I would be interested in the pseudovector type of the variable that is
> supposed to be a comp_unit, but isn't. I think that's all the
> information of value that debuggee still has...

You mean, *saved_cu?  It cannot be anything interesting, because the
pointer is garbled:

  (gdb) p *saved_cu
  $9 = XIL(0x6f04860091b9000)
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $10 = (struct Lisp_Symbol *) 0xaa21360
  Cannot access memory at address 0xaa21368

Since this is a 32-bit build, no Lisp object can have the high 24 bits
non-zero, so 0x6f04860091b9000 cannot be a valid object.

Another factoid that may be of interest is this.  At the beginning of
load_comp_unit we do:

  dynlib_handle_ptr handle = comp_u->handle;

So:

  (gdb) p/x comp_u->handle
  $13 = 0x6a580000

Now, on Windows, the "handle" returned by LoadLibrary is just the
memory address where the library is loaded.  However, "info shared" in
GDB doesn't show _any_ .eln library loaded at that address.  The
closest one is this:

  From        To          Syms Read   Shared Object Library
  0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln

whose address is 4KB higher.  That probably means the CU represented
by comp_u was unloaded, right?

Anything else we could glean from that crashed Emacs?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 14:27                                                       ` Pip Cet
  2021-03-08 18:06                                                         ` Eli Zaretskii
@ 2021-03-08 18:13                                                         ` Eli Zaretskii
  2021-03-08 20:53                                                           ` Pip Cet
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 18:13 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 14:27:19 +0000
> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> I would be interested in the pseudovector type of the variable that is
> supposed to be a comp_unit, but isn't. I think that's all the
> information of value that debuggee still has...

You mean, *saved_cu?  It cannot be anything interesting, because the
pointer is garbled:

  (gdb) p *saved_cu
  $9 = XIL(0x6f04860091b9000)
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $10 = (struct Lisp_Symbol *) 0xaa21360
  Cannot access memory at address 0xaa21368

Since this is a 32-bit build, no Lisp object can have the high 24 bits
non-zero, so 0x6f04860091b9000 cannot be a valid object.

Another factoid that may be of interest is this.  At the beginning of
load_comp_unit we do:

  dynlib_handle_ptr handle = comp_u->handle;

So:

  (gdb) p/x comp_u->handle
  $13 = 0x6a580000

Now, on Windows the "handle" returned by LoadLibrary is just the
memory address where the library is loaded.  However, "info shared" in
GDB doesn't show _any_ .eln library loaded at that address.  The
closest one is this:

  From        To          Syms Read   Shared Object Library
  0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln

whose address is 4KB higher.  That probably means the CU represented
by comp_u was unloaded, right?

But:

  (gdb) p comp_u->file
  $15 = XIL(0x80000000092db190)
  (gdb) xtype
  Lisp_String
  (gdb) xstring
  $16 = (struct Lisp_String *) 0x92db190
  "d:/usr/eli/.emacs.d/eln-cache/28.0.50-19fa14f1/cc-align-bb265728-bd3550a3.eln"

Surprise! it's the same .eln file as is now loaded 4KB higher.

Anything else we could glean from that crashed Emacs session?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:06                                                         ` Eli Zaretskii
@ 2021-03-08 18:15                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 20:37                                                             ` Eli Zaretskii
  2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-08 18:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Pip Cet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Pip Cet <pipcet@gmail.com>
>> Date: Mon, 8 Mar 2021 14:27:19 +0000
>> Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> 
>> I would be interested in the pseudovector type of the variable that is
>> supposed to be a comp_unit, but isn't. I think that's all the
>> information of value that debuggee still has...
>
> You mean, *saved_cu?  It cannot be anything interesting, because the
> pointer is garbled:
>
>   (gdb) p *saved_cu
>   $9 = XIL(0x6f04860091b9000)
>   (gdb) xtype
>   Lisp_Symbol
>   (gdb) xsymbol
>   $10 = (struct Lisp_Symbol *) 0xaa21360
>   Cannot access memory at address 0xaa21368
>
> Since this is a 32-bit build, no Lisp object can have the high 24 bits
> non-zero, so 0x6f04860091b9000 cannot be a valid object.
>
> Another factoid that may be of interest is this.  At the beginning of
> load_comp_unit we do:
>
>   dynlib_handle_ptr handle = comp_u->handle;
>
> So:
>
>   (gdb) p/x comp_u->handle
>   $13 = 0x6a580000
>
> Now, on Windows, the "handle" returned by LoadLibrary is just the
> memory address where the library is loaded.  However, "info shared" in
> GDB doesn't show _any_ .eln library loaded at that address.  The
> closest one is this:
>
>   From        To          Syms Read   Shared Object Library
>   0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln
>
> whose address is 4KB higher.  That probably means the CU represented
> by comp_u was unloaded, right?

I guess this suggests 0x6a580000 was in fact a previously mapped eln
that got unmapped.

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:06                                                         ` Eli Zaretskii
  2021-03-08 18:15                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 18:32                                                             ` Pip Cet
  2021-03-08 20:49                                                             ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-08 18:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Pip Cet

Eli Zaretskii <eliz@gnu.org> writes:

> Anything else we could glean from that crashed Emacs?

Not on my side thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08 18:32                                                             ` Pip Cet
  2021-03-08 20:47                                                               ` Eli Zaretskii
  2021-03-08 20:49                                                             ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-08 18:32 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Mon, Mar 8, 2021 at 6:18 PM Andrea Corallo <akrl@sdf.org> wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Anything else we could glean from that crashed Emacs?
>
> Not on my side thanks

What's saved_cu, actually?

Pip





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:15                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08 20:37                                                             ` Eli Zaretskii
  2021-03-09  7:03                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 20:37 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Pip Cet <pipcet@gmail.com>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Mon, 08 Mar 2021 18:15:21 +0000
> 
> >   From        To          Syms Read   Shared Object Library
> >   0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln
> >
> > whose address is 4KB higher.  That probably means the CU represented
> > by comp_u was unloaded, right?
> 
> I guess this suggests 0x6a580000 was in fact a previously mapped eln
> that got unmapped.

Can you tell why we are loading the same .eln files more than once?
What are the reasons for unloading a .eln file which was once loaded
into a session?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:32                                                             ` Pip Cet
@ 2021-03-08 20:47                                                               ` Eli Zaretskii
  2021-03-08 20:50                                                                 ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 20:47 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 18:32:55 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> What's saved_cu, actually?

Not sure I understand the question.  Is the below what you wanted to
see?

  (gdb) p saved_cu
  $6 = (Lisp_Object *) 0x6a5b2e7c <comp_unit>





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08 18:32                                                             ` Pip Cet
@ 2021-03-08 20:49                                                             ` Eli Zaretskii
  2021-03-09  8:35                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-08 20:49 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Pip Cet <pipcet@gmail.com>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Mon, 08 Mar 2021 18:18:16 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Anything else we could glean from that crashed Emacs?
> 
> Not on my side thanks

Anything you'd like me to try to find out in the next session,
assuming I'll be able to reproduce this assertion violation?





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 20:47                                                               ` Eli Zaretskii
@ 2021-03-08 20:50                                                                 ` Pip Cet
  2021-03-09  8:28                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Pip Cet @ 2021-03-08 20:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 547 bytes --]

On Mon, Mar 8, 2021 at 8:47 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Mon, 8 Mar 2021 18:32:55 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >
> > What's saved_cu, actually?
>
> Not sure I understand the question.  Is the below what you wanted to
> see?
>
>   (gdb) p saved_cu
>   $6 = (Lisp_Object *) 0x6a5b2e7c <comp_unit>

Yes! I believe we've found another bug.

We were allocating comp_unit as a Lisp_Object **, but it's actually a
Lisp_Object.

Pip

[-- Attachment #2: 0001-Try-to-fix-GC-crash-bug-46256.patch --]
[-- Type: text/x-patch, Size: 712 bytes --]

From cc717daba81fb39bf8ad8e85d46de384bb6fe47a Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Mon, 8 Mar 2021 20:49:59 +0000
Subject: [PATCH] Try to fix GC crash (bug#46256)

* src/comp.c (emit_ctxt_code): Allocate comp_unit as a Lisp_Object,
not a pointer to pointer to Lisp_Object.
---
 src/comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/comp.c b/src/comp.c
index c9e068b90aa2c..799cfdc88b55d 100644
--- a/src/comp.c
+++ b/src/comp.c
@@ -2774,7 +2774,7 @@ emit_ctxt_code (void)
 	comp.ctxt,
 	NULL,
 	GCC_JIT_GLOBAL_EXPORTED,
-	gcc_jit_type_get_pointer (comp.lisp_obj_ptr_type),
+	comp.lisp_obj_type,
 	COMP_UNIT_SYM);
 
   declare_imported_data ();
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 18:13                                                         ` Eli Zaretskii
@ 2021-03-08 20:53                                                           ` Pip Cet
  0 siblings, 0 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-08 20:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, Andrea Corallo

On Mon, Mar 8, 2021 at 6:13 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Mon, 8 Mar 2021 14:27:19 +0000
> > Cc: Andrea Corallo <akrl@sdf.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >
> > I would be interested in the pseudovector type of the variable that is
> > supposed to be a comp_unit, but isn't. I think that's all the
> > information of value that debuggee still has...
>
> You mean, *saved_cu?  It cannot be anything interesting, because the
> pointer is garbled:
>
>   (gdb) p *saved_cu
>   $9 = XIL(0x6f04860091b9000)
>   (gdb) xtype
>   Lisp_Symbol
>   (gdb) xsymbol
>   $10 = (struct Lisp_Symbol *) 0xaa21360
>   Cannot access memory at address 0xaa21368
>
> Since this is a 32-bit build, no Lisp object can have the high 24 bits
> non-zero, so 0x6f04860091b9000 cannot be a valid object.

The high 32 bits have indeed been clobbered, because we allocated only
four bytes for this Lisp_Object.

And since you use MSB tags, I'm assuming 0x91b9000 was where the other
native comp unit used to live.

> Another factoid that may be of interest is this.  At the beginning of
> load_comp_unit we do:
>
>   dynlib_handle_ptr handle = comp_u->handle;
>
> So:
>
>   (gdb) p/x comp_u->handle
>   $13 = 0x6a580000
>
> Now, on Windows the "handle" returned by LoadLibrary is just the
> memory address where the library is loaded.  However, "info shared" in
> GDB doesn't show _any_ .eln library loaded at that address.  The
> closest one is this:
>
>   From        To          Syms Read   Shared Object Library
>   0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln
>
> whose address is 4KB higher.  That probably means the CU represented
> by comp_u was unloaded, right?

But keep in mind that the code managed to call dynlib_sym (handle,
COMP_UNIT_SYM) just fine, so I think there might simply be a 4KB
region at the beginning of the library that's not mapped directly from
the shared object.

My search engine skills are weak, but aren't Windows DLL base
addresses aligned to 64 KB? This really looks to me like the "base
address" in Windows isn't what GDB shows in the From column.

Pip
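
The size mismatch can be sketched in isolation.  This is only a toy
model, not code from comp.c: fake_pointer, fake_lisp_object, struct
eln_data and the helper names are all hypothetical, and the constants
are merely chosen to resemble the values in the gdb session above.  It
assumes a build where a pointer is 4 bytes and a Lisp_Object is 8 bytes
(32-bit --with-wide-int).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a 32-bit --with-wide-int build:
   pointers are 4 bytes, a Lisp_Object is 8 bytes.  */
typedef uint32_t fake_pointer;
typedef uint64_t fake_lisp_object;

/* Data space of a toy .eln: only 4 bytes are reserved for comp_unit,
   so whatever follows it ends up sharing storage with the object.
   Two uint32_t members are assumed to be laid out without padding.  */
struct eln_data
{
  fake_pointer comp_unit;
  uint32_t neighbour;
};

/* Store a full 8-byte object where only 4 bytes were reserved.  */
static void
store_comp_unit (struct eln_data *d, fake_lisp_object v)
{
  memcpy (d, &v, sizeof v);
}

/* Read the 8-byte object back, as a later load would.  */
static fake_lisp_object
read_comp_unit (const struct eln_data *d)
{
  fake_lisp_object v;
  memcpy (&v, d, sizeof v);
  return v;
}

/* Return 1 if the object read back differs from the one stored once
   the neighbouring variable has been (legitimately) written.  */
static int
object_gets_garbled (void)
{
  struct eln_data d = { 0, 0 };
  fake_lisp_object stored = UINT64_C (0xA0000000091b9000);

  store_comp_unit (&d, stored);
  d.neighbour = UINT32_C (0x06f04860);	/* unrelated write */
  return read_comp_unit (&d) != stored;
}
```

On a little-endian machine the half that survives is the low one
(0x091b9000 here), while the high half is whatever the neighbouring
variable last held, which is the shape of the garbled value seen in the
debugger.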





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 20:37                                                             ` Eli Zaretskii
@ 2021-03-09  7:03                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 12:55                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09  7:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Pip Cet <pipcet@gmail.com>, 46256@debbugs.gnu.org,
>>         andrewjmoreton@gmail.com
>> Date: Mon, 08 Mar 2021 18:15:21 +0000
>> 
>> >   From        To          Syms Read   Shared Object Library
>> >   0x6a581000  0x6a5bacd8  Yes         d:\usr\eli\.emacs.d\eln-cache\28.0.50-19fa14f1\cc-align-bb265728-bd3550a3.eln
>> >
>> > whose address is 4KB higher.  That probably means the CU represented
>> > by comp_u was unloaded, right?
>> 
>> I guess this suggests 0x6a580000 was in fact a previously mapped eln
>> that got unmapped.
>
> Can you tell why are we loading the same .eln files more than once?

I guess `load' was called two times on the same filename.

> What are the reasons for unloading a .eln file which was once loaded
> into a session?

None of the functions defined in it are reachable anymore (read: every
symbol's function has been made unbound or redefined to some other
function).

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 20:50                                                                 ` Pip Cet
@ 2021-03-09  8:28                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09  8:35                                                                     ` Pip Cet
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09  8:28 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Mon, Mar 8, 2021 at 8:47 PM Eli Zaretskii <eliz@gnu.org> wrote:
>> > From: Pip Cet <pipcet@gmail.com>
>> > Date: Mon, 8 Mar 2021 18:32:55 +0000
>> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> >
>> > What's saved_cu, actually?
>>
>> Not sure I understand the question.  Is the below what you wanted to
>> see?
>>
>>   (gdb) p saved_cu
>>   $6 = (Lisp_Object *) 0x6a5b2e7c <comp_unit>
>
> Yes! I believe we've found another bug.
>
> We were allocating comp_unit as a Lisp_Object **, but it's actually a
> Lisp_Object.

Oops!  Thanks, I've installed this as 380ba045c4.

  Andrea

BTW, this apparently also fixes my 32-bit wide-int GC issue.





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08  6:48                                                     ` Pip Cet
  2021-03-08 10:14                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09  8:32                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 13:05                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09  8:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Mon, Mar 8, 2021 at 5:54 AM Pip Cet <pipcet@gmail.com> wrote:
>> Note that this might not always work because of conservative GC.
>
> If it doesn't work, can you simply retry a few times? Eventually there
> shouldn't be references to the stale native_comp_unit on the stack.
>
> However, I think I've worked out why dynlib_close doesn't do its job:
>
> Fnative_elisp_load creates a comp unit, but, if the shared library has
> already been initialized, it doesn't set that comp unit's comp_unit
> variable to point to the new comp unit; instead, it will continue
> pointing to the first comp unit which still has it open.
>
> Then, the original comp unit is unloaded but not the new one created
> by Fnative_elisp_load. We call dynlib_close() once, but we called it
> twice before, leaving the shared library open and initialized.
>
> Then, we try to load the comp unit again, and follow the stale
> comp_unit variable pointing to the original comp unit.
>
> Fix should be as attached. Note the fix is, at worst, harmless (unless
> I messed up), so we should apply it anyway just because it's good not
> to leave stale pointers lying around even if we hope that the OS
> unmaps them at some point.
>
> Pip

The original code was written on the assumption that dlclose (like
FreeLibrary) can't fail to unload a shared library once the internal
refcount goes to zero.  As this is not the case, I think the suggested
patch is the correct fix.

I've installed it as 93f92cf1ba.

Thanks!

  Andrea
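
The sequence Pip describes can be modelled with a toy simulation.  None
of the names below come from comp.c or dynlib.c; struct shared_lib,
struct comp_unit, unit_load, unit_cleanup and scenario are hypothetical,
and the refcount only imitates what dlopen/dlclose are assumed to keep
internally.  The apply_fix flag stands for the idea of clearing the
reverse pointer on cleanup, not for the installed patch itself.

```c
#include <stddef.h>

struct comp_unit;

/* Toy shared object: a refcount plus the "comp_unit" variable that
   lives in the library's data space (the reverse pointer).  */
struct shared_lib
{
  int refcount;
  struct comp_unit *reverse;
};

/* Toy compilation unit; LIVE drops to 0 once GC has freed it.  */
struct comp_unit
{
  struct shared_lib *handle;
  int live;
};

/* Open LIB for CU.  Only the first opener sets the reverse pointer,
   mirroring the "already initialized" case described above.  */
static void
unit_load (struct comp_unit *cu, struct shared_lib *lib)
{
  lib->refcount++;
  cu->handle = lib;
  cu->live = 1;
  if (lib->reverse == NULL)
    lib->reverse = cu;
}

/* Clean up CU.  With APPLY_FIX, clear the reverse pointer if it still
   refers to CU before closing the handle.  */
static void
unit_cleanup (struct comp_unit *cu, int apply_fix)
{
  struct shared_lib *lib = cu->handle;

  if (apply_fix && lib->reverse == cu)
    lib->reverse = NULL;
  cu->live = 0;
  cu->handle = NULL;
  if (--lib->refcount == 0)
    lib->reverse = NULL;	/* really unmapped: data space is gone */
}

/* Return 1 if the reverse pointer refers to a freed unit.  */
static int
reverse_is_stale (const struct shared_lib *lib)
{
  return lib->reverse != NULL && !lib->reverse->live;
}

/* Load the same library twice, free the first unit, and report
   whether a stale reverse pointer is left behind.  */
static int
scenario (int apply_fix)
{
  struct shared_lib lib = { 0, NULL };
  struct comp_unit first, second;

  unit_load (&first, &lib);	/* refcount 1, reverse -> first */
  unit_load (&second, &lib);	/* refcount 2, reverse unchanged */
  unit_cleanup (&first, apply_fix);	/* refcount 1, still mapped */
  return reverse_is_stale (&lib);
}
```

In this model scenario (0) leaves the reverse pointer dangling because
one close is not enough to unmap the library, while scenario (1) leaves
it NULL so a later load has nothing stale to follow.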





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 20:49                                                             ` Eli Zaretskii
@ 2021-03-09  8:35                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 14:34                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09  8:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Pip Cet <pipcet@gmail.com>, 46256@debbugs.gnu.org,
>>         andrewjmoreton@gmail.com
>> Date: Mon, 08 Mar 2021 18:18:16 +0000
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Anything else we could glean from that crashed Emacs?
>> 
>> Not on my side thanks
>
> Anything you'd like me to try to find out in the next session,
> assuming I'll be able to reproduce this assertion violation?

I think at this point the best option is to recompile using the latest
state of the branch, with Pip's two patches installed, and see if you
still see the issue (probably not).

Thanks

  Andrea





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  8:28                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09  8:35                                                                     ` Pip Cet
  2021-03-09  8:43                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 14:32                                                                       ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Pip Cet @ 2021-03-09  8:35 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: andrewjmoreton, 46256

On Tue, Mar 9, 2021 at 8:28 AM Andrea Corallo <akrl@sdf.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > On Mon, Mar 8, 2021 at 8:47 PM Eli Zaretskii <eliz@gnu.org> wrote:
> >> > From: Pip Cet <pipcet@gmail.com>
> >> > Date: Mon, 8 Mar 2021 18:32:55 +0000
> >> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> >> >
> >> > What's saved_cu, actually?
> >>
> >> Not sure I understand the question.  Is the below what you wanted to
> >> see?
> >>
> >>   (gdb) p saved_cu
> >>   $6 = (Lisp_Object *) 0x6a5b2e7c <comp_unit>
> >
> > Yes! I believe we've found another bug.
> >
> > We were allocating comp_unit as a Lisp_Object **, but it's actually a
> > Lisp_Object.
>
> Oops!  Thanks, I've installed this as 380ba045c4.

Thank you!

> BTW, this apparently also fixes my 32-bit wide-int GC issue.

Excellent, we were in luck there, then :-)

I think the only mystery left here, assuming the bug doesn't happen
again, is why GDB reports a different shared library address from what
LoadLibrary returned. I think it might be because GDB looks at the
actual mmap state, and the DLL header might have been read in rather
than mmapped.

Pip
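
A quick arithmetic check of the two addresses from the session, as a
sketch only.  It assumes a 64 KB allocation granularity for module base
addresses; ALLOC_GRANULARITY, is_aligned and offset_from_base are
hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Assumed granularity for module base addresses (64 KB).  */
#define ALLOC_GRANULARITY UINT32_C (0x10000)

/* Return 1 if ADDR is a multiple of ALIGNMENT.  */
static int
is_aligned (uint32_t addr, uint32_t alignment)
{
  return (addr % alignment) == 0;
}

/* Distance between the handle and the address GDB lists under From.  */
static uint32_t
offset_from_base (uint32_t handle, uint32_t from)
{
  return from - handle;
}
```

With the values above, 0x6a580000 is 64 KB aligned and 0x6a581000 is
not, and the two differ by exactly 0x1000 bytes.  That would be
consistent with GDB listing where the first mapped section starts rather
than the module base, though the thread does not confirm it.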





^ permalink raw reply	[flat|nested] 179+ messages in thread

* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  8:35                                                                     ` Pip Cet
@ 2021-03-09  8:43                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 14:32                                                                       ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09  8:43 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, andrewjmoreton, 46256

Pip Cet <pipcet@gmail.com> writes:

> On Tue, Mar 9, 2021 at 8:28 AM Andrea Corallo <akrl@sdf.org> wrote:
>> Pip Cet <pipcet@gmail.com> writes:
>> > On Mon, Mar 8, 2021 at 8:47 PM Eli Zaretskii <eliz@gnu.org> wrote:
>> >> > From: Pip Cet <pipcet@gmail.com>
>> >> > Date: Mon, 8 Mar 2021 18:32:55 +0000
>> >> > Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> >> >
>> >> > What's saved_cu, actually?
>> >>
>> >> Not sure I understand the question.  Is the below what you wanted to
>> >> see?
>> >>
>> >>   (gdb) p saved_cu
>> >>   $6 = (Lisp_Object *) 0x6a5b2e7c <comp_unit>
>> >
>> > Yes! I believe we've found another bug.
>> >
>> > We were allocating comp_unit as a Lisp_Object **, but it's actually a
>> > Lisp_Object.
>>
>> Uops!  Thanks I've installed this as 380ba045c4.
>
> Thank you!
>
>> BTW this is apparently fixing also my 32bit wide int GC issue.
>
> Excellent, we were in luck there, then :-)

And this also neatly explains why only 32-bit wide-int builds were
affected (and how little they have been tested in the past), nice.

  Andrea
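
A minimal sketch of why a slot declared with a pointer type is too small
for a Lisp_Object on a 32-bit wide-int build.  The sizes are modeled with
fixed-width types (uint32_t standing in for a 32-bit pointer, uint64_t for
a wide-int Lisp_Object); the struct and function names are hypothetical
and do not come from comp.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the static data in an .eln on a 32-bit wide-int build.
   All names here are hypothetical.  */
typedef uint32_t model_ptr;       /* stands in for a 32-bit pointer */
typedef uint64_t model_lisp_obj;  /* stands in for a wide-int Lisp_Object */

struct eln_statics
{
  unsigned char comp_unit[sizeof (model_ptr)];  /* slot sized for a pointer */
  unsigned char neighbor[sizeof (model_ptr)];   /* whatever happens to follow */
};

/* Store a Lisp_Object-sized VALUE at the start of the comp_unit slot,
   the way a write through a Lisp_Object * would.  Return 1 if the
   neighboring bytes changed, 0 otherwise.  */
int
store_clobbers_neighbor (model_lisp_obj value)
{
  struct eln_statics s;
  unsigned char before[sizeof s.neighbor];

  /* The model keeps the 8-byte store inside the struct.  */
  assert (sizeof s == sizeof value);

  memset (&s, 0xAA, sizeof s);
  memcpy (before, s.neighbor, sizeof before);
  /* The store spans both 4-byte members.  */
  memcpy (&s, &value, sizeof value);
  return memcmp (before, s.neighbor, sizeof before) != 0;
}
```

On a build where pointers and Lisp_Object have the same width, the same
store would fit in the slot, which is consistent with only the wide-int
configuration misbehaving.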






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-08 10:45                                                         ` Pip Cet
  2021-03-08 15:02                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 12:36                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 12:36 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 8 Mar 2021 10:45:49 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> > IIUC (and it makes sense to me) the issue is that we are leaving two
> > pointers pointing to the same handle: one is in the CU_2 allocated by
> > 'Fnative_elisp_load' and later discarded by 'load_comp_unit' when
> > reloading the same filename.  The other is in the original CU_1 created
> > the first time this filename was loaded.
> >
> > When CU_2 is GC'ed because it was discarded, we'll get the problem
> > because we'll dlclose the handle.  Is this correct?
> 
> CU_1 is GC'ed first. CU_2, for whatever reason, isn't GC'ed in the same cycle.
> 
> > If so, isn't the attached patch curing the issue as well?
> 
> I don't think so. The problem is that we have an invalid Lisp_Object
> in the shared library, not that we're calling dlclose() too often.
> 
> Again, there's no real cost to fixing this: at best, we avoid a
> catastrophic use-after-free. At worst, we nulled out a word of memory
> only for it to be unmapped a moment later, no harm done.

Once again, you are discussing a scenario whose relation to Real Life
I'm not sure I understand.  When will a cu be GC'ed?  Isn't that when
a .eln file is unloaded?  And isn't it true that it can only be
unloaded if the user or some code calls unload-feature or something
similar?  If the above is true, then the probability of this scenario
to happen is very low, and in my particular case it is strictly zero.

Not that I object to making the code robust in those rare cases, but
we are discussing a particular crash.

> > PS I couldn't reproduce it using the Lisp reproducer on either my 64-bit
> > or my 32-bit system (I left it looping for a while); is that reproducer
> > working for you?
> 
> Have you modified dynlib_open() to leak the shared object? That's what
> I think might be happening for Eli

What shared object is supposed to leak in my case, and why?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  7:03                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 12:55                                                                 ` Eli Zaretskii
  2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 16:30                                                                   ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 12:55 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 07:03:01 +0000
> 
> > Can you tell why are we loading the same .eln files more than once?
> 
> I guess `load' was called two times on the same filename.

Is this likely to happen?  Our code generally uses 'require', which
should avoid that.

> > What are the reasons for unloading a .eln file which was once loaded
> > into a session?
> 
> None of the functions defined in it are reachable anymore (read: all
> symbol functions are makunbound or redefined to some other function).

That means someone did unload-feature, right?  Again, unlikely.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  8:32                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 13:05                                                         ` Eli Zaretskii
  2021-03-09 13:58                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 13:05 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>         andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 08:32:46 +0000
> 
> The original code was written on the assumption that dlclose (like
> FreeLibrary) can't fail to unload a shared library when the internal
> refcount goes to zero.  As this is not the case

I think it _is_ the case, but the problem might be that the refcount
is not zero, and therefore the shared library is not actually unloaded
and unmapped.  (I say "might be" because I still don't see the
scenario where this could happen, and I'm not sure if it does happen
the solution should be as suggested -- it could be that it's better to
not load the .eln the second time, i.e. make 'load' behave like
'require').
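
To illustrate the refcount point, a toy model of how dlopen/LoadLibrary
and dlclose/FreeLibrary treat repeated opens of the same library.  The
names (toy_lib, toy_open, toy_close) are hypothetical; real loaders keep
this state internally.

```c
#include <assert.h>

/* Toy model of one shared library tracked by the dynamic loader.
   Hypothetical names, for illustration only.  */
struct toy_lib
{
  int refcount;  /* number of outstanding opens */
  int mapped;    /* nonzero while the library is in memory */
};

/* Like dlopen/LoadLibrary: opening an already-loaded library only
   bumps its reference count.  */
void
toy_open (struct toy_lib *lib)
{
  lib->refcount++;
  lib->mapped = 1;
}

/* Like dlclose/FreeLibrary: the library is unmapped only when the
   last reference goes away.  */
void
toy_close (struct toy_lib *lib)
{
  assert (lib->refcount > 0);
  if (--lib->refcount == 0)
    lib->mapped = 0;
}
```

In this model, two opens followed by one close leave the library mapped,
which is the situation described above: the close succeeds, but nothing
is unloaded.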






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 13:05                                                         ` Eli Zaretskii
@ 2021-03-09 13:58                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 16:36                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 13:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org,
>>         andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 08:32:46 +0000
>> 
>> The original code was written on the assumption that dlclose (like
>> FreeLibrary) can't fail to unload a shared library when the internal
>> refcount goes to zero.  As this is not the case
>
> I think it _is_ the case, but the problem might be that the refcount
> is not zero, and therefore the shared library is not actually unloaded
> and unmapped.  (I say "might be" because I still don't see the
> scenario where this could happen, and I'm not sure if it does happen
> the solution should be as suggested -- it could be that it's better to
> not load the .eln the second time, i.e. make 'load' behave like
> 'require').

That was my understanding (as I don't see why dlclose should fail) but
reading the man page:

"On success, dlclose() returns 0; on error, it returns a nonzero value."

So my understanding now is that it can fail.  Am I wrong?

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  8:35                                                                     ` Pip Cet
  2021-03-09  8:43                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 14:32                                                                       ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 14:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: 46256, andrewjmoreton, akrl

> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 9 Mar 2021 08:35:18 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> I think the only mystery left here, assuming the bug doesn't happen
> again, is why GDB reports a different shared library address from what
> LoadLibrary returned. I think it might be because GDB looks at the
> actual mmap state, and the DLL header might have been read in rather
> than mmapped.

Yes, empirically I see this in every DLL that Emacs loads.  So this is
a non-issue.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09  8:35                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 14:34                                                                 ` Eli Zaretskii
  2021-03-09 15:38                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 14:34 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 08:35:03 +0000
> 
> > Anything you'd like me to try to find out in the next session,
> > assuming I'll be able to reproduce this assertion violation?
> 
> I think at this point the best thing is to recompile using the latest
> state of the branch, with the two patches by Pip installed, and see if
> you still see the issue (probably not).

I've now built the latest branch.  It still crashes, in the same
place, although with different Lisp files.  I'm looking into this,
will post the details.

Thanks.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 12:55                                                                 ` Eli Zaretskii
@ 2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 16:42                                                                     ` Eli Zaretskii
  2021-03-09 18:31                                                                     ` Eli Zaretskii
  2021-03-09 16:30                                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 14:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 07:03:01 +0000
>>
>> > Can you tell why are we loading the same .eln files more than once?
>>
>> I guess `load' was called two times on the same filename.
>
> Is this likely to happen?  Our code generally uses 'require', which
> should avoid that.

cc-*.el files, for instance, have more than one direct call to load.  IIRC
one of the analyzed cases was cc-mode related (certainly one I observed).

>> > What are the reasons for unloading a .eln file which was once loaded
>> > into a session?
>>
>> None of the functions defined in it are reachable anymore (read: all
>> symbol functions are makunbound or redefined to some other function).
>
> That means someone did unload-feature, right?

Also, loading a freshly compiled new version of the same file, for
example, should present the same scenario.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 14:34                                                                 ` Eli Zaretskii
@ 2021-03-09 15:38                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 16:51                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 15:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 08:35:03 +0000
>> 
>> > Anything you'd like me to try to find out in the next session,
>> > assuming I'll be able to reproduce this assertion violation?
>> 
>> I think at this point the best thing is to recompile using the latest
>> state of the branch, with the two patches by Pip installed, and see if
>> you still see the issue (probably not).
>
> I've now built the latest branch.  It still crashes, in the same
> place, although with different Lisp files.  I'm looking into this,
> will post the details.

Thinking about it, you might have stale eln files reachable in
`comp-eln-load-path', generated with the bug that 380ba045c4 fixed.

We should probably have bumped `comp-abi-hash' together with the fix.
As I'm not sure whether the merge bumped it, I have now done so
manually with 79c83f79c5 to be on the safe side.

Thanks

  Andrea
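
A rough sketch of why bumping the hash sidelines stale files, under the
assumption that the hash is part of the directory name under which .eln
files are looked up.  The path layout, the helper names (eln_path,
eln_still_found) and the hash strings are all illustrative, not the real
implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<cache_root>/<abi_hash>/<name>.eln" into BUF.
   Return 0 on success, -1 if BUF is too small.
   The layout is an illustrative assumption.  */
int
eln_path (char *buf, size_t size, const char *cache_root,
          const char *abi_hash, const char *name)
{
  assert (buf != NULL && size > 0);
  int n = snprintf (buf, size, "%s/%s/%s.eln", cache_root, abi_hash, name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Return nonzero if a file written under OLD_HASH is still found when
   the running binary looks under NEW_HASH.  */
int
eln_still_found (const char *cache_root, const char *old_hash,
                 const char *new_hash, const char *name)
{
  char written[256], looked_up[256];
  if (eln_path (written, sizeof written, cache_root, old_hash, name) != 0
      || eln_path (looked_up, sizeof looked_up, cache_root, new_hash,
                   name) != 0)
    return 0;
  return strcmp (written, looked_up) == 0;
}
```

With an unchanged hash the old file is found again; with a bumped hash
the lookup lands in a different directory and the file has to be
recompiled.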






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 12:55                                                                 ` Eli Zaretskii
  2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 16:30                                                                   ` Eli Zaretskii
  2021-03-10 13:14                                                                     ` Alan Mackenzie
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 16:30 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 46256, andrewjmoreton, pipcet, akrl

> Date: Tue, 09 Mar 2021 14:55:57 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com, pipcet@gmail.com
> 
> > From: Andrea Corallo <akrl@sdf.org>
> > Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> > Date: Tue, 09 Mar 2021 07:03:01 +0000
> > 
> > > Can you tell why are we loading the same .eln files more than once?
> > 
> > I guess `load' was called two times on the same filename.
> 
> Is this likely to happen?  Our code generally uses 'require', which
> should avoid that.

Answering my own question here: it can easily happen due to use of
cc-require in cc-*.el files.  Alan, why does CC mode use this
technique?  What is the purpose of always loading a Lisp file even if
it was already loaded?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 13:58                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 16:36                                                             ` Eli Zaretskii
  2021-03-09 17:10                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 16:36 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 13:58:31 +0000
> 
> > I think it _is_ the case, but the problem might be that the refcount
> > is not zero, and therefore the shared library is not actually unloaded
> > and unmapped.  (I say "might be" because I still don't see the
> > scenario where this could happen, and I'm not sure if it does happen
> > the solution should be as suggested -- it could be that it's better to
> > not load the .eln the second time, i.e. make 'load' behave like
> > 'require').
> 
> That was my understanding (as I don't see why dlclose should fail) but
> reading the man page:
> 
> "On success, dlclose() returns 0; on error, it returns a nonzero value."
> 
> So my understanding now is that it can fail.  Am I wrong?

I don't know.  Posix says no errors are defined for dlclose, so maybe
look at the glibc sources to see what happens on GNU/Linux?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 16:42                                                                     ` Eli Zaretskii
  2021-03-09 18:31                                                                     ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 16:42 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 14:55:40 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> I guess `load' was called two times on the same filename.
> >
> > Is this likely to happen?  Our code generally uses 'require', which
> > should avoid that.
> 
> cc-*.el files, for instance, have more than one direct call to load.  IIRC
> one of the analyzed cases was cc-mode related (certainly one I observed).

Yes, that's the cc-require macro, as I have discovered meanwhile.  I'm
not yet sure I understand why CC mode does that.

> >> > What are the reasons for unloading a .eln file which was once loaded
> >> > into a session?
> >>
> >> None of the functions defined in it are reachable anymore (read: all
> >> symbol functions are makunbound or redefined to some other function).
> >
> > That means someone did unload-feature, right?
> 
> Also loading for example a new version freshly compiled of the same file
> should present the same scenario.

Yes, that, too.  There's actually a problem with what we do in that
case, see my other (as yet unwritten) message.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 15:38                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 16:51                                                                     ` Eli Zaretskii
  2021-03-09 17:04                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 16:51 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 15:38:54 +0000
> 
> > I've now built the latest branch.  It still crashes, in the same
> > place, although with different Lisp files.  I'm looking into this,
> > will post the details.
> 
> Thinking about it, you might have stale eln files reachable in
> `comp-eln-load-path', generated with the bug that 380ba045c4 fixed.

Maybe.  What I see is that we load a CU in Fnative_elisp_load followed
by load_comp_unit, for the first time, and create a Lisp CU object for
it:

  Lisp_Object comp_u_lisp_obj;
  XSETNATIVE_COMP_UNIT (comp_u_lisp_obj, comp_u);

Then we store it in the shared library:

  if (comp_u->loaded_once)
    ...
  else
    *saved_cu = comp_u_lisp_obj;

But then we clobber the value of comp_u_lisp_obj here:

	  data_ephemeral_vec =
	    load_static_obj (comp_u, TEXT_DATA_RELOC_EPHEMERAL_SYM);

	  EMACS_INT d_vec_len = XFIXNUM (Flength (data_ephemeral_vec));
	  for (EMACS_INT i = 0; i < d_vec_len; i++)
	    data_eph_relocs[i] = AREF (data_ephemeral_vec, i);  <<<<<<<<<<<

Is this likely to be due to that problem?







* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 16:51                                                                     ` Eli Zaretskii
@ 2021-03-09 17:04                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 18:20                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 17:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 15:38:54 +0000
>> 
>> > I've now built the latest branch.  It still crashes, in the same
>> > place, although with different Lisp files.  I'm looking into this,
>> > will post the details.
>> 
>> Thinking about it, you might have stale eln files reachable in
>> `comp-eln-load-path', generated with the bug that 380ba045c4 fixed.
>
> Maybe.  What I see is that we load a CU in Fnative_elisp_load followed
> by load_comp_unit, for the first time, and create a Lisp CU object for
> it:
>
>   Lisp_Object comp_u_lisp_obj;
>   XSETNATIVE_COMP_UNIT (comp_u_lisp_obj, comp_u);
>
> Then we store it in the shared library:
>
>   if (comp_u->loaded_once)
>     ...
>   else
>     *saved_cu = comp_u_lisp_obj;
>
> But then we clobber the value of comp_u_lisp_obj here:
>
> 	  data_ephemeral_vec =
> 	    load_static_obj (comp_u, TEXT_DATA_RELOC_EPHEMERAL_SYM);
>
> 	  EMACS_INT d_vec_len = XFIXNUM (Flength (data_ephemeral_vec));
> 	  for (EMACS_INT i = 0; i < d_vec_len; i++)
> 	    data_eph_relocs[i] = AREF (data_ephemeral_vec, i);  <<<<<<<<<<<
>
> Is this likely to be due to that problem?

Interesting, how can we clobber the value of 'comp_u_lisp_obj', which is
stack allocated, while writing into 'data_eph_relocs[i]', which is
statically allocated in an eln?

If we clobber 'comp_u_lisp_obj' this is certainly a problem, as we have to
pass it to 'top_level_run' later on.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 16:36                                                             ` Eli Zaretskii
@ 2021-03-09 17:10                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 17:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 13:58:31 +0000
>> 
>> > I think it _is_ the case, but the problem might be that the refcount
>> > is not zero, and therefore the shared library is not actually unloaded
>> > and unmapped.  (I say "might be" because I still don't see the
>> > scenario where this could happen, and I'm not sure if it does happen
>> > the solution should be as suggested -- it could be that it's better to
>> > not load the .eln the second time, i.e. make 'load' behave like
>> > 'require').
>> 
>> That was my understanding (as I don't see why dlclose should fail) but
>> reading the man page:
>> 
>> "On success, dlclose() returns 0; on error, it returns a nonzero value."
>> 
>> So my understanding now is that it can fail.  Am I wrong?
>
> I don't know.  Posix says no errors are defined for dlclose, so maybe
> look at the glibc sources to see what happens on GNU/Linux?

From a quick look into glibc, AFAIU dlclose can return nonzero values.

Also looking at [1] it says:

"If handle does not refer to an open symbol table handle or if the
symbol table handle could not be closed, dlclose() shall return a
non-zero value."

So yeah, I don't know if this is a real case (I've never seen it in
practice), but I think it is good to be robust against it in principle.

Thanks

  Andrea

[1] <https://pubs.opengroup.org/onlinepubs/9699919799/>
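
One way to be robust against a failing close, as a sketch only: forget
the handle solely when the close reports success.  The struct and
function names are hypothetical; the closer is passed as a function
pointer following the dlclose convention (0 on success, nonzero on
error), so dlclose itself could be passed in, and the two fake closers
exist only for experimenting without a real shared library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compilation unit holding a library handle.  */
struct unit
{
  void *handle;
};

/* Close U's handle with CLOSER, which follows the dlclose convention
   (0 on success, nonzero on error).  The handle is forgotten only on
   success, so a failed close is not mistaken for an unloaded library.
   Return CLOSER's result, or 0 if there was nothing to close.  */
int
unit_close (struct unit *u, int (*closer) (void *))
{
  assert (u != NULL && closer != NULL);
  if (u->handle == NULL)
    return 0;
  int rc = closer (u->handle);
  if (rc == 0)
    u->handle = NULL;
  return rc;
}

/* Fake closers, for illustration only.  */
int
fake_close_ok (void *handle)
{
  (void) handle;
  return 0;
}

int
fake_close_fail (void *handle)
{
  (void) handle;
  return 1;
}
```

After a failed close the unit still records the handle, so later code
can tell that the library may still be mapped.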






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 17:04                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-09 18:20                                                                         ` Eli Zaretskii
  2021-03-09 19:23                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 18:20 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 17:04:58 +0000
> 
> >   else
> >     *saved_cu = comp_u_lisp_obj;
> >
> > But then we clobber the value of comp_u_lisp_obj here:
> >
> > 	  data_ephemeral_vec =
> > 	    load_static_obj (comp_u, TEXT_DATA_RELOC_EPHEMERAL_SYM);
> >
> > 	  EMACS_INT d_vec_len = XFIXNUM (Flength (data_ephemeral_vec));
> > 	  for (EMACS_INT i = 0; i < d_vec_len; i++)
> > 	    data_eph_relocs[i] = AREF (data_ephemeral_vec, i);  <<<<<<<<<<<
> >
> > Is this likely to be due to that problem?
> 
> Interesting, how can we clobber the value of 'comp_u_lisp_obj' that is
> stack allocated while writing into 'data_eph_relocs[i]' that is static
> allocated in an eln?

I don't know, but the problem disappeared after I rebuilt with the
latest branch, so I guess it was related to the bug fixed in
380ba045c4 after all.






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-09 16:42                                                                     ` Eli Zaretskii
@ 2021-03-09 18:31                                                                     ` Eli Zaretskii
  2021-03-09 19:38                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-09 18:31 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 46256, andrewjmoreton, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> Date: Tue, 09 Mar 2021 14:55:40 +0000
> 
> cc-*.el files, for instance, have more than one direct call to load.  IIRC
> one of the analyzed cases was cc-mode related (certainly one I observed).
> 
> >> > What are the reasons for unloading a .eln file which was once loaded
> >> > into a session?
> >>
> >> None of the functions defined in it are reachable anymore (read: all
> >> symbol functions are makunbound or redefined to some other function).
> >
> > That means someone did unload-feature, right?
> 
> Also loading for example a new version freshly compiled of the same file
> should present the same scenario.

I see a problem in this area.  Consider this code in
native-elisp-load:

  if (!NILP (Fgethash (filename, all_loaded_comp_units_h, Qnil))
      && !file_in_eln_sys_dir (filename)
      && !NILP (Ffile_writable_p (filename)))
    {
      /* If in this session there was ever a file loaded with this
	 name, rename it before loading, to make sure we always get a
	 new handle!  */
      Lisp_Object tmp_filename =
	Fmake_temp_file_internal (filename, Qnil, build_string (".eln.tmp"),
				  Qnil);
      if (NILP (Ffile_writable_p (tmp_filename)))
	comp_u->handle = dynlib_open (SSDATA (encoded_filename));
      else
	{
	  Frename_file (filename, tmp_filename, Qt);
	  comp_u->handle = dynlib_open (SSDATA (ENCODE_FILE (tmp_filename)));
	  Frename_file (tmp_filename, filename, Qnil);
	}

The last 'else' branch momentarily renames the .eln file, then loads
it under the modified name, with the assumption that this would force
dynlib_open to produce a new handle.  But in the case that the same
.eln file is loaded more than once, i.e. the .eln file was not
modified since the previous load, dynlib_open returns the same handle
regardless of the file name, at least on MS-Windows.  (Does this work
as intended on GNU/Linux?)

The problem with returning the same handle is that the refcount of the
handle is incremented, which means unload-feature will be unable to
unload the library.

Is this renaming dance for the case that the .eln file was updated
since the last load, or do we need it even if it wasn't updated?  If
the former, then I guess we could dynlib_close the handle immediately
if we discover that it's identical to the one we had from the previous
load.
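
The "close immediately if the handle is identical" idea, sketched against
a toy loader rather than the real dynlib_open/dynlib_close.  All names
below (image, image_open, image_close, reload_image) are hypothetical,
and the model assumes the loader returns the same handle for the same
image whatever file name is used, as observed above on MS-Windows.

```c
#include <assert.h>
#include <stddef.h>

/* Toy loader state for one library image; hypothetical names.  */
struct image
{
  int refcount;
};

/* Opening returns the same handle for the same image, regardless of
   the file name used, and bumps the reference count.  */
struct image *
image_open (struct image *img)
{
  img->refcount++;
  return img;
}

/* Drop one reference to HANDLE.  */
void
image_close (struct image *handle)
{
  assert (handle->refcount > 0);
  handle->refcount--;
}

/* Reload IMG when PREVIOUS is the handle from an earlier load (or
   NULL).  If the loader hands back the same handle, drop the extra
   reference right away so that a later unload can reach zero.  */
struct image *
reload_image (struct image *img, struct image *previous)
{
  struct image *h = image_open (img);
  if (previous != NULL && h == previous)
    image_close (h);
  return h;
}
```

In this model a second load of an unchanged file leaves the refcount at
one, so a single close is enough to release the library.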






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 18:20                                                                         ` Eli Zaretskii
@ 2021-03-09 19:23                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 19:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 17:04:58 +0000
>> 
>> >   else
>> >     *saved_cu = comp_u_lisp_obj;
>> >
>> > But then we clobber the value of comp_u_lisp_obj here:
>> >
>> > 	  data_ephemeral_vec =
>> > 	    load_static_obj (comp_u, TEXT_DATA_RELOC_EPHEMERAL_SYM);
>> >
>> > 	  EMACS_INT d_vec_len = XFIXNUM (Flength (data_ephemeral_vec));
>> > 	  for (EMACS_INT i = 0; i < d_vec_len; i++)
>> > 	    data_eph_relocs[i] = AREF (data_ephemeral_vec, i);  <<<<<<<<<<<
>> >
>> > Is this likely to be due to that problem?
>> 
>> Interesting, how can we clobber the value of 'comp_u_lisp_obj', which
>> is stack-allocated, while writing into 'data_eph_relocs[i]', which is
>> statically allocated in an eln?
>
> I don't know, but the problem disappeared after I rebuilt with the
> latest branch, so I guess it was related to the bug fixed in
> 380ba045c4 after all.

Super

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 18:31                                                                     ` Eli Zaretskii
@ 2021-03-09 19:38                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-09 19:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
>> Date: Tue, 09 Mar 2021 14:55:40 +0000
>> 
>> cc-*.el files for instance have more than one direct call to load.  IIRC
>> one of the analyzed cases was cc-mode related (certainly one I observed).
>> 
>> >> > What are the reasons for unloading a .eln file which was once loaded
>> >> > into a session?
>> >>
>> >> All the functions defined in it are no longer reachable (read: all
>> >> symbol functions are makunbound or redefined to some other function).
>> >
>> > That means someone did unload-feature, right?
>> 
>> Also, loading for example a freshly compiled new version of the same
>> file should present the same scenario.
>
> I see a problem in this area.  Consider this code in
> native-elisp-load:
>
>   if (!NILP (Fgethash (filename, all_loaded_comp_units_h, Qnil))
>       && !file_in_eln_sys_dir (filename)
>       && !NILP (Ffile_writable_p (filename)))
>     {
>       /* If in this session there was ever a file loaded with this
> 	 name, rename it before loading, to make sure we always get a
> 	 new handle!  */
>       Lisp_Object tmp_filename =
> 	Fmake_temp_file_internal (filename, Qnil, build_string (".eln.tmp"),
> 				  Qnil);
>       if (NILP (Ffile_writable_p (tmp_filename)))
> 	comp_u->handle = dynlib_open (SSDATA (encoded_filename));
>       else
> 	{
> 	  Frename_file (filename, tmp_filename, Qt);
> 	  comp_u->handle = dynlib_open (SSDATA (ENCODE_FILE (tmp_filename)));
> 	  Frename_file (tmp_filename, filename, Qnil);
> 	}
>
> The last 'else' branch momentarily renames the .eln file, then loads
> it under the modified name, with the assumption that this would force
> dynlib_open to produce a new handle.  But in the case that the same
> .eln file is loaded more than once, i.e. the .eln file was not
> modified since the previous load, dynlib_open returns the same handle
> regardless of the file name, at least on MS-Windows.  (Does this work
> as intended on GNU/Linux?)
>
> The problem with returning the same handle is that the refcount of the
> handle is incremented, which means unload-feature will be unable to
> unload the library.

It works because the handle is stored in the new CU object and passed
to 'load_comp_unit'.  'load_comp_unit' will recognize the "re-load"
condition and discard the freshly allocated CU in favor of the original
one (which is stored in the .eln).  As a consequence the discarded CU
will be GC'd, and so the refcount will be decremented.
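
The sequence just described can be sketched with a toy model in C.  This
is not the code in comp.c: all names (model_lib, model_cu, model_load,
model_finalize_cu) are made up for illustration, and the model assumes
that every load takes one reference on the library and that collecting a
CU releases the reference it holds.  Where Emacs would leave the
discarded CU for the garbage collector, the model finalizes it at once.

```c
#include <assert.h>
#include <stdlib.h>

struct model_cu;

/* Toy model only: none of these names exist in Emacs.  */
struct model_lib
{
  int refcount;               /* outstanding opens of the shared library */
  struct model_cu *owner_cu;  /* CU recorded in the library on first load */
};

struct model_cu
{
  struct model_lib *handle;   /* the reference this CU holds on the library */
};

/* Allocate a fresh CU; opening the library takes one more reference.  */
static struct model_cu *
model_new_cu (struct model_lib *lib)
{
  struct model_cu *cu = malloc (sizeof *cu);
  if (cu == NULL)
    abort ();
  lib->refcount++;
  cu->handle = lib;
  return cu;
}

/* Stand-in for what happens when an unreachable CU is collected:
   the reference it held on the library is released.  */
static void
model_finalize_cu (struct model_cu *cu)
{
  assert (cu->handle->refcount > 0);
  cu->handle->refcount--;
  free (cu);
}

/* Load LIB.  On the first load the fresh CU is recorded in the library
   and returned.  On a re-load the original CU is returned and the
   fresh one is discarded, which gives its reference back.  */
static struct model_cu *
model_load (struct model_lib *lib)
{
  struct model_cu *fresh = model_new_cu (lib);
  if (lib->owner_cu == NULL)
    {
      lib->owner_cu = fresh;
      return fresh;
    }
  /* Re-load: in Emacs the fresh CU would simply become garbage and be
     collected later; the model collects it immediately.  */
  model_finalize_cu (fresh);
  return lib->owner_cu;
}
```

Under these assumptions, loading the same library twice ends with a
reference count of 1 once the discarded CU has been collected, so
releasing the original CU later brings it to 0.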

> Is this renaming dance for the case that the .eln file was updated
> since the last load, or do we need it even if it wasn't updated?

The renaming dance is for cases such as when one changes `comp-speed',
recompiles, and loads the file again.

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-09 16:30                                                                   ` Eli Zaretskii
@ 2021-03-10 13:14                                                                     ` Alan Mackenzie
  2021-03-10 13:20                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-10 14:07                                                                       ` Eli Zaretskii
  0 siblings, 2 replies; 179+ messages in thread
From: Alan Mackenzie @ 2021-03-10 13:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet, akrl

Hello, Eli.

On Tue, Mar 09, 2021 at 18:30:57 +0200, Eli Zaretskii wrote:
> > Date: Tue, 09 Mar 2021 14:55:57 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 46256@debbugs.gnu.org, andrewjmoreton@gmail.com, pipcet@gmail.com

> > > From: Andrea Corallo <akrl@sdf.org>
> > > Cc: pipcet@gmail.com, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com
> > > Date: Tue, 09 Mar 2021 07:03:01 +0000

> > > > Can you tell why are we loading the same .eln files more than once?

> > > I guess `load' was called two times on the same filename.

> > Is this likely to happen?  Our code generally uses 'require', which
> > should avoid that.

> Answering my own question here: it can easily happen due to use of
> cc-require in cc-*.el files.  Alan, why does CC mode use this
> technique? what is the purpose of always loading a Lisp file even if
> it was already loaded?

Are you sure?  cc-require is intended just to compile a `require' form
(OK, it compiles (progn nil (require 'cc-vars)), but the byte compiler
will optimise the progn away).

When loading uncompiled cc-*.el, cc-require does fancy things to make
sure the cc-*.el is in the "correct" directory, but it shouldn't compile
any of this into the *.elc.  Maybe there's a bug, somewhere.

The code in this area was written by Martin Stjernholm (my predecessor),
who was evidently having trouble with "wrong" versions of the *.el files
getting loaded.

I've had a bit of a look at the thread for bug #46256, but I can't really
follow it, at least not without a lot of effort.  Might it be that the
.eln compiler is doing things on the .el file?  I'm not at all familiar
with how the native compilation works, I'm afraid.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-10 13:14                                                                     ` Alan Mackenzie
@ 2021-03-10 13:20                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-10 14:07                                                                       ` Eli Zaretskii
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-10 13:20 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, andrewjmoreton, 46256, pipcet

Alan Mackenzie <acm@muc.de> writes:

[...]

> I've had a bit of a look at the thread for bug #46256, but I can't really
> follow it, at least not without a lot of effort.  Might it be that the
> .eln compiler is doing things on the .el file?  I'm not at all familiar
> with how the native compilation works, I'm afraid.

Hi Alan,

I believe this should be independent of native compilation; I'd expect
the same load to be performed on a vanilla Emacs.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-10 13:14                                                                     ` Alan Mackenzie
  2021-03-10 13:20                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-10 14:07                                                                       ` Eli Zaretskii
  2021-03-10 15:24                                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-10 16:56                                                                         ` Alan Mackenzie
  1 sibling, 2 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-10 14:07 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 46256, andrewjmoreton, pipcet, akrl

> Date: Wed, 10 Mar 2021 13:14:16 +0000
> Cc: akrl@sdf.org, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com,
>   pipcet@gmail.com
> From: Alan Mackenzie <acm@muc.de>
> 
> > Answering my own question here: it can easily happen due to use of
> > cc-require in cc-*.el files.  Alan, why does CC mode use this
> > technique? what is the purpose of always loading a Lisp file even if
> > it was already loaded?
> 
> Are you sure?  cc-require is intended just to compile a `require' form
> (OK, it compiles (progn nil (require 'cc-vars)), but the byte compiler
> will optimise the progn away).

We are not talking about compilation, we are talking about loading
cc-* files.  When we process cc-require, we end up loading the
required CC mode file, even though it is already loaded.

> When loading uncompiled cc-*.el, cc-require does fancy things to make
> sure the cc-*.el is in the "correct" directory, but it shouldn't compile
> any of this into the *.elc.  Maybe there's a bug, somewhere.
> 
> The code in this area was written by Martin Stjernholm (my predecessor),
> who was evidently having trouble with "wrong" versions of the *.el files
> getting loaded.
> 
> I've had a bit of a look at the thread for bug #46256, but I can't really
> follow it, at least not without a lot of effort.  Might it be that the
> .eln compiler is doing things on the .el file?  I'm not at all familiar
> with how the native compilation works, I'm afraid.

Maybe.  Andrea, could you take a look at what happens with cc-require
in the native-comp branch?






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-10 14:07                                                                       ` Eli Zaretskii
@ 2021-03-10 15:24                                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-10 16:56                                                                         ` Alan Mackenzie
  1 sibling, 0 replies; 179+ messages in thread
From: Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-10 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, 46256, andrewjmoreton, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 10 Mar 2021 13:14:16 +0000
>> Cc: akrl@sdf.org, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com,
>>   pipcet@gmail.com
>> From: Alan Mackenzie <acm@muc.de>
>> 
>> > Answering my own question here: it can easily happen due to use of
>> > cc-require in cc-*.el files.  Alan, why does CC mode use this
>> > technique? what is the purpose of always loading a Lisp file even if
>> > it was already loaded?
>> 
>> Are you sure?  cc-require is intended just to compile a `require' form
>> (OK, it compiles (progn nil (require 'cc-vars)), but the byte compiler
>> will optimise the progn away).
>
> We are not talking about compilation, we are talking about loading
> cc-* files.  When we process cc-require, we end up loading the
> required CC mode file, even though it is already loaded.
>
>> When loading uncompiled cc-*.el, cc-require does fancy things to make
>> sure the cc-*.el is in the "correct" directory, but it shouldn't compile
>> any of this into the *.elc.  Maybe there's a bug, somewhere.
>> 
>> The code in this area was written by Martin Stjernholm (my predecessor),
>> who was evidently having trouble with "wrong" versions of the *.el files
>> getting loaded.
>> 
>> I've had a bit of a look at the thread for bug #46256, but I can't really
>> follow it, at least not without a lot of effort.  Might it be that the
>> .eln compiler is doing things on the .el file?  I'm not at all familiar
>> with how the native compilation works, I'm afraid.
>
> Maybe.  Andrea, could you take a look at what happens with cc-require
> in the native-comp branch?

Yes, today or tomorrow evening I'll try to have a look.

Thanks

  Andrea






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-10 14:07                                                                       ` Eli Zaretskii
  2021-03-10 15:24                                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-10 16:56                                                                         ` Alan Mackenzie
  2021-03-10 17:43                                                                           ` Eli Zaretskii
  1 sibling, 1 reply; 179+ messages in thread
From: Alan Mackenzie @ 2021-03-10 16:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46256, andrewjmoreton, pipcet, akrl

Hello, Eli.

On Wed, Mar 10, 2021 at 16:07:55 +0200, Eli Zaretskii wrote:
> > Date: Wed, 10 Mar 2021 13:14:16 +0000
> > Cc: akrl@sdf.org, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com,
> >   pipcet@gmail.com
> > From: Alan Mackenzie <acm@muc.de>

> > > Answering my own question here: it can easily happen due to use of
> > > cc-require in cc-*.el files.  Alan, why does CC mode use this
> > > technique? what is the purpose of always loading a Lisp file even if
> > > it was already loaded?

> > Are you sure?  cc-require is intended just to compile a `require' form
> > (OK, it compiles (progn nil (require 'cc-vars)), but the byte compiler
> > will optimise the progn away).

> We are not talking about compilation, we are talking about loading
> cc-* files.

The cc-*.el files?  The cc-*.elc files simply have compiled `require's in
them and shouldn't be reloading already loaded .elc files.  I don't know
anything about the cc-*.eln files.

> When we process cc-require, we end up loading the required CC mode
> file, even though it is already loaded.

Yes, if it is processed while loading the source .el file.  This is a
facility designed for CC Mode hackers, in particular Martin S., whose
working style apparently led to him switching source directories
frequently.  

> > When loading uncompiled cc-*.el, cc-require does fancy things to make
> > sure the cc-*.el is in the "correct" directory, but it shouldn't compile
> > any of this into the *.elc.  Maybe there's a bug, somewhere.

> > The code in this area was written by Martin Stjernholm (my predecessor),
> > who was evidently having trouble with "wrong" versions of the *.el files
> > getting loaded.

> > I've had a bit of a look at the thread for bug #46256, but I can't really
> > follow it, at least not without a lot of effort.  Might it be that the
> > .eln compiler is doing things on the .el file?  I'm not at all familiar
> > with how the native compilation works, I'm afraid.

> Maybe.  Andrea, could you take a look at what happens with cc-require
> in the native-comp branch?

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree
  2021-03-10 16:56                                                                         ` Alan Mackenzie
@ 2021-03-10 17:43                                                                           ` Eli Zaretskii
  0 siblings, 0 replies; 179+ messages in thread
From: Eli Zaretskii @ 2021-03-10 17:43 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 46256, andrewjmoreton, pipcet, akrl

> Date: Wed, 10 Mar 2021 16:56:14 +0000
> Cc: akrl@sdf.org, 46256@debbugs.gnu.org, andrewjmoreton@gmail.com,
>   pipcet@gmail.com
> From: Alan Mackenzie <acm@muc.de>
> 
> > We are not talking about compilation, we are talking about loading
> > cc-* files.
> 
> The cc-*.el files?  The cc-*.elc files simply have compiled `require's in
> them and shouldn't be reloading already loaded .elc files.  I don't know
> anything about the cc-*.eln files.

It's not related to *.eln files, AFAIU.  Emacs was byte-compiling
cc-*.el files.  As part of byte-compiling, we load these files, right?
And when we load them, cc-require causes some CC mode files to be
loaded more than once in the same session.

Perhaps there was some trick there not to do that when we load the
*.elc files instead, and perhaps the compiled code in the
corresponding *.eln files misses that trick.

> > When we process cc-require, we end up loading the required CC mode
> > file, even though it is already loaded.
> 
> Yes, if it is processed while loading the source .el file.  This is a
> facility designed for CC Mode hackers, in particular Martin S., whose
> working style apparently led to him switching source directories
> frequently.  

If this is the intended behavior, fine.  I just was surprised by those
multiple loads and didn't expect them.






end of thread, other threads:[~2021-03-10 17:43 UTC | newest]

Thread overview: 179+ messages
2021-02-02 11:11 bug#46256: [feature/native-comp] AOT eln files ignored if run from build tree Andy Moreton
2021-02-03 20:51 ` akrl--- via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-04  0:03   ` Andy Moreton
2021-02-04  1:40     ` Andy Moreton
2021-02-05 14:42       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-05 20:59         ` Andy Moreton
2021-02-05 23:55       ` Andy Moreton
2021-02-17 22:39         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-18 20:48           ` Andy Moreton
2021-02-18 21:00             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-19  8:02               ` Eli Zaretskii
2021-02-19 14:49                 ` Andy Moreton
2021-02-19 15:28                   ` Eli Zaretskii
2021-02-19 16:01                   ` Andrea Corallo
2021-02-26 20:34                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-26 20:45                     ` Eli Zaretskii
2021-02-26 20:48                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-26 20:52                         ` Eli Zaretskii
2021-02-27  6:58                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-27  7:55                             ` Eli Zaretskii
2021-02-27 12:08                     ` Andy Moreton
2021-02-27 19:14                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-27 19:20                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-27 19:46                         ` Andy Moreton
2021-02-27 21:58                           ` Andy Moreton
2021-02-28 17:35                             ` Eli Zaretskii
2021-02-28 21:15                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-01  5:36                                 ` Eli Zaretskii
2021-03-01  6:34                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-01  9:48                               ` Andy Moreton
2021-03-03 18:27                                 ` Eli Zaretskii
2021-03-03 18:43                                   ` Eli Zaretskii
2021-03-03 19:46                                     ` Eli Zaretskii
2021-03-03 20:04                                       ` Eli Zaretskii
2021-03-03 20:21                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04  8:30                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 11:54                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 14:13                                               ` Eli Zaretskii
2021-03-04 14:24                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 14:49                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 17:24                                                     ` Eli Zaretskii
2021-03-04 18:56                                                       ` Eli Zaretskii
2021-03-04 20:11                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 21:33                                                           ` Eli Zaretskii
2021-03-05  9:32                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 10:09                                                               ` Pip Cet
2021-03-05 10:19                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06  1:47                                                                   ` Andy Moreton
2021-03-06  9:54                                                                     ` Pip Cet
2021-03-06 10:30                                                                       ` Eli Zaretskii
2021-03-06 12:15                                                                       ` Andy Moreton
2021-03-06 13:10                                                                         ` Eli Zaretskii
2021-03-06 15:18                                                                           ` Andy Moreton
2021-03-06 18:37                                                                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07  9:22                                                                         ` Pip Cet
2021-03-05 11:55                                                               ` Eli Zaretskii
2021-03-05 13:56                                                                 ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 14:54                                                                   ` Eli Zaretskii
2021-03-05 15:18                                                                     ` Pip Cet
2021-03-05 15:22                                                                       ` Eli Zaretskii
2021-03-05 15:54                                                                         ` Pip Cet
2021-03-05 18:44                                                                           ` Eli Zaretskii
2021-03-05 15:26                                                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-04 21:30                                                         ` Andy Moreton
2021-03-04 20:47                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 13:52                                                     ` Eli Zaretskii
2021-03-05 14:04                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 15:00                                                         ` Eli Zaretskii
2021-03-05 15:56                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 18:46                                                             ` Eli Zaretskii
2021-03-05 19:22                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-05 20:31                                                                 ` Eli Zaretskii
2021-03-05 22:25                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06  7:39                                                                     ` Eli Zaretskii
2021-03-06 14:38                                                                 ` Eli Zaretskii
2021-03-06 15:35                                                                   ` Eli Zaretskii
2021-03-06 17:47                                                                     ` Eli Zaretskii
2021-03-06 18:31                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 18:48                                                                         ` Eli Zaretskii
2021-03-06 19:19                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 19:40                                                                             ` Pip Cet
2021-03-06 19:48                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 20:24                                                                                 ` Eli Zaretskii
2021-03-06 20:31                                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 20:53                                                                                     ` Eli Zaretskii
2021-03-06 21:02                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07  5:55                                                                                         ` Eli Zaretskii
2021-03-07  6:57                                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07  7:40                                                                                             ` Eli Zaretskii
2021-03-07 19:05                                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 18:56                                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 19:08                                                                                         ` Eli Zaretskii
2021-03-06 20:08                                                                             ` Eli Zaretskii
2021-03-06 20:19                                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 18:30                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 18:44                                                                     ` Eli Zaretskii
2021-03-06 19:21                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 20:10                                                                         ` Eli Zaretskii
2021-03-06 20:26                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06  0:33                                             ` Andy Moreton
2021-03-06  7:42                                               ` Eli Zaretskii
2021-03-06 12:09                                                 ` Andy Moreton
2021-03-06 13:05                                                   ` Eli Zaretskii
2021-03-06 15:46                                                     ` Andy Moreton
2021-03-06 19:31                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 17:59                                     ` Eli Zaretskii
2021-03-07 18:53                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 19:15                                         ` Eli Zaretskii
2021-03-07 20:16                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 21:27                                             ` Pip Cet
2021-03-07 21:47                                               ` Pip Cet
2021-03-07 21:51                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 22:16                                                 ` Pip Cet
2021-03-08 13:26                                                   ` Eli Zaretskii
2021-03-08 13:52                                                     ` Pip Cet
2021-03-08 14:39                                                       ` Eli Zaretskii
2021-03-08 14:50                                                         ` Pip Cet
2021-03-08 15:14                                                           ` Eli Zaretskii
2021-03-08 17:40                                                           ` Eli Zaretskii
2021-03-08  3:31                                                 ` Eli Zaretskii
2021-03-08  5:54                                                   ` Pip Cet
2021-03-08  6:48                                                     ` Pip Cet
2021-03-08 10:14                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 10:45                                                         ` Pip Cet
2021-03-08 15:02                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 15:09                                                             ` Pip Cet
2021-03-08 15:38                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 12:36                                                           ` Eli Zaretskii
2021-03-09  8:32                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 13:05                                                         ` Eli Zaretskii
2021-03-09 13:58                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 16:36                                                             ` Eli Zaretskii
2021-03-09 17:10                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 14:11                                                     ` Eli Zaretskii
2021-03-08 14:27                                                       ` Pip Cet
2021-03-08 18:06                                                         ` Eli Zaretskii
2021-03-08 18:15                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 20:37                                                             ` Eli Zaretskii
2021-03-09  7:03                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 12:55                                                                 ` Eli Zaretskii
2021-03-09 14:55                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 16:42                                                                     ` Eli Zaretskii
2021-03-09 18:31                                                                     ` Eli Zaretskii
2021-03-09 19:38                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 16:30                                                                   ` Eli Zaretskii
2021-03-10 13:14                                                                     ` Alan Mackenzie
2021-03-10 13:20                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-10 14:07                                                                       ` Eli Zaretskii
2021-03-10 15:24                                                                         ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-10 16:56                                                                         ` Alan Mackenzie
2021-03-10 17:43                                                                           ` Eli Zaretskii
2021-03-08 18:18                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 18:32                                                             ` Pip Cet
2021-03-08 20:47                                                               ` Eli Zaretskii
2021-03-08 20:50                                                                 ` Pip Cet
2021-03-09  8:28                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09  8:35                                                                     ` Pip Cet
2021-03-09  8:43                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 14:32                                                                       ` Eli Zaretskii
2021-03-08 20:49                                                             ` Eli Zaretskii
2021-03-09  8:35                                                               ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 14:34                                                                 ` Eli Zaretskii
2021-03-09 15:38                                                                   ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 16:51                                                                     ` Eli Zaretskii
2021-03-09 17:04                                                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-09 18:20                                                                         ` Eli Zaretskii
2021-03-09 19:23                                                                           ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08 18:13                                                         ` Eli Zaretskii
2021-03-08 20:53                                                           ` Pip Cet
2021-03-08 15:07                                             ` Eli Zaretskii
2021-03-03 18:48                                   ` Eli Zaretskii
2021-03-03 19:28                                     ` Eli Zaretskii
2021-03-03 19:50                                       ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-03 20:08                                         ` Eli Zaretskii
2021-03-03 19:37                                     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-03 20:13                                       ` Eli Zaretskii
2021-02-28 21:04                             ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-05 14:39     ` Andrea Corallo via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-02-05 15:08       ` Eli Zaretskii
