From: Andrea Corallo
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: Please check if scratch/igc builds with native compilation
Date: Tue, 21 May 2024 13:57:02 -0400
To: Gerd Möllmann
Cc: Emacs Devel, Eli Zaretskii, Helmut Eller
In-Reply-To: Gerd Möllmann's message of Tue, 21 May 2024 19:06:42 +0200

Gerd Möllmann writes:

> Here's something about my debugging attempts so far:
>
> I'm throwing in the towel wrt native compilation with MPS on macOS,
> which makes it a failure for me.
>
> The situation is as follows:
>
> When building with native compilation with --enable-checking=all, I am
> observing errors of the form
>
>   igc.c:345: Emacs fatal error: assertion failed: h->obj_type != IGC_OBJ_FWD
>
> when compiling Lisp files, for example
>
>   ELC+ELN ../lisp/international/mule-cmds.elc
>   ELC+ELN ../lisp/files.elc
>
> Which file triggers the error is not predictable, and it is not
> reproducible when running under LLDB, with or without ASLR.

Here, at least, the error seems reproducible: bootstrapping with -j1
makes native-compiling leim/ja-dic/ja-dic.el fail every time.
And if I run it under GDB, I see a SIGSEGV in 'maybe_resize_hash_table'
at fns.c:4987:

  memcpy (key, h->key, old_size * sizeof *key);

with the following backtrace:

(gdb) bt
#0  maybe_resize_hash_table (h=0x7fffe7dabd48) at fns.c:4987
#1  hash_put (h=0x7fffe7dabd48, key=XIL(0x7fffe4fc297b), value=XIL(0x30), hash=1644298) at fns.c:5162
#2  0x0000555555817fc0 in Fputhash (key=XIL(0x7fffe4fc297b), value=XIL(0x30), table=) at fns.c:5993
#3  0x00007ffff14f6313 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#4  0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc010) at eval.c:3032
#5  0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#6  0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc0d0) at eval.c:3032
#7  0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#8  0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc190) at eval.c:3032
#9  0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#10 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc250) at eval.c:3032
#11 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#12 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc310) at eval.c:3032
#13 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#14 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc3d0) at eval.c:3032
#15 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#16 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc490) at eval.c:3032
#17 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#18 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc550) at eval.c:3032
#19 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#20 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc610) at eval.c:3032
#21 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#22 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc6d0) at eval.c:3032
#23 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
#24 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc760) at eval.c:3032
#25 0x00007ffff14f692c in F627974652d72756e2d73747269702d73796d626f6c2d706f736974696f6e73_byte_run_strip_symbol_positions_0 ()
[...]

This is admittedly different from what I saw from the command line.

> To debug this, I changed the check in igc.c to not assert but instead
> print the PID and sleep in an endless loop. This makes it possible to
> attach to the process with LLDB.
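The trick described just above, turning a fatal assertion into "print the PID and wait" so a debugger can attach after the fact, might look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual change made to igc.c: the enum igc_obj_type members shown, keep_waiting, wait_for_debugger, check_not_forwarded and CHECK_NOT_FORWARDED are all hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the igc.c object-type tags; the real
   enum in igc.c has many more members.  */
enum igc_obj_type { IGC_OBJ_CONS, IGC_OBJ_SYMBOL, IGC_OBJ_FWD };

/* Cleared from the debugger to let the process continue, e.g.
   "expr keep_waiting = 0" in LLDB or "set var keep_waiting = 0" in GDB.  */
static volatile int keep_waiting = 1;

/* Instead of aborting, report the location and our PID, then sleep
   in a loop so that LLDB or GDB can attach with "-p PID".  */
static void
wait_for_debugger (const char *file, int line)
{
  assert (file != NULL);
  fprintf (stderr, "%s:%d: forwarded object seen; attach to PID %ld\n",
           file, line, (long) getpid ());
  while (keep_waiting)
    sleep (1);
}

/* Hypothetical replacement for the failing check
   h->obj_type != IGC_OBJ_FWD.  Returns 1 if the object is not
   forwarded, 0 after the debugger has released a waiting process.  */
static int
check_not_forwarded (enum igc_obj_type type, const char *file, int line)
{
  if (type == IGC_OBJ_FWD)
    {
      wait_for_debugger (file, line);
      return 0;
    }
  return 1;
}

#define CHECK_NOT_FORWARDED(type) \
  check_not_forwarded ((type), __FILE__, __LINE__)
```

Using a volatile flag rather than a bare infinite loop keeps the compiler from optimizing the wait away and lets the session continue past the check once the debugger is attached.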
>
> In all cases I investigated this way, I see a pattern: a function in
> the Emacs core is called from a native-compiled function. Simplified,
> things look like this:
>
>   /* In some .eln */
>   Lisp_Object d_reloc[100];
>
>   Lisp_Object some_native_compiled_lisp_function ()
>   {
>     Lisp_Object frame[2];
>     frame[0] = d_reloc[17]; // some symbol
>     frame[1] = ...
>     f_reloc->funcall (2, frame);
>   }
>
> where f_reloc is a large struct with function-pointer members for the
> functions called from the .eln. That detail doesn't matter. We then
> land in Ffuncall in the Emacs core, and the first element of its args
> vector, a symbol, is found to be forwarded, which leads to the
> assertion.
>
> d_reloc in the .eln is scanned in igc.c, and one would assume that the
> symbol being on the control stack, in frame[], or in a register pins
> it. So how come Ffuncall in Emacs receives an invalid symbol?
>
> I've checked that d_reloc is indeed scanned by fix_comp_unit, which
> gives me reasonable confidence that this "should work". As an
> alternative, I also made all the things like d_reloc in the .elns
> ambiguous roots, so that, if everything works as expected, they cannot
> possibly be moved.
>
> - No change, it still asserts in the same way.
>
> - Changing optimization levels - no change.
> - Changing from arm64 to x86_64 - no change.

That's very bizarre; I have a hard time believing we are hitting such a
bug :/ I hope we are missing something.

Andrea
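For readers less familiar with how a .eln calls into the core, the simplified pattern Gerd quotes can be modeled as a small, self-contained toy. This is a sketch under loud assumptions, not Emacs code: a real Lisp_Object is a tagged word, not a struct pointer, and struct obj, struct freloc_link, toy_funcall and the use of an argument for frame[1] are all hypothetical. Only the names d_reloc, f_reloc and funcall come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model: a moving collector would leave an OBJ_FWD
   object at the old address, pointing at the new copy.  */
enum obj_type { OBJ_SYMBOL, OBJ_FWD };

struct obj
{
  enum obj_type type;
  struct obj *new_location;   /* meaningful only when type == OBJ_FWD */
};

typedef struct obj *Lisp_Object;

/* Hypothetical stand-in for the struct of function pointers that a
   .eln uses to call into the Emacs core.  */
struct freloc_link
{
  Lisp_Object (*funcall) (ptrdiff_t nargs, Lisp_Object *args);
};

/* Stand-in for Ffuncall: like the igc.c check, it refuses to touch
   a forwarded object, which is the assertion seen in the thread.  */
static Lisp_Object
toy_funcall (ptrdiff_t nargs, Lisp_Object *args)
{
  assert (nargs >= 1);
  assert (args[0]->type != OBJ_FWD);
  return args[0];
}

/* Per-.eln data: constants referenced by native code (d_reloc) and
   the table of core entry points (f_reloc).  */
static Lisp_Object d_reloc[100];
static struct freloc_link freloc = { toy_funcall };
static struct freloc_link *f_reloc = &freloc;

/* Shape of a native-compiled function: copy a constant out of
   d_reloc into a stack frame, then call the core through f_reloc.
   If the collector moved d_reloc[17]'s object without updating the
   slot, frame[0] would point at a forwarding object.  */
static Lisp_Object
some_native_compiled_lisp_function (Lisp_Object arg)
{
  Lisp_Object frame[2];
  frame[0] = d_reloc[17];     /* some symbol */
  frame[1] = arg;             /* hypothetical second argument */
  return f_reloc->funcall (2, frame);
}
```

The point of the model is that frame[0] is only as valid as the d_reloc slot it was copied from at that moment. That is why Gerd checks both that d_reloc is scanned (so the slot gets updated) and that the copies on the stack or in registers act as ambiguous references (so the object is pinned).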