From: Gerd Möllmann
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: Please check if scratch/igc builds with native compilation
Date: Tue, 21 May 2024 20:09:50 +0200
To: Andrea Corallo
Cc: Emacs Devel, Eli Zaretskii, Helmut Eller
In-Reply-To: Andrea Corallo's message of "Tue, 21 May 2024 13:57:02 -0400"

Andrea Corallo writes:

> At least here the error seems reproducible. Bootstrapping with -j1
> makes native-compiling leim/ja-dic/ja-dic.el always fail.
>
> And if I run it under gdb I see we get a SIGSEGV in
> 'maybe_resize_hash_table' at fns.c:4987
>
>   memcpy (key, h->key, old_size * sizeof *key);

That's a new one for me. Maybe you are hitting a read/write barrier?
I think Eli and Helmut can help here with what to do about the signals
in GDB. (On macOS, MPS uses Mach exceptions, not signals.)
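The read/write-barrier guess rests on how protection-based barriers work in general: the collector removes access to a page, the first load or store faults, a handler lifts the protection, and the faulting instruction is restarted. The toy below is a minimal sketch of that mechanism using POSIX `mmap`, `mprotect` and `sigaction`, assuming Linux or a similar POSIX system. It is not MPS code; `guarded_page`, `barrier_setup`, `copy_from_guarded` and `barrier_hits` are hypothetical names. (As noted above, MPS on macOS uses Mach exceptions instead; the sketch also handles SIGBUS because macOS may report such faults that way.)

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, sigaction, siginfo_t on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                 /* page behind the toy barrier */
static size_t page_size;
static volatile sig_atomic_t barrier_hits; /* faults the handler absorbed */

/* Fault handler: if the fault is on the guarded page, lift the
   barrier (re-enable access) and return, so the kernel restarts the
   faulting instruction.  Any other fault gets the default action. */
static void
barrier_handler (int sig, siginfo_t *info, void *ucontext)
{
  (void) ucontext;
  uintptr_t addr = (uintptr_t) info->si_addr;
  uintptr_t start = (uintptr_t) guarded_page;
  if (addr >= start && addr - start < page_size)
    {
      /* mprotect is not formally async-signal-safe, but
         protection-based collectors rely on it working here.  */
      mprotect (guarded_page, page_size, PROT_READ | PROT_WRITE);
      barrier_hits++;
      return;
    }
  signal (sig, SIG_DFL); /* not ours: re-fault and crash normally */
}

/* Map one page, put some data in it, install the handler, and raise
   the barrier by removing all access.  Returns 0 on success. */
static int
barrier_setup (void)
{
  page_size = (size_t) sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  guarded_page = p;
  strcpy (guarded_page, "key");

  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = barrier_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  /* Linux reports PROT_NONE faults as SIGSEGV; macOS may use SIGBUS.  */
  if (sigaction (SIGSEGV, &sa, NULL) != 0
      || sigaction (SIGBUS, &sa, NULL) != 0)
    return -1;
  return mprotect (guarded_page, page_size, PROT_NONE);
}

/* Copy bytes out of the guarded page, by analogy with the memcpy in
   maybe_resize_hash_table.  The first load faults, the handler lifts
   the barrier, and memcpy continues as if nothing had happened.  */
static void
copy_from_guarded (char *dst, size_t n)
{
  memcpy (dst, guarded_page, n);
}
```

Run normally, `copy_from_guarded` completes and `barrier_hits` ends up as 1. Run under GDB, the same memcpy stops with SIGSEGV first, which is what a barrier hit would look like if that is what is happening in `maybe_resize_hash_table`. GDB's `handle SIGSEGV nostop noprint pass` lets such faults go straight to the handler, at the cost of also hiding genuine crashes.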
>
> with the following bt
>
> (gdb) bt
> #0 maybe_resize_hash_table (h=0x7fffe7dabd48) at fns.c:4987
> #1 hash_put (h=0x7fffe7dabd48, key=XIL(0x7fffe4fc297b), value=XIL(0x30), hash=1644298) at fns.c:5162
> #2 0x0000555555817fc0 in Fputhash (key=XIL(0x7fffe4fc297b), value=XIL(0x30), table=) at fns.c:5993
> #3 0x00007ffff14f6313 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #4 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc010) at eval.c:3032
> #5 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #6 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc0d0) at eval.c:3032
> #7 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #8 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc190) at eval.c:3032
> #9 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #10 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc250) at eval.c:3032
> #11 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #12 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc310) at eval.c:3032
> #13 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #14 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc3d0) at eval.c:3032
> #15 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #16 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc490) at eval.c:3032
> #17 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #18 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc550) at eval.c:3032
> #19 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #20 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc610) at eval.c:3032
> #21 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #22 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc6d0) at eval.c:3032
> #23 0x00007ffff14f6476 in F627974652d72756e2d2d73747269702d6c697374_byte_run__strip_list_0 () at /home/andcor03/emacs4/src/../native-lisp/30.0.50-00c2e4a4/preloaded/byte-run-79ff048e-d52588ab.eln
> #24 0x00005555557fdbac in Ffuncall (nargs=2, args=0x7fffffffc760) at eval.c:3032
> #25 0x00007ffff14f692c in F627974652d72756e2d73747269702d73796d626f6c2d706f736974696f6e73_byte_run_strip_symbol_positions_0 ()
> [...]
>
> Which is admittedly different from what I saw from the command line.
>
>> To debug this, I changed the check in igc.c to not assert, but to
>> print the PID and enter an endless sleep loop. This makes it possible
>> to attach to the process with LLDB.
>>
>> In all cases I investigated in this way, I'm seeing a pattern: a
>> function in the Emacs core is called from a native-compiled function.
>> Things look like this, simplified:
>>
>>   /* In some .eln */
>>   Lisp_Object d_reloc[100];
>>
>>   Lisp_Object some_native_compiled_lisp_function (void)
>>   {
>>     Lisp_Object frame[2];
>>     frame[0] = d_reloc[17]; // some symbol
>>     frame[1] = ...
>>     f_reloc->funcall (2, frame);
>>   }
>>
>> where f_reloc is a large struct with function pointer members for
>> the functions being called from the .eln. Doesn't matter. We then
>> land in Ffuncall in the Emacs core, and the first element of its args
>> vector, a symbol, is found to be forwarded, which leads to the
>> assertion.
>>
>> d_reloc in the .eln is scanned in igc.c, and the symbol being on the
>> control stack, in frame[], or in a register should pin it, one would
>> assume. So how come Ffuncall in Emacs receives an invalid symbol?
>>
>> I've checked that d_reloc is indeed scanned by fix_comp_unit. That
>> check gives me reasonable confidence that this "should work". But as
>> an alternative, I also made all the things like d_reloc in the .elns
>> ambiguous roots, so that they cannot possibly be moved, if all works
>> as expected.
>>
>> - No change, it still asserts in the same way.
>>
>> - Changing optimization levels - no change.
>> - Changing from arm64 to x86_64 - no change.
>
> That's very bizarre, I have a hard time believing we are hitting such
> a bug :/ Hope we are missing something.

Yes, bizarre is a good description. I'm out of ideas.
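The debugging trick from the quoted message (make the igc.c check report the PID and sleep instead of asserting, then attach LLDB) might be sketched as follows. The actual change to igc.c is not shown in the thread; `check_or_wait`, `CHECK` and `debugger_continue` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* getpid, sleep */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Stays 0 while waiting; set it to 1 from the debugger to resume, e.g.
   "expr debugger_continue = 1" in LLDB or
   "set var debugger_continue = 1" in GDB.  volatile so the loop
   re-reads the value the debugger writes.  */
static volatile sig_atomic_t debugger_continue;

/* Stand-in for an assertion: on failure, print the PID and sleep
   until a debugger attaches and releases the process, instead of
   aborting.  Returns OK so callers can still react to the failure.  */
static bool
check_or_wait (bool ok, const char *what)
{
  if (ok)
    return true;
  fprintf (stderr, "check failed: %s; attach to pid %ld\n",
           what, (long) getpid ());
  while (!debugger_continue)
    sleep (1);
  return false;
}

/* Report the failing expression text along with the PID.  */
#define CHECK(expr) check_or_wait ((expr), #expr)
```

When a check fails, attach with `lldb -p PID` (or `gdb -p PID`), inspect the stack while the process is parked in `sleep`, and set `debugger_continue` to 1 to let it proceed. Unlike a core dump, the process stays live, so memory can still be examined and functions called from the debugger.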
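The puzzle in the quoted analysis is that an object reachable from an ambiguous root should be pinned, yet Ffuncall still sees a forwarded symbol. The toy below is a minimal sketch of the general idea, not MPS's algorithm, and all names in it (`struct obj`, `scan_ambiguous`, `move_unpinned`, `fix_exact`) are hypothetical. It shows three cases: an exact root that is fixed after a move, a reference the collector never saw, which is left pointing at a stale forwarded copy, and an object pinned by an ambiguous root, which does not move at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap object.  When the collector moves it, the old copy keeps
   a forwarding marker pointing to the new location.  */
struct obj
{
  bool forwarded;       /* this copy is stale */
  struct obj *new_addr; /* where it went; valid only if forwarded */
  bool pinned;          /* referenced by an ambiguous root */
  int payload;
};

enum { HEAP_SIZE = 4 };
static struct obj from_space[HEAP_SIZE], to_space[HEAP_SIZE];
static size_t to_used;

/* Ambiguous (conservative) scan: any word that happens to point into
   from_space pins the object containing that address.  */
static void
scan_ambiguous (void *const *words, size_t n)
{
  uintptr_t lo = (uintptr_t) from_space;
  uintptr_t hi = (uintptr_t) (from_space + HEAP_SIZE);
  for (size_t i = 0; i < n; i++)
    {
      uintptr_t w = (uintptr_t) words[i];
      if (w >= lo && w < hi)
        from_space[(w - lo) / sizeof (struct obj)].pinned = true;
    }
}

/* Move every unpinned object to to_space, leaving a forwarding
   marker behind in its old location.  */
static void
move_unpinned (void)
{
  for (size_t i = 0; i < HEAP_SIZE; i++)
    {
      struct obj *old = &from_space[i];
      if (old->pinned || old->forwarded)
        continue;
      struct obj *copy = &to_space[to_used++];
      *copy = *old;
      old->forwarded = true;
      old->new_addr = copy;
    }
}

/* Exact root fix: a root the collector knows about is updated to the
   object's new address, so it never observes the forwarding marker.  */
static void
fix_exact (struct obj **root)
{
  if ((*root)->forwarded)
    *root = (*root)->new_addr;
}
```

In the thread's terms, only the second case (an unfixed, unpinned reference) produces the forwarded-symbol assertion. What remains unexplained is how that can happen when d_reloc is both fixed and, in the experiment, registered as an ambiguous root.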