From: Gerd Möllmann
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: assertion failed: header_type (h) != IGC_OBJ_FWD
Date: Sun, 14 Jul 2024 09:00:08 +0200
Cc: Emacs Devel, Helmut Eller
To: Pip Cet
In-Reply-To: Pip Cet's message of "Sun, 14 Jul 2024 05:30:36 +0000"

Pip Cet writes:

> On Sunday, July 14th, 2024 at 04:12, Gerd Möllmann wrote:
>> I'm seeing this assertion sometimes in an Emacs built with
>> --enable-checking=igc_debug,igc_check_fwd, where "sometimes" means it
>> can take days of using/running Emacs, or it can take a couple of
>> hours. This is macOS 14, arm64. I'm linking with -lmps-debug.
>
> That means lldb and no core dumps, right? (I've had to work with lldb
> to make the Android port work with mps, and it's decidedly Not My
> GDB).

Yes, no GDB for macOS/arm64. I mean, LLDB is okay for the usual
debugging, but in this case, hm. But I don't know if GDB would do
better, either. (One can produce core dumps, BTW, but it's complicated,
requires signing and entitlements, is not really helpful debugging-wise,
and the dumps are several GB large.)
What I did was enter a loop with sleep(3) instead of asserting, so that
one can attach to the process. That works, but with optimizations I
can't see much.

>> The assertion means that we likely have a reference somewhere that
>> isn't traced. Because it isn't traced, the reference isn't changed to
>> point to the new location when the object being referenced is copied
>> to a new address in memory. Instead, it points to a kind of tombstone
>> that is left behind when the object is moved.
>
> My approach would be to try to capture it in a debugger, then follow
> the forwarding pointer and find out what kind of object the pointer
> should be referring to.

Exactly. If it's reproducible, one can also remember the hash and stop
when objects with that hash are allocated, to see where they are
stored. That worked pretty well in the past, but in this case it led to
nothing.

> I'm thinking, though, about how to increase pressure to flush out such
> bugs. Here are some ideas:
>
> 1. scan xmalloc'd memory for pointers that refer to MPS-managed objects.
> 2. reduce generation sizes and increase the number of generations,
>    making it more likely objects will be copied.
> 3. trigger GC regularly while allocating objects
> 4. Hack MPS to do something.
> 5. Keep a log of forwarded objects and their old/new pointers

I've tried (2) and (3), but with no new findings; (3) only as far as it
went without rendering Emacs unbearable to use interactively. I also
tried making the specpdl and the byte stack ambiguous roots, in their
entirety. That was slow, but it still asserted once, so I guess I can
exclude these two, with some probability.

> (1) seems the most complete approach but relies on unusual pointer
> representations to reduce the number of false positives (and even
> then, they might be pinned objects and then the pointer is okay...).
>
> (2) is easy to do, but impacts performance. (3) is easy to do, but
> impacts performance a lot.
> I'd prefer avoiding (4), and while (5) is doable, it's probably
> unnecessary: you can just read the forwarding pointer if it's still
> there.
>
>> Alas, I haven't been able to debug this. One problem is that I
>> can't reproduce it easily; the other is that it is either not
>> happening, or happening much less often, when building with -O0, and
>> without -O0 I can't see much here.
>
> I assume it's being accessed from Lisp or in an exception handler?
> Which optimization options are you using?

Just -O. igc_check_fwd runs in places like XCONS, XSYMBOL, etc., so it
detects the problem as soon as possible, on the Lisp side. I'd say most
of the time it's a cons that's the problem, and sometimes a symbol.
Maybe it's a cons containing a symbol, or something like that, where
sometimes the cons hasn't been copied yet and the symbol has. Hard to
tell.

>> This is just to let people know of the problem. If you find a recipe
>> for reproducing this, please let me know. Or better yet, debug it :-).
>
> Well, it's possible, but quite unlikely, that it is the
> handlerlist_sentinel thing or the Lisp_Mutex->name thing (both fixed).
> Much more likely it's another issue.

Let's see; I've just transferred your latest commits. As I mentioned,
it can take days for the thing to surface.