From mboxrd@z Thu Jan 1 00:00:00 1970
From: Po Lu
Newsgroups: gmane.emacs.devel
Subject: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Date: Wed, 21 Dec 2022 09:01:01 +0800
Message-ID: <87o7rxd0z6.fsf@yahoo.com>
References: <87edsxfop0.fsf@yahoo.com> <83359dgt7v.fsf@gnu.org>
 <87a63lfcz7.fsf@yahoo.com> <83y1r5f6lh.fsf@gnu.org>
 <875ye9f385.fsf@yahoo.com> <83fsddey48.fsf@gnu.org>
 <871qowgbay.fsf@yahoo.com> <83bko0gact.fsf@gnu.org>
 <87wn6oesfx.fsf@yahoo.com> <835ye8fweu.fsf@gnu.org>
 <87sfhcdulj.fsf@yahoo.com> <838rj3ea07.fsf@gnu.org>
 <87cz8eetvi.fsf@yahoo.com> <835ye6cdh9.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii
In-Reply-To: <835ye6cdh9.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 20 Dec 2022 17:16:18 +0200")
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:301717

Eli Zaretskii writes:

> That is definitely incorrect, not in the general form you are
> presenting it. unblock_input processes input events, and those can
> legitimately signal errors. Other places in C call other functions
> that can signal. C code which cannot allow control to escape it via
> non-local exits must use the record_unwind_protect machinery to set up
> suitable unwinders.
>
> I very much hope that all your work on new X features and in
> particular on the new input events and XInput2 was not based on the
> above wrong assumption -- that unblock_input can never signal (or
> should never signal).
>
> But we moved far away from the original issue, and are in fact
> discussing a very far (and much less important, from my POV) tangent.
> Let's not lose our focus, okay? The original issue was whether we can
> have code which calls handle_one_xevent and other similar functions,
> which can call unblock_input and thus can signal an error, from
> threads other than the main thread, or from callbacks we install that
> are then called by toolkits from a non-main thread. By contrast, the
> past several messages were discussing whether unblock_input is allowed
> to signal even if called from the main thread. And that is an
> entirely different issue.
>
> It is a different issue because my point is simple: calls to
> handle_one_xevent etc. from non-main threads _cannot_ be made safe in
> Emacs, no matter what you do. So it's an absolute no-no. By
> contrast, if we call unblock_input from the main thread, and the code
> which calls it doesn't expect signals, all we need is either to
> restructure the caller, or use record_unwind_protect. Thus, even if
> you are absolutely right about all of the cases you show below -- and
> I'm not convinced you are right in all of them, but even if you are --
> even if all of them are indeed unsafe in the face of Lisp errors, all
> it means that we must examine them one by one and make them safe by
> one of the above mentioned techniques.
>
> So I would like to return to the main issue: calls to
> handle_one_xevent from non-main threads. That must not happen in
> Emacs, or else the corresponding build will be fragile and can crash
> given the opportune conditions. If you agree with that, let's discuss
> how to make those configurations safer. If you don't agree, please
> explain why.
>
>> Being called from the Lisp thread does not make it safe to signal!
>
> Not by itself, it doesn't. But it definitely makes it fixable, so
> that it _can_ be safe. By contrast, calling the Lisp machine from
> another thread _cannot_ be fixed. It's a time bomb that will
> definitely go off.
>
>> If the last unblock_input signals, match (allocated by FcFontMatch)
>> leaks.
>
> Yes, code which doesn't expect errors might leak descriptors or memory
> or other resources. But resource leaks are very rarely fatal, unless
> you leak a lot of them: modern machines have enough memory and file
> descriptors to let us leak them for many days before disaster
> strikes. By contrast, a single error signaled from the wrong thread
> will immediately crash Emacs because it longjmp's to the wrong stack!
>
>> >> > You cannot request that. note_mouse_highlight examines properties,
>> >> > and that can always signal, because properties are Lisp creatures and
>> >> > can invoke Lisp.
>> >>
>> >> Could you please explain how?
>> >
>> > Which part is unclear?
>>
>> Which properties involve Lisp execution, and why the evaluation of that
>> Lisp cannot be put inside internal_catch_all.
>
> You've just seen that: we examine text properties and overlays in a
> buffer whose restriction could have changed behind our backs, what
> with today's proliferation of calls into Lisp from very deep and
> low-level functions. We call Lisp in keyboard.c and in xdisp.c and in
> insdel.c and in window.c.
>
>> > These calls will all have to go, then. Maybe we should use some
>> > simplified, stripped down version of handle_one_xevent, which doesn't
>> > invoke processing that can access Lisp data and the global state. Why
>> > would a toolkit menu widget need to call note_mouse_highlight, which
>> > is _our_ function that implements _our_ features, about which GTK (or
>> > any other toolkit) knows absolutely nothing??
>>
>> The toolkit menu mostly demands that Emacs set the cursor to the
>> appropriate cursor for whatever the pointer is under.
>
> I cannot parse this: "set the cursor to the appropriate cursor"?
>
>> The X server can also demand Emacs expose parts of the frame at any
>> time, due to window configuration changes.
>
> We already handle this, by setting a flag in the SIGIO handler, and
> then calling expose_frame when it is safe. What are the problems you
> see here if we queue these events instead of processing them
> immediately?
>
>> > If it's impossible to do safely, it will not work. It already doesn't
>> > work in popup menus on w32, for similar reasons. That's a minor
>> > inconvenience to users, and a much smaller catastrophe than making
>> > these unsafe calls.
>>
>> But it worked since the beginning of 2006 or so, when xmenu.c was made
>> to call handle_one_xevent.
>
> "Worked" in the sense that we maybe didn't hear about problems, or
> heard too few of them? That doesn't mean the problems don't exist,
> just that people don't come immediately running here and complain.
>
> Besides, PGTK is very young, definitely not around since 2006. Give
> it some time.
>
>> Anyway, the "minor inconvenience" can be avoided by simply wrapping
>> note_mouse_highlight in internal_catch_all. Why do you think that
>> would be too harsh?
>
> You cannot use that machinery reliably in a non-main thread anyway.
> For starters, you have no idea what kind of stack do those threads
> get. They aren't our threads, so we don't know anything about them.
> Using internal_catch_all would mean we'd need to run unwinders, and
> there's no guarantee they could be run on those other threads. For
> example, they cannot safely access the state of the Lisp machine,
> because they run on other threads asynchronically.
>
> Besides, note_mouse_highlight is called from many places that don't
> need any internal_catch_all.
> If some caller cannot allow non-local
> exits, it should call internal_catch_all by itself. But that's
> another tangent.
>
>> > And I responded saying that there should be no "mouse stuck" because
>> > both XI_Motion and XI_Leave will be queued and processed in the order
>> > they arrived. What does this have to do with the safety of calling
>> > clear_mouse_face, or lack thereof?
>> >
>> > I'm saying that there's no need to process these events immediately,
>> > when the event calls into the Lisp machine. We can, and in many cases
>> > already do, queue the event and process it a bit later, and in a
>> > different, safer, context.
>>
>> But then as a result the mouse face will not be dismissed when entering
>> a menu.
>
> Again, I don't understand: we don't display mouse-face on menus, only
> on buffer text, on the mode line, and on the tool bar (if we implement
> it in Emacs, not use the toolkit's tool bar).
>
>> And we will also lose the valuable development feature of using
>> mouse-high to determine how wedged a stuck Emacs really is.
>
> Now you lost me. What is that feature, and how is it related?
>
>> > But it's already "in keyboard.c"! See the part that begins with
>> >
>> >     /* Try generating a mouse motion event. */
>> >     else if (some_mouse_moved ())
>> >
>> > which ends with
>> >
>> >     if (!NILP (x) && NILP (obj))
>> >       obj = make_lispy_movement (f, bar_window, part, x, y, t);
>> >
>> > Then we process these "lispy movements" as input.
>>
>> That isn't related to mouse-highlight, and only happens when track-mouse
>> is non-nil.
>
> Those are details. My point is that we do process _some_ mouse events
> in the main loop, and it works.
>
>> > Not normally, no. But if some Lisp program sets bad text properties
>> > or overlays, it could signal. We can never assume in Emacs that some
>> > Lisp machine code never signals.
>>
>> Then the simple solution would be to catch those signals inside
>> note_mouse_highlight. What could be the problem with that?
>
> It's wrong, that's the problem.
>
> And again, this is a tangent. The main issue is that this code cannot
> be called from a non-main thread, period. Let's focus on that and
> solve those problems first. You made me very worried by telling we do
> this nonchalantly and for a long time.

There is some kind of misunderstanding here. You seem to be talking
about calling Lisp from another thread, which is a big no-no in my book
as well. However, the problem is nowhere near as drastic.

Under the GTK builds, there is only a single thread. The event loop
runs from the main thread. Those calls to note_mouse_highlight *are*
being done from the main thread.

The problem is that it is unsafe to signal in the main thread when
handle_one_xevent is being called by GLib, because GLib does the
equivalent of this:

  static bool inside_callback;

  assert (!inside_callback);
  inside_callback = true;
  [call handle_one_xevent]
  inside_callback = false;

If handle_one_xevent signals, then inside_callback will never be set to
false. As a result, the next time GLib enters its own event dispatch
code, it will abort, leading to the crash seen here. No second thread
is involved!
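
To make the failure mode concrete, here is a small single-threaded model
in plain C. It is only a sketch, not GLib's or Emacs's actual code:
toolkit_dispatch, signal_error, signaling_callback, guarded_callback and
survives_dispatch are all hypothetical names, setjmp/longjmp stands in
for a Lisp signal, and the guard returns false at the point where the
real one would abort. guarded_callback models the "catch it before it
escapes the callback" idea discussed earlier in this thread.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the toolkit's reentrancy guard.  */
static bool inside_callback;

/* Hypothetical stand-in for the innermost error handler.  */
static jmp_buf *current_handler;

typedef void (*callback_fn) (void);

/* Stand-in for signaling an error: jump to the innermost handler.  */
void
signal_error (void)
{
  assert (current_handler != NULL);
  longjmp (*current_handler, 1);
}

/* Toolkit-style dispatch.  Returns false where the real guard
   would abort.  */
bool
toolkit_dispatch (callback_fn cb)
{
  if (inside_callback)
    return false;
  inside_callback = true;
  cb ();
  inside_callback = false;	/* skipped if cb exits non-locally */
  return true;
}

/* A callback that signals, as handle_one_xevent might.  */
void
signaling_callback (void)
{
  signal_error ();
}

/* The same work, but the error is caught inside the callback, so
   control returns to the toolkit normally.  */
void
guarded_callback (void)
{
  jmp_buf local;
  jmp_buf *outer = current_handler;

  current_handler = &local;
  if (setjmp (local) == 0)
    signaling_callback ();
  current_handler = outer;	/* reached on both paths */
}

/* Run one dispatch under a top-level handler and report whether the
   toolkit would still accept another dispatch afterwards.  */
bool
survives_dispatch (callback_fn cb)
{
  jmp_buf top;

  inside_callback = false;
  current_handler = &top;
  if (setjmp (top) == 0)
    toolkit_dispatch (cb);
  current_handler = NULL;
  return !inside_callback;
}
```

In this model, survives_dispatch (signaling_callback) reports false
because the flag is left set after the jump, while survives_dispatch
(guarded_callback) reports true. Everything runs on one thread in both
cases.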