From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Mon, 11 Jun 2018 18:51:43 +0300
Message-ID: <83efhdqq28.fsf@gnu.org>
References: <1727545582523435cab149c2bc857b40@alexander.shukaev.name> <7466e2d177e79983436af2425ceb5b54@alexander.shukaev.name>
In-reply-to: (message from Alexander Shukaev on Mon, 11 Jun 2018 14:46:35 +0200)
To: Alexander Shukaev
Cc: schwab@suse.de, emacs-devel-bounces+emacs=alexander.shukaev.name@gnu.org, npostavs@gmail.com, emacs-devel@gnu.org

> Date: Mon, 11 Jun 2018 14:46:35 +0200
> From: Alexander Shukaev
> Cc: Emacs-devel,
>  Noam Postavsky, Emacs developers
>
> On 2018-06-11 14:40, Andreas Schwab wrote:
> > On Jun 11 2018, Alexander Shukaev wrote:
> >
> >> signal.signal(signal.SIGALRM, alarm_handler)
> >> signal.alarm(3)
> >> try:
> >>     proc = subprocess.call('stat ' + path,
> >>                            shell=True,
> >>                            stderr=subprocess.PIPE,
> >>                            stdout=subprocess.PIPE)
> >>     stdoutdata, stderrdata = proc.communicate()
> >>     signal.alarm(0)
> >> except Alarm:
> >>     print "Timed out after 3 seconds..."
> >
> > How do you know that 3 seconds is enough?
> >
> > Andreas.
>
> You don't know.  You just decide that it's maximum tolerable for
> you/your setup/hardware/connection/preferences/whatever, otherwise you
> are 99.(9)% sure that something is wrong somewhere with your system, but
> you don't give up your Emacs instance for that and rather get indicated
> that there might be a potential problem.

I think there's more here than meets the eye.  Sure, it's quite easy
to come up with a toy program that uses SIGALRM to time out a system
call that went awry.
But Emacs is not a toy program, so doing that has complications, even
if we come up with a suitable number of seconds to wait (which ain't
easy, since some I/O calls could really need a long time, for example
reading a large file or directory).

Here are some complications we should keep in mind:

  . Emacs already uses SIGALRM for different purposes, see atimer.c.
    Reusing it for this issue will need some complex logic, to avoid
    breaking the features that use SIGALRM now.

  . You tried this with a single 'stat' call, but that's just the tip
    of the iceberg.  Typically, Emacs will need to read a file after
    it found it readable, and we normally do that in a way that keeps
    looping as long as the system call was interrupted by signals,
    see, e.g., emacs_intr_read.  In that case, setting up an alarm
    clock will not help if 'read' hangs: we will just loop forever.

  . We usually deliver signals to the main thread, so if the code that
    hangs happens to run in a non-main thread (recall that Emacs 26
    has threads), it will be somewhat tricky, to say the least, to
    deliver the signal there.

  . Even if we somehow succeed in interrupting the hang with a signal,
    it's not clear whether it's safe to continue running the session
    -- there's a reason why we stopped doing non-trivial stuff in
    signal handlers.  It may be that the only sensible thing is to
    shut down, and in that case, what did we gain, exactly?

  . This technique is non-portable to MS-Windows.

There are probably other complications.

All in all, I'd be much happier if we could interrupt such hangs,
e.g. by C-g, as Stefan points out (on a TTY frame, this should already
be possible in many cases, since C-g there generates SIGINT).  But I'm
not sure this would be possible in general.

Maybe Paul will have some ideas.
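
To illustrate the second complication above (a read that is retried
after every interrupting signal), here is a minimal sketch in Python 3
on a POSIX system.  It is only an analogy, not Emacs code: it does not
reproduce emacs_intr_read, and the names blocked_read, Timeout and
on_alarm are made up for this example.  It relies on the documented
behavior of Python 3.5 and later (PEP 475), where a system call
interrupted by a signal is retried automatically unless the signal
handler raises an exception.

```python
import os
import signal


class Timeout(Exception):
    """Raised from the signal handler to abandon the blocked call."""


def blocked_read(give_up_after):
    """Block in read() on an empty pipe while a periodic SIGALRM fires.

    Returns the number of alarms delivered before the read was abandoned.
    Must be called from the main thread (Python installs and runs
    signal handlers only there).
    """
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1
        # Returning normally lets the interrupted read() be retried,
        # so the call simply goes back to blocking.  Only a non-local
        # exit from the handler (here, an exception) breaks out.
        if ticks >= give_up_after:
            signal.setitimer(signal.ITIMER_REAL, 0)  # stop further alarms
            raise Timeout()

    r, w = os.pipe()  # nothing is ever written, so read() never completes
    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, 0.05, 0.05)  # fire every 50 ms
    try:
        os.read(r, 1)
    except Timeout:
        pass
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)
        os.close(w)
    return ticks


if __name__ == "__main__":
    print("alarms delivered before giving up:", blocked_read(3))
```

With give_up_after set to 3, the first two alarms interrupt the read
but do not end it; only the third, which exits the handler
non-locally, does.  That non-local exit is exactly the kind of
non-trivial work in a signal handler that the fourth complication
warns about, so the sketch shows the problem rather than a fix.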