From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Shukaev
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Mon, 11 Jun 2018 13:55:51 +0200
Message-ID: <4daff1aff27be64eeb1c7bff45d86e0b@alexander.shukaev.name>
References: <1727545582523435cab149c2bc857b40@alexander.shukaev.name>
In-Reply-To: <1727545582523435cab149c2bc857b40@alexander.shukaev.name>
To: emacs-devel@gnu.org
Cc: Emacs-devel
Xref: news.gmane.org gmane.emacs.devel:226193

On 2018-06-11 12:27, Alexander Shukaev wrote:
> Hi Everyone,
>
> I initiated a discussion back in 2015 [1] about fragility of Emacs in
> terms of filesystem operations on stale NFS.  No solution actually
> came out of this discussion.  I still find this issue very disruptive.
> Yet another example would be `recentf-cleanup' which is in my case
> triggered on Emacs start up, when the file comes from stale NFS, the
> corresponding `file-readable-p' down the stack will hang indefinitely,
> and there would be no way to unfreeze it apart from issuing 'kill -9'
> to that Emacs instance.  Don't you people find it unacceptable for the
> daily usage?  Well, I do.  Such hangs always disrupt daily work and
> require quite some time to track them down as they are not
> Lisp-debuggable with e.g. in a straightforward way (these are
> dead hangs from C code, where even attaching a GDB does not work).
>
> Well, enough rant.  I think I have a proposal how to fix the issue,
> even given the blocking nature of Emacs.  How about introducing a
> variable `file-access-timeout' defaulting to `nil', which would
> reflect a configurable timeout for all access operations (such as
> `file-readable-p')?
> This would be achieved via `SIGALRM' in the C
> code, which would protect every such operation.  For example,
>
> #include <errno.h>
> #include <signal.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> static void alarm_handler(int sig)
> {
>   return;
> }
>
> int emacs_stat(const char* path, struct stat* s, unsigned int seconds)
> {
>   struct sigaction newact;
>   struct sigaction oldact;
>
>   memset(&newact, 0, sizeof(newact));
>   memset(&oldact, 0, sizeof(oldact));
>
>   sigemptyset(&newact.sa_mask);
>
>   newact.sa_flags = 0;
>   newact.sa_handler = alarm_handler;
>   sigaction(SIGALRM, &newact, &oldact);
>
>   alarm(seconds);
>
>   errno = 0;
>   const int rc = stat(path, s);
>   const int saved_errno = errno;
>
>   alarm(0);
>   sigaction(SIGALRM, &oldact, NULL);
>
>   errno = saved_errno;
>   return rc;
> }
>
> where `seconds' should be initialized with the value of
> `file-access-timeout'.  The cool advantage of this that I see is that
> one can then also selectively `let'-bind different values for
> `file-access-timeout', thus having total control over the use cases in
> which one wants to protect oneself from indefinite hangs.
>
> Kind regards,
> Alexander
>
> [1]
> https://lists.gnu.org/archive/html/help-gnu-emacs/2015-11/msg00251.html

A couple more ideas:

- I think it is reasonable to signal a dedicated error when the
  timeout expires, so that API consumers can handle it according to
  their needs.
- It might also be worth factoring this alarm mechanism out into a
  separate macro, similar to `condition-case', that wraps a piece of
  Lisp code, takes a timeout, and runs the timeout handler code if the
  timeout expires:

  (with-system-timeout 3
      (do-something)
    (message "%s" "Timed out after 3 seconds..."))

  This would also give Lisp developers full control over
  system-related interactions.

Regards,
Alexander
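
P.S.  A minimal C sketch of the first idea above (reporting the
timeout separately from ordinary failures), under stated assumptions.
The names `stat_with_timeout' and `enum stat_result' are hypothetical
and not part of Emacs.  The sketch also assumes that the blocked
stat() call can be interrupted by SIGALRM and then fails with EINTR,
which may not hold for every NFS mount configuration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only.  */
enum stat_result { STAT_OK = 0, STAT_FAILED = -1, STAT_TIMED_OUT = -2 };

/* Set by the handler so the caller can tell a timeout apart from
   an unrelated signal that also caused EINTR.  */
static volatile sig_atomic_t alarm_fired;

static void alarm_handler(int sig)
{
  (void) sig;
  alarm_fired = 1;
}

/* Like stat(), but gives up after SECONDS (0 means no timeout).
   Assumption: the blocked call is interruptible by SIGALRM.  */
enum stat_result stat_with_timeout(const char *path, struct stat *s,
                                   unsigned int seconds)
{
  struct sigaction newact;
  struct sigaction oldact;

  memset(&newact, 0, sizeof newact);
  sigemptyset(&newact.sa_mask);
  newact.sa_flags = 0;  /* no SA_RESTART: let the call fail with EINTR */
  newact.sa_handler = alarm_handler;
  if (sigaction(SIGALRM, &newact, &oldact) != 0)
    return STAT_FAILED;

  alarm_fired = 0;
  alarm(seconds);

  const int rc = stat(path, s);
  const int saved_errno = errno;

  /* Cancel any pending alarm and restore the previous handler.  */
  alarm(0);
  sigaction(SIGALRM, &oldact, NULL);

  errno = saved_errno;
  if (rc == 0)
    return STAT_OK;
  if (saved_errno == EINTR && alarm_fired)
    return STAT_TIMED_OUT;
  return STAT_FAILED;
}
```

A caller on the Lisp side could then map STAT_TIMED_OUT to the
dedicated error and treat STAT_FAILED as it treats a failing stat()
today; how that error would be named and signaled is left open here.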