From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Tue, 07 Nov 2017 18:40:42 +0200 Message-ID: <83inemrqid.fsf@gnu.org> References: <83lgjz8eiy.fsf@gnu.org> <831slp98ut.fsf@gnu.org> <83tvyj62qg.fsf@gnu.org> <83r2tetf90.fsf@gnu.org> <5150d198-8dd3-9cf4-5914-b7e945294452@binary-island.eu> <83tvy7s6wi.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1510072932 29449 195.159.176.226 (7 Nov 2017 16:42:12 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 7 Nov 2017 16:42:12 +0000 (UTC) Cc: emacs-devel@gnu.org To: Matthias Dahl Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Nov 07 17:42:08 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eC6wv-0007Ow-1K for ged-emacs-devel@m.gmane.org; Tue, 07 Nov 2017 17:42:05 +0100 Original-Received: from localhost ([::1]:54342 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eC6x2-0006w9-HG for ged-emacs-devel@m.gmane.org; Tue, 07 Nov 2017 11:42:12 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37151) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eC6vV-00060e-NO for emacs-devel@gnu.org; Tue, 07 Nov 2017 11:40:39 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eC6vS-0007zO-Hv for emacs-devel@gnu.org; Tue, 07 Nov 2017 11:40:37 -0500 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:52757) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eC6vS-0007zK-E5; Tue, 07 Nov 2017 11:40:34 -0500 Original-Received: from [176.228.60.248] (port=4174 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1eC6vR-0007L1-PL; Tue, 07 Nov 2017 11:40:34 -0500 In-reply-to: (message from Matthias Dahl on Tue, 7 Nov 2017 15:18:02 +0100) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219967 Archived-At: > From: Matthias Dahl > Cc: emacs-devel@gnu.org > Date: Tue, 7 Nov 2017 15:18:02 +0100 > > The amount of output read (in total) from a process is tracked via the > following member variable of Lisp_Process: > > /* Byte-count for process output read from `infd'. */ > unsigned long infd_num_bytes_read; > > That gives us at least 4 GiB worth of data that can be read back and > tracked before it wraps around to 0. If we want, we can make it EMACS_INT, which will give us as much as Emacs can gobble. > What might have happened now is that wait_proc->infd_num_bytes_read was > already at the edge of the maximum representable value range and wrapped > around during some recursive calls to wait_reading_process_output while > we still have a value in initial_wait_proc_num_bytes_read that has not > wrapped around. E.g.: > > wait_proc->infd_num_bytes_read = (2^32)-10 > initial_wait_proc_num_bytes_read = wait_proc->infd_num_bytes_read > > ... recursive calls happen, output is read ... > > wait_proc->infd_num_bytes_read = 256 > initial_wait_proc_num_bytes_read = (2^32)-10 > > // this will be wrong > bytes_read = wait_proc->infd_num_bytes_read - > initial_wait_proc_num_bytes_read > > That causes a problem -- even though this case is unlikely, it is still > possible and I don't think it should be dismissed. That should be very rare (barring application-level bugs), and in any case it is no worse than just returning 1. So I don't think we should give up better support of 90% or even 99% of use cases due to the other 10% or 1%. > gnulib's integer overflow helpers won't help in this case, at all. Those helpers are meant to avoid the overflow itself, thus preventing "undefined behavior" when a signed integer overflows. They cannot, and shouldn't be expected to, fix the reason for the overflow -- that cannot be helped when it happens. > > I don't agree with this line of reasoning. We are not developing a > > library whose users will be familiar only with the formal interface > > definition and nothing else. We are talking about code that is being > > read, studied, and used by Emacs hack^H^H^H^Hdevelopers all the time. > > Even the C interface? Especially in the C interface. > It is my impression that the C parts of Emacs are usually what is > being avoided at all costs. It depends on who we are talking about. Some of us (yours truly included) work on the C level quite frequently. Any relatively radical new feature needs at least some C infrastructure. I had my share of writing code based on what I read in the function commentary and some general common sense and familiarity with the internals, only to find out later that they were incomplete or even prone to wrong interpretation. I'd like to minimize such occurrences as much as I can. > Besides that, I treated the C parts like implementation details and > kept strictly to the documented interfaces, expecting no user to > ever touch this. But that's exactly the problem: we _don't_have_ documented interfaces on the C level, at least not of the quality we would like to have. Do you really think that the commentary at the beginning of wait_reading_process_output describes what the function does anywhere close to completeness? Far from it. > > Having a non-trivial function whose behavior is hard to describe and > > remember accurately makes using it error-prone, especially if the > > deviant behavior happens only in some corner use case that is hard to > > reproduce. > > That is why it is important that the documentation and implementation of > a function actually align. Of course. But it's hard to keep them in sync, partly because it's incomplete to begin with, and that makes it not easy to recognize when the commentary needs to change due to some code changes. And the result is that I quite frequently find comments that are blatantly incorrect: mention parameters no longer there or fail to mention new ones that were added, for example. > In this case, no user should actually expect the function to return > the number of bytes read, just a positive or negative number or > zero... just like it is stated. I'd actually say that the commentary is incomplete because it doesn't say what that "positive" value means. Returning an opaque value from a function is not useful, unless you can pass it later to the same function to get it to do something special. > But, again, that might be a matter of perspective. For me, what is > documented is like a contract. IME with Emacs code, we are a long way from interfaces that are documented on a level that could be considered a contract. For starters, we don't even say clearly enough which functions can throw to top level, without which a contract is not really a contract, don't you agree? > > Sorry, you lost me half-way through this description. I actually > > meant to return only the increment of how many bytes were read since > > the last call, but I don't see why you'd need more than one counter > > that will be reset to zero once its value is consumed. > > My suggestion was to have an in-flight bytes read counter, that will > only track the number of bytes read while /any/ wait for that specific > process is active. Ah, I see. I think this is an unnecessary complication. The risk of overflow, while it isn't zero, is too low to justify such measures and the resulting complexity, IMO. > > It's a bug that has been there "forever", and the fix is too risky to > > have it in the middle of a pretest. I don't think we understand well > > enough all of its implications, and reading from subprocesses is a > > very central and very delicate part of Emacs. Sorry. > > Ack. Even though this means that users running into this right now will > have to wait quite a while for a fix in a stable release, which is a > pity imho. We can publish the patch here, and then people who really want this fixed ASAP can simply apply the patch.