From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Sat, 21 Jul 2018 12:52:56 +0300 Message-ID: <83bmb0zzif.fsf@gnu.org> References: <709614e8-1937-07c1-f554-b453ed4f3d4a@binary-island.eu> <7550438b-9fd4-d374-e571-8bb16456cad5@cs.ucla.edu> <797d0e16-1bae-50c2-35f8-05489ffce935@binary-island.eu> <83tvugdiu5.fsf@gnu.org> <877er5s0xv.fsf@gmail.com> <4e4c72bb-295d-81e1-e4ed-cad256bca83c@binary-island.eu> <87zi3v9461.fsf@gmail.com> <87k1uy8x68.fsf@gmail.com> <6d1970af-8c5c-20ba-be09-0b9aa757d663@binary-island.eu> <13b3e003-d12b-33a7-3ebe-c07b017a7cc0@binary-island.eu> <833714rm3d.fsf@gnu.org> <7b64afc3-e9b0-536d-1e42-a5f5d74f1adf@binary-island.eu> <83y3iur4jo.fsf@gnu.org> <2e181861-cfb8-a461-dfb6-9fee2164a611@binary-island.eu> <03e60e83-c832-ed6b-38bc-c0a20d2a0eac@binary-island.eu> NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1532166835 19723 195.159.176.226 (21 Jul 2018 09:53:55 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sat, 21 Jul 2018 09:53:55 +0000 (UTC) Cc: eggert@cs.ucla.edu, larsi@gnus.org, rrandresf@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org To: Matthias Dahl Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Jul 21 11:53:50 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fgoaE-000540-8C for ged-emacs-devel@m.gmane.org; Sat, 21 Jul 2018 11:53:50 +0200 Original-Received: from localhost ([::1]:51338 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fgocL-0000Ix-2O for ged-emacs-devel@m.gmane.org; Sat, 21 Jul 2018 05:56:01 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33842) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fgoZc-0006J0-GX for emacs-devel@gnu.org; Sat, 21 Jul 2018 05:53:16 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fgoZZ-0007dU-1K for emacs-devel@gnu.org; Sat, 21 Jul 2018 05:53:12 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:55978) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fgoZL-0007ZI-BE; Sat, 21 Jul 2018 05:52:55 -0400 Original-Received: from [176.228.60.248] (port=2642 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1fgoZK-0005qu-OO; Sat, 21 Jul 2018 05:52:55 -0400 In-reply-to: <03e60e83-c832-ed6b-38bc-c0a20d2a0eac@binary-island.eu> (message from Matthias Dahl on Tue, 26 Jun 2018 15:36:19 +0200) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:227608 Archived-At: > Cc: larsi@gnus.org, rrandresf@gmail.com, emacs-devel@gnu.org, > eggert@cs.ucla.edu, Stefan Monnier > From: Matthias Dahl > Date: Tue, 26 Jun 2018 15:36:19 +0200 > > Hello Eli, > > sorry that it has been a while since my last sign of life but 2018 has > been an especially bad year thus far health-wise, so I am struggling > to juggle "everything". > > I just wanted to bring the attached fixes back into discussion to get > them into master. > > As I understand it, they don't completely fix the problems Andres and > Lars have been seeing, but they still fix real bugs that can cause > random erratic / buggy behavior and/or freezes. > > What do you (and all the others in cc' :P) say? Sadly, none of "the others" chimed in, and I'm definitely not an expert on this stuff. So just some comments, hopefully they will make sense. > diff --git a/src/gnutls.c b/src/gnutls.c > index d22d5d267c..5bf5ee0e5c 100644 > --- a/src/gnutls.c > +++ b/src/gnutls.c > @@ -708,16 +708,18 @@ emacs_gnutls_read (struct Lisp_Process *proc, char *buf, ptrdiff_t nbyte) > rtnval = gnutls_record_recv (state, buf, nbyte); > if (rtnval >= 0) > return rtnval; > - else if (rtnval == GNUTLS_E_UNEXPECTED_PACKET_LENGTH) > - /* The peer closed the connection. */ > - return 0; Why is this hunk being deleted? I see that you intend to handle it in emacs_gnutls_handle_error, but I'm not sure I understand the net result. The current code simply ignores this situation; what will happen with the new code, except for the problem being logged? > else if (emacs_gnutls_handle_error (state, rtnval)) > - /* non-fatal error */ > - return -1; > - else { > - /* a fatal error occurred */ > - return 0; > - } > + { > + /* non-fatal error */ > + errno = EAGAIN; > + return -1; > + } > + else > + { > + /* a fatal error occurred */ > + errno = EPROTO; > + return 0; > + } EPROTO is not universally available, AFAIK. We have Gnulib's errno.h that defines a value for it, but that just gets us through compilation. What do you expect this value to cause when it is encountered by some code which needs to interpret it? We need to make sure the results will be sensible. > diff --git a/src/process.c b/src/process.c > index 6dba218c90..5702408985 100644 > --- a/src/process.c > +++ b/src/process.c > @@ -5397,60 +5397,31 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, > #endif /* !HAVE_GLIB */ > > #ifdef HAVE_GNUTLS > - /* GnuTLS buffers data internally. In lowat mode it leaves > - some data in the TCP buffers so that select works, but > - with custom pull/push functions we need to check if some > - data is available in the buffers manually. */ > - if (nfds == 0) > + /* GnuTLS buffers data internally. select() will only report > + available data for the underlying kernel sockets API, not > + what has been buffered internally. As such, we need to loop > + through all channels and check for available data manually. */ > + if (nfds >= 0) > { > - fd_set tls_available; > - int set = 0; > - > - FD_ZERO (&tls_available); > - if (! wait_proc) > - { > - /* We're not waiting on a specific process, so loop > - through all the channels and check for data. > - This is a workaround needed for some versions of > - the gnutls library -- 2.12.14 has been confirmed > - to need it. See > - http://comments.gmane.org/gmane.emacs.devel/145074 */ > - for (channel = 0; channel < FD_SETSIZE; ++channel) > - if (! NILP (chan_process[channel])) > - { > - struct Lisp_Process *p = > - XPROCESS (chan_process[channel]); > - if (p && p->gnutls_p && p->gnutls_state > - && ((emacs_gnutls_record_check_pending > - (p->gnutls_state)) > - > 0)) > - { > - nfds++; > - eassert (p->infd == channel); > - FD_SET (p->infd, &tls_available); > - set++; > - } > - } > - } > - else > - { > - /* Check this specific channel. */ > - if (wait_proc->gnutls_p /* Check for valid process. */ > - && wait_proc->gnutls_state > - /* Do we have pending data? */ > - && ((emacs_gnutls_record_check_pending > - (wait_proc->gnutls_state)) > - > 0)) > - { > - nfds = 1; > - eassert (0 <= wait_proc->infd); > - /* Set to Available. */ > - FD_SET (wait_proc->infd, &tls_available); > - set++; > - } > - } > - if (set) > - Available = tls_available; > + for (channel = 0; channel < FD_SETSIZE; ++channel) > + if (! NILP (chan_process[channel])) > + { > + struct Lisp_Process *p = > + XPROCESS (chan_process[channel]); > + > + if (just_wait_proc && p != wait_proc) > + continue; > + > + if (p && p->gnutls_p && p->gnutls_state > + && ((emacs_gnutls_record_check_pending > + (p->gnutls_state)) > + > 0)) > + { > + nfds++; > + eassert (p->infd == channel); > + FD_SET (p->infd, &Available); > + } > + } This change is hard to read. The original code already called emacs_gnutls_record_check_pending, and there's no calls to pselect in the hunk, so I'm unsure what exactly are we changing here, in terms of the details. The overview in the commit log just gives the general idea, but its hard for me to connect that to the actual code changes. Plus, you lose some of the comments, for some reason, even though the same code is still present. Bottom line, I'd appreciate more details for this part. > diff --git a/src/xgselect.c b/src/xgselect.c > index fedd3127ef..f68982143e 100644 > --- a/src/xgselect.c > +++ b/src/xgselect.c > @@ -143,6 +143,14 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds, > ++retval; > } > } > + else if (nfds == 0) > + { > + // pselect() clears the file descriptor sets if no fd is ready (but > + // not if an error occurred), so should we to be compatible. (Bug#21337) > + if (rfds) FD_ZERO (rfds); > + if (wfds) FD_ZERO (wfds); > + if (efds) FD_ZERO (efds); > + } This is OK, but please don't use C++ style comments, it's not our style. Also, please be sure to test this when waiting on pselect is interrupted by C-g, we had some problems in that area which had roots in xg_select. Thanks.