From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Matthias Dahl Newsgroups: gmane.emacs.devel Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Tue, 7 Aug 2018 10:38:02 +0200 Message-ID: <6898bfb5-132e-2451-c49f-8864da8af429@binary-island.eu> References: <709614e8-1937-07c1-f554-b453ed4f3d4a@binary-island.eu> <7550438b-9fd4-d374-e571-8bb16456cad5@cs.ucla.edu> <797d0e16-1bae-50c2-35f8-05489ffce935@binary-island.eu> <83tvugdiu5.fsf@gnu.org> <877er5s0xv.fsf@gmail.com> <4e4c72bb-295d-81e1-e4ed-cad256bca83c@binary-island.eu> <87zi3v9461.fsf@gmail.com> <87k1uy8x68.fsf@gmail.com> <6d1970af-8c5c-20ba-be09-0b9aa757d663@binary-island.eu> <13b3e003-d12b-33a7-3ebe-c07b017a7cc0@binary-island.eu> <833714rm3d.fsf@gnu.org> <7b64afc3-e9b0-536d-1e42-a5f5d74f1adf@binary-island.eu> <83y3iur4jo.fsf@gnu.org> <2e181861-cfb8-a461-dfb6-9fee2164a611@binary-island.eu> <03e60e83-c832-ed6b-38bc-c0a20d2a0eac@binary-island.eu> <83bmb0zzif.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: blaine.gmane.org 1533632578 559 195.159.176.226 (7 Aug 2018 09:02:58 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 7 Aug 2018 09:02:58 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0 Cc: eggert@cs.ucla.edu, larsi@gnus.org, rrandresf@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Aug 07 11:02:53 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fmxtD-0008Q7-SD for ged-emacs-devel@m.gmane.org; Tue, 07 Aug 2018 11:02:52 +0200 Original-Received: from localhost ([::1]:37980 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fmxvI-0008ST-Mf for ged-emacs-devel@m.gmane.org; Tue, 07 Aug 2018 05:05:00 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46894) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fmxv8-0008HO-7U for emacs-devel@gnu.org; Tue, 07 Aug 2018 05:04:51 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fmxv4-0000ez-Ty for emacs-devel@gnu.org; Tue, 07 Aug 2018 05:04:50 -0400 Original-Received: from ud19.udmedia.de ([194.117.254.59]:56856 helo=mail.ud19.udmedia.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fmxv4-0000bZ-Dl for emacs-devel@gnu.org; Tue, 07 Aug 2018 05:04:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=binary-island.eu; h= subject:to:cc:references:from:message-id:date:mime-version :in-reply-to:content-type:content-transfer-encoding; s=k1; bh=eU s/6P8HLwASnWW/UhG9M8sl9OAHEUecAY1bROXhhEA=; b=jWdmyHH5FPPYR00jfa QsfGIDRb6Ok4XGZSpTt17EUMBfcOU3tS4cq3e805x2QIMIZsfcG6XnvV6m94fstw oupx9+BTRpBNiZeNLpdlSkvUjB1pGstRz0P+CQBAjMlfBkpU9hcaEAPRh6VkHxqT 2eQIugj5azpQutEE5sBwUBhKk= Original-Received: (qmail 27666 invoked from network); 7 Aug 2018 10:38:02 +0200 Original-Received: by mail.ud19.udmedia.de with ESMTPSA (ECDHE-RSA-AES256-GCM-SHA384 encrypted, authenticated); 7 Aug 2018 10:38:02 +0200 X-UD-Smtp-Session: ud19?126p1@Lv/lT9RyHuEqAoELxUACNDaqJbnKj9Bf Openpgp: id=1E87ADA02EFE759EFC20B2D1042F47D273AA780C Autocrypt: addr=ml_emacs-lists@binary-island.eu; prefer-encrypt=mutual; keydata= xsFNBFflJwUBEAC9fr9/1xQ6yIjhPy2Rk+bcoBwS99iU1XhW370gsY6Zh+1ZocsQSQ4gskND qgUc5VTI0ZvjlQF/HYYr+OZLibLFCeQ/ywzL6dLD2B7tbLFV3s8kgDwqV/LwzD5xbafIWX1v qqsiFA3Wy3HJxfqzIQneeffB281qs8rj07zihneqHSCowB7Pp3VMjpc9xYvBToNoOMZIkP8r fivMaF+2BvaNYYZVrtTTnqfCrNPmTV0WyRtOm9IGOgEUBuBJ5nJaUc0PY8C/pnXjIhV3Ev2y GDOIEMbARIEBct7wY/P0hvWs/MWlvVB0WgpJhTFXTNwhOHroqQ3HdizwLgMZPycvsuOsS4ZB ly5uhTIGwKstfdluxK4nWHzt2AXRuCOqW2+fklwyVSUWGg0rm6n6HmR3BNnff0AeIqfPR7M5 qCqbgIap7JrsuA8E2C0GP95un+cv1XmzHKaZJqu6y98PshV9ArrnrOE5XvmTwgKbVmzx2JPO 3reVokM5eqG1Xcs9jbQ/h467fXkEh/PpknBDqeEyHtxlq/MUg01Gxx+fHhYIYFGW+/h/wYf9 yS1tQJgs/w5kLhZH0TftlV9e/egI5Oyh0v8Yl8H0TkfkAkMVJHCpICW8EqOWMAGdDpQP50lG mbZ6j39V2DgNl4J7IX5KIDDtc7kTIaudUFCynDp1RITxHRITmwARAQABzS5NYXR0aGlhcyBE YWhsIDxtYXR0aGlhcy5kYWhsQGJpbmFyeS1pc2xhbmQuZXU+wsGBBBMBCgArBQJX5ScFAhsB BQk In-Reply-To: <83bmb0zzif.fsf@gnu.org> Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 194.117.254.59 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:228263 Archived-At: Hello Eli... On 21/07/2018 11:52, Eli Zaretskii wrote: >> diff --git a/src/gnutls.c b/src/gnutls.c >> index d22d5d267c..5bf5ee0e5c 100644 >> --- a/src/gnutls.c >> +++ b/src/gnutls.c >> @@ -708,16 +708,18 @@ emacs_gnutls_read (struct Lisp_Process *proc, char *buf, ptrdiff_t nbyte) >> rtnval = gnutls_record_recv (state, buf, nbyte); >> if (rtnval >= 0) >> return rtnval; >> - else if (rtnval == GNUTLS_E_UNEXPECTED_PACKET_LENGTH) >> - /* The peer closed the connection. */ >> - return 0; > > Why is this hunk being deleted? I see that you intend to handle it in > emacs_gnutls_handle_error, but I'm not sure I understand the net > result. The current code simply ignores this situation; what will > happen with the new code, except for the problem being logged? The rationale behind this is: On GnuTLS < 3, GNUTLS_E_UNEXPECTED_PACKET_LENGTH was returned for premature connection termination. On >= 3, it is a real error on its own and should be treated that way. I moved the error handling over to emacs_gnutls_handle_error, so it's at a central point and not scattered all across. Also makes the code clearer to read, imho. >> else if (emacs_gnutls_handle_error (state, rtnval)) >> - /* non-fatal error */ >> - return -1; >> - else { >> - /* a fatal error occurred */ >> - return 0; >> - } >> + { >> + /* non-fatal error */ >> + errno = EAGAIN; >> + return -1; >> + } >> + else >> + { >> + /* a fatal error occurred */ >> + errno = EPROTO; >> + return 0; >> + } > > EPROTO is not universally available, AFAIK. We have Gnulib's errno.h > that defines a value for it, but that just gets us through > compilation. What do you expect this value to cause when it is > encountered by some code which needs to interpret it? We need to make > sure the results will be sensible. Sorry, I missed that. Right now, no code checks specifically for EPROTO, so any other fatal value would do as well. But EPROTO is fitting rather well in what is going on. Any code that will get an errno = EPROTO will treat it as a fatal error and that is exactly what it should do. So there is no problem with that. The only problem that might arise is, if one of the platforms that do not currently define EPROTO, define and use it. Then Emacs needs to be recompiled for it to work properly again on that platform. I don't know if Emacs uses other non-universally available errno values as well. If it does, then, imho, it really does not matter here. If it does not, I will change it to something else, if requested to avoid the situation altogether. What do you think? >> diff --git a/src/process.c b/src/process.c >> index 6dba218c90..5702408985 100644 >> --- a/src/process.c >> +++ b/src/process.c >> @@ -5397,60 +5397,31 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, >> #endif /* !HAVE_GLIB */ >> >> #ifdef HAVE_GNUTLS >> - /* GnuTLS buffers data internally. In lowat mode it leaves >> - some data in the TCP buffers so that select works, but >> - with custom pull/push functions we need to check if some >> - data is available in the buffers manually. */ >> - if (nfds == 0) >> + /* GnuTLS buffers data internally. select() will only report >> + available data for the underlying kernel sockets API, not >> + what has been buffered internally. As such, we need to loop >> + through all channels and check for available data manually. */ >> + if (nfds >= 0) >> { >> - fd_set tls_available; >> - int set = 0; >> - >> - FD_ZERO (&tls_available); >> - if (! wait_proc) >> - { >> - /* We're not waiting on a specific process, so loop >> - through all the channels and check for data. >> - This is a workaround needed for some versions of >> - the gnutls library -- 2.12.14 has been confirmed >> - to need it. See >> - http://comments.gmane.org/gmane.emacs.devel/145074 */ >> - for (channel = 0; channel < FD_SETSIZE; ++channel) >> - if (! NILP (chan_process[channel])) >> - { >> - struct Lisp_Process *p = >> - XPROCESS (chan_process[channel]); >> - if (p && p->gnutls_p && p->gnutls_state >> - && ((emacs_gnutls_record_check_pending >> - (p->gnutls_state)) >> - > 0)) >> - { >> - nfds++; >> - eassert (p->infd == channel); >> - FD_SET (p->infd, &tls_available); >> - set++; >> - } >> - } >> - } >> - else >> - { >> - /* Check this specific channel. */ >> - if (wait_proc->gnutls_p /* Check for valid process. */ >> - && wait_proc->gnutls_state >> - /* Do we have pending data? */ >> - && ((emacs_gnutls_record_check_pending >> - (wait_proc->gnutls_state)) >> - > 0)) >> - { >> - nfds = 1; >> - eassert (0 <= wait_proc->infd); >> - /* Set to Available. */ >> - FD_SET (wait_proc->infd, &tls_available); >> - set++; >> - } >> - } >> - if (set) >> - Available = tls_available; >> + for (channel = 0; channel < FD_SETSIZE; ++channel) >> + if (! NILP (chan_process[channel])) >> + { >> + struct Lisp_Process *p = >> + XPROCESS (chan_process[channel]); >> + >> + if (just_wait_proc && p != wait_proc) >> + continue; >> + >> + if (p && p->gnutls_p && p->gnutls_state >> + && ((emacs_gnutls_record_check_pending >> + (p->gnutls_state)) >> + > 0)) >> + { >> + nfds++; >> + eassert (p->infd == channel); >> + FD_SET (p->infd, &Available); >> + } >> + } > > This change is hard to read. The original code already called > emacs_gnutls_record_check_pending, and there's no calls to pselect in > the hunk, so I'm unsure what exactly are we changing here, in terms of > the details. The overview in the commit log just gives the general > idea, but its hard for me to connect that to the actual code changes. > Plus, you lose some of the comments, for some reason, even though the > same code is still present. > > Bottom line, I'd appreciate more details for this part. With Emacs 25.2, the required GnuTLS version was set to 2.12.2 (from the "2.6.6 or later" previous default). GnuTLS 2.12 changed its default from lowat mode to non-lowat mode (and removed lowat mode with GnuTLS 3 completely). What this means is, there is no data left intentionally in the kernel buffers for a select to work properly. So we usually do not get any of the fds that belong to GnuTLS set as available when we do a select() call. The old Emacs code only checked _all_ channels if there was no wait_proc and if _no_ fds (nfds == 0) were set as available by our previous select() call. That was wrong and missed a few important cases and could lead to unnecessary waits or stalls. Previously, the old code treated a wait_proc as if just_wait_proc was set as well and only checked wait_proc and ignored the others. That was a bug as well. The new code always (if nfds >= 0) checks all (if !just_wait_proc) channels and sets the available fds in addition to the ones reported by select(). I hope that clears it up. The new code is basically a lot simpler and it tries to do the right thing (tm). :-) >> diff --git a/src/xgselect.c b/src/xgselect.c >> index fedd3127ef..f68982143e 100644 >> --- a/src/xgselect.c >> +++ b/src/xgselect.c >> @@ -143,6 +143,14 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds, >> ++retval; >> } >> } >> + else if (nfds == 0) >> + { >> + // pselect() clears the file descriptor sets if no fd is ready (but >> + // not if an error occurred), so should we to be compatible. (Bug#21337) >> + if (rfds) FD_ZERO (rfds); >> + if (wfds) FD_ZERO (wfds); >> + if (efds) FD_ZERO (efds); >> + } > > This is OK, but please don't use C++ style comments, it's not our > style. Also, please be sure to test this when waiting on pselect is > interrupted by C-g, we had some problems in that area which had roots > in xg_select. I will change that with the next version of the patches after I get your comments back. Regarding the problems you mentioned: Can you point me to a bug report and more information about that? If anything breaks with the new behavior, then that broken code is clearly doing something wrong and needs fixing. Personally, I have been running Emacs with the patches since I posted them and I have not run into a single issue. Thanks for taking the time, Eli. It is very much appreciated! So long, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu