From mboxrd@z Thu Jan 1 00:00:00 1970
From: Helmut Eller
Newsgroups: gmane.emacs.bugs,gmane.emacs.pretest.bugs
Subject: bug#5173: 23.1.50; interrupted connect() not handled properly
Date: Thu, 10 Dec 2009 00:07:25 +0100
Reply-To: Helmut Eller, 5173@emacsbugs.donarmstrong.com
To: emacs-pretest-bug@gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)

--=-=-=

Interrupts during connect() in make-network-process aren't handled
properly.  The following recipe to reproduce the problem is rather
complicated.  You'll need:

- Qemu
- a kernel with tuntap support (/dev/net/tun)
- tunctl (from uml-utilities)
- a Linux image for Qemu.  If you don't have one, use
  http://www.nongnu.org/qemu/linux-0.2.img.bz2
- netcat

The problem occurs during connect().  To make this period longer and
more controllable, we will use Qemu so that we can stop and resume the
(virtual) TCP stack.  Also note that we are dealing with signals here,
and that strace or gdb would interfere with the problem.

* Prepare Qemu

Our goal here is to start Qemu with a virtual network interface like
so:

  qemu -hda linux-0.2.img -net nic -net tap,ifname=qtap0,script=no

Before we can do that we need to create the qtap0 device:

  sudo tunctl -u USER -t qtap0  # replace USER with your user
  sudo ifconfig qtap0 192.168.255.1 up

192.168.255.1 will most likely work for you, but any non-conflicting IP
address will do.

We'll also need netcat on the virtual machine.
So let's copy it to the image:

  mkdir img
  sudo mount -o loop linux-0.2.img img
  sudo cp -L /bin/netcat img/usr/bin
  sudo umount img

Now try

  qemu -hda linux-0.2.img -net nic -net tap,ifname=qtap0,script=no

This should boot Linux and present you with a shell.  In the shell,
configure the network device like so:

  sh-2.05b# ifconfig eth0 192.168.255.2

Make sure that you can ping that device from the host system with

  ping 192.168.255.2

If it doesn't work, check the output of route on the guest and the
host.

* Test netcat

Next test netcat.  Inside Qemu do:

  sh-2.05b# netcat -l -p 44444

and on the host:

  netcat 192.168.255.2 44444

Everything you type on the host should be echoed on the guest.  When
you abort with C-c the netcat inside Qemu should also abort.

* Test Function

Now we are almost ready to run real tests.  Create a file
connect-eintr.el containing the following function.

(defun testit (vmpid)
  (switch-to-buffer "*Messages*")
  ;; Step 1: freeze Qemu so that the connect() below cannot complete.
  (signal-process vmpid 'SIGSTOP)
  ;; Steps 3-5 run in a background shell: stop Emacs, resume Qemu,
  ;; then resume Emacs three seconds later.
  (shell-command
   (concat (format "(sleep 0.4; kill -SIGSTOP %d; " (emacs-pid))
           (format " sleep 0.1; kill -SIGCONT %d; " vmpid)
           (format " sleep 3; kill -SIGCONT %d;)&" (emacs-pid))))
  ;; Step 2: connect(); step 6: write to the socket.
  (let ((sock (make-network-process
               :name "test" :service 44444 :host "192.168.255.2"
               :sentinel (lambda (x y) (error "sentinel: %s %s" x y)))))
    (process-send-string sock "foo")
    (process-send-string sock "bar")
    (process-send-string sock "baz\n")
    (message "ok")))

The function does the following steps:

1) stop Qemu
2) connect()
3) stop Emacs
4) resume Qemu
5) resume Emacs
6) write some output to the socket

Between steps 2 and 4 Emacs will be inside connect() and we have
plenty of time to press a key to generate an interrupt.

* Run the test

Before running the function, create a listening socket inside Qemu as
above:

  sh-2.05b# netcat -l -p 44444

For the next step we need the process id of Qemu; let's call that
QPID.  Use QPID in the following command line:

  emacs -Q -load connect-eintr.el -eval '(testit QPID)' -f kill-emacs

This starts Emacs and runs the test.
If you're using X11 and don't press any key, Emacs will terminate
after a few seconds and foobarbaz will appear in Qemu.

Restart netcat as above and re-run the test, but this time press a key
after Emacs' frame appears.  This time Emacs will not terminate;
instead an error message will be visible in the *Messages* buffer.
Also, the netcat process in Qemu will terminate without producing any
output.

This latter behavior is wrong.  Emacs should handle interrupts
generated by pressing keys more gracefully.

The problem will also occur if you run Emacs without X11, but the
SIGSTOP will return the terminal to the shell and you have to put
Emacs into the foreground again with the fg command.  Your terminal is
most likely messed up at that point, but the error message should
still be visible.

* Probable Cause of the problem

The cause of the problem is that Emacs closes the socket after being
interrupted in connect() and then starts over with a new one.  That
approach works with servers which accept many connections but fails
for servers which serve one connection only, as the example with
netcat above did.

* Proposed fix

As described here:
http://www.madore.org/~david/computers/connect-intr.html
the recommended way to handle interrupts during connect() is to use
select() on the socket.  The socket will become writable when the
connection is established or when an error occurs.  The error can be
obtained with getsockopt.

The patch below implements just that.  The only other addition is the
introduction of two macros, EWOULDBLOCK_P and EINPROGRESS_P, whose only
purpose is to reduce #ifdef/#ifndef clutter.
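For readers who want to see the idea in isolation, here is a minimal
sketch in plain C, independent of process.c.  It assumes a blocking
socket and a POSIX system.  The helper names
wait_for_interrupted_connect and connect_handling_eintr are made up
for this illustration; they are not Emacs functions and are not what
the patch itself uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Emacs): wait until a connect() that
   was interrupted by a signal has finished.  Returns 0 on success, or
   an errno value describing why the connection attempt failed.  */
int
wait_for_interrupted_connect (int fd)
{
  /* select() can only handle descriptors below FD_SETSIZE.  */
  assert (fd >= 0 && fd < FD_SETSIZE);

  for (;;)
    {
      fd_set wfds;
      FD_ZERO (&wfds);
      FD_SET (fd, &wfds);

      /* The socket becomes writable once the connection attempt has
         either succeeded or failed.  */
      if (select (fd + 1, NULL, &wfds, NULL, NULL) >= 0)
        break;
      if (errno != EINTR)
        return errno;
      /* select() itself was interrupted: rebuild the set and retry.  */
    }

  /* SO_ERROR reports (and clears) the pending error on the socket;
     0 means the connection is established.  */
  int err = 0;
  socklen_t len = sizeof err;
  if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
    return errno;
  return err;
}

/* Hypothetical wrapper: call connect() once and, on EINTR, wait for
   the attempt to complete instead of closing the socket or calling
   connect() again.  Returns 0 on success or an errno value.  */
int
connect_handling_eintr (int fd, const struct sockaddr *addr,
                        socklen_t addrlen)
{
  if (connect (fd, addr, addrlen) == 0)
    return 0;
  if (errno == EINTR)
    return wait_for_interrupted_connect (fd);
  return errno;
}
```

A caller would use connect_handling_eintr wherever it would otherwise
call connect() directly, and treat a nonzero result as the errno of
the failed attempt.  The important property is that the same socket is
kept across the interrupt, so a server that accepts only one
connection never sees an aborted attempt.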
Helmut

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=connect.patch

--- process.c.~1.607.~	2009-12-04 08:01:43.000000000 +0100
+++ process.c	2009-12-09 23:37:19.000000000 +0100
@@ -234,6 +234,18 @@
 #endif /* NON_BLOCKING_CONNECT */
 #endif /* BROKEN_NON_BLOCKING_CONNECT */
 
+#ifdef EWOULDBLOCK
+# define EWOULDBLOCK_P(x) (x == EWOULDBLOCK)
+#else
+# define EWOULDBLOCK_P(x) (0)
+#endif
+
+#ifdef EINPROGRESS
+# define EINPROGRESS_P(x) (x == EINPROGRESS)
+#else
+# define EINPROGRESS_P(x) (0)
+#endif
+
 /* Define DATAGRAM_SOCKETS if datagrams can be used safely on
    this system.  We need to read full packets, so we need a
    "non-destructive" select.  So we require either native select,
@@ -3338,9 +3350,8 @@
     {
 #ifndef NON_BLOCKING_CONNECT
       error ("Non-blocking connect not supported");
-#else
-      is_non_blocking_client = 1;
 #endif
+      is_non_blocking_client = 1;
     }
 
   name = Fplist_get (contact, QCname);
@@ -3566,10 +3577,8 @@
           continue;
         }
 
-#ifdef DATAGRAM_SOCKETS
       if (!is_server && socktype == SOCK_DGRAM)
         break;
-#endif /* DATAGRAM_SOCKETS */
 
 #ifdef NON_BLOCKING_CONNECT
       if (is_non_blocking_client)
@@ -3655,26 +3664,44 @@
       ret = connect (s, lres->ai_addr, lres->ai_addrlen);
       xerrno = errno;
 
-      turn_on_atimers (1);
+      turn_on_atimers (1);
 
-      if (ret == 0 || xerrno == EISCONN)
-        {
+      if (ret == 0
+          || (EWOULDBLOCK_P (xerrno) && is_non_blocking_client)
+          || (EINPROGRESS_P (xerrno) && is_non_blocking_client))
           /* The unwind-protect will be discarded afterwards.
              Likewise for immediate_quit.  */
           break;
-        }
 
-#ifdef NON_BLOCKING_CONNECT
-#ifdef EINPROGRESS
-      if (is_non_blocking_client && xerrno == EINPROGRESS)
-        break;
-#else
-#ifdef EWOULDBLOCK
-      if (is_non_blocking_client && xerrno == EWOULDBLOCK)
-        break;
-#endif
-#endif
-#endif
+      if (xerrno == EINTR)
+        {
+          /* Unlike most other syscalls connect() cannot be called
+             again.  (That would return EALREADY.)  The proper way to
+             wait for completion is select().  */
+          int sc;
+          fd_set fdset;
+        retry_select:
+          FD_ZERO (&fdset);
+          FD_SET (s, &fdset);
+          QUIT;
+          sc = select (s + 1, 0, &fdset, 0, 0);
+          if (sc == -1)
+            if (errno == EINTR)
+              goto retry_select;
+            else
+              report_file_error ("select failed", Qnil);
+          eassert (sc > 0);
+          {
+            int len = sizeof xerrno;
+            eassert (FD_ISSET (s, &fdset));
+            if (getsockopt (s, SOL_SOCKET, SO_ERROR, &xerrno, &len) == -1)
+              report_file_error ("getsockopt failed", Qnil);
+            if (xerrno != 0)
+              errno = xerrno, report_file_error ("error during connect", Qnil);
+            else
+              break;
+          }
+        }
 
       immediate_quit = 0;
 
@@ -3682,9 +3709,6 @@
       specpdl_ptr = specpdl + count1;
       emacs_close (s);
       s = -1;
-
-      if (xerrno == EINTR)
-        goto retry_connect;
     }
 
   if (s >= 0)

--=-=-=--