bug#22789: 25.1.50; In last master build https connections stop working

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#22789: 25.1.50; In last master build https connections stop working
@ 2016-02-24 10:26 José L. Doménech
  2016-02-24 14:00 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: José L. Doménech @ 2016-02-24 10:26 UTC (permalink / raw)
  To: 22789

In emacs, built from master, https connections have stopped working, while
imaps connections are still working.

How to reproduce it:
emacs -Q
eww https://www.fsf.org

The eww buffer remains blank.

Thw "eww http://www.fsf.org" command shows the page.
From my mail client (wanderlust), trying to retrieve rss sources with a https
connection fails, but I can access http sources and imaps folders.

Thanks.

In GNU Emacs 25.1.50.4 (x86_64-w64-mingw32)
 of 2016-02-24 built on LENOVO-PC
Repository revision: 378d138e64e9389e277e95528c143dc2456727a5
Windowing system distributor 'Microsoft Corp.', version 6.3.9600
Configured using:
 'configure --without-imagemagick
 PKG_CONFIG_PATH=/mingw64/lib/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS

Important settings:
  value of $LANG: es_ES.UTF-8
  locale-coding-system: cp1252

Major mode: eww

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Contacting host: www.fsf.org:443

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader sendmail network-stream nsm starttls
url-http tls gnutls mail-parse rfc2231 url-gw url-cache url-auth eww
puny mm-url gnus nnheader gnus-util rmail rmail-loaddefs rfc2047 rfc2045
ietf-drums mail-utils wid-edit mm-util mail-prsvr url-queue url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache url-vars mailcap shr svg xml seq
byte-opt gv bytecomp byte-compile cconv cl-extra help-mode easymenu dom
cl-loaddefs cl-lib subr-x pcase browse-url format-spec time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev obarray minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote w32notify w32 multi-tty
make-network-process emacs)

Memory information:
((conses 16 129569 4885)
 (symbols 56 24069 0)
 (miscs 48 46 107)
 (strings 32 28495 4799)
 (string-bytes 1 822432)
 (vectors 16 16800)
 (vector-slots 8 471145 3632)
 (floats 8 237 46)
 (intervals 56 262 21)
 (buffers 976 12))





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 10:26 bug#22789: 25.1.50; In last master build https connections stop working José L. Doménech
@ 2016-02-24 14:00 ` Lars Ingebrigtsen
  2016-02-24 16:09   ` José L. Doménech
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-24 14:00 UTC (permalink / raw)
  To: José L. Doménech; +Cc: 22789

José L. Doménech <j_l_domenech@yahoo.com> writes:

> In emacs, built from master, https connections have stopped working, while
> imaps connections are still working.
>
> How to reproduce it:
> emacs -Q
> eww https://www.fsf.org
>
> The eww buffer remains blank.

Does your Emacs have the GnuTLS libraries available?  What does
(gnutls-available-p) eval to?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 14:00 ` Lars Ingebrigtsen
@ 2016-02-24 16:09   ` José L. Doménech
  2016-02-24 18:06     ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: José L. Doménech @ 2016-02-24 16:09 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: José L. Doménech, 22789

On Wed, 24 Feb 2016 15:00:50 +0100,
Lars Ingebrigtsen wrote:
> 
> José L. Doménech <j_l_domenech@yahoo.com> writes:
> 
> > In emacs, built from master, https connections have stopped working, while
> > imaps connections are still working.
> >
> > How to reproduce it:
> > emacs -Q
> > eww https://www.fsf.org
> >
> > The eww buffer remains blank.
> 
> Does your Emacs have the GnuTLS libraries available?  What does
> (gnutls-available-p) eval to?
(gnutls-available-p) evaluates to t.

The libraries should be available since i have no problems with the following built (that I am now using):

In GNU Emacs 25.1.50.2 (x86_64-w64-mingw32)
 of 2016-02-21 built on LENOVO-PC
Repository revision: 1ba50a0d8cbef6686ecf752583832e7bbb9137ef
Windowing system distributor 'Microsoft Corp.', version 6.3.9600
Configured using:
 'configure --without-imagemagick
 PKG_CONFIG_PATH=/mingw64/lib/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS

Important settings:
  value of $LANG: ESN
  locale-coding-system: cp1252





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 16:09   ` José L. Doménech
@ 2016-02-24 18:06     ` Eli Zaretskii
  2016-02-24 23:48       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-24 18:06 UTC (permalink / raw)
  To: José L. Doménech; +Cc: larsi, 22789

> From: José L. Doménech
> 	<j_l_domenech@yahoo.com>
> Cc: "José L. Doménech"
> 	<j_l_domenech@yahoo.com>, 22789@debbugs.gnu.org
> 
> On Wed, 24 Feb 2016 15:00:50 +0100,
> Lars Ingebrigtsen wrote:
> > 
> > José L. Doménech <j_l_domenech@yahoo.com> writes:
> > 
> > > In emacs, built from master, https connections have stopped working, while
> > > imaps connections are still working.
> > >
> > > How to reproduce it:
> > > emacs -Q
> > > eww https://www.fsf.org
> > >
> > > The eww buffer remains blank.
> > 
> > Does your Emacs have the GnuTLS libraries available?  What does
> > (gnutls-available-p) eval to?
> (gnutls-available-p) evaluates to t.
> 
> The libraries should be available since i have no problems with the following built (that I am now using):
> 
> In GNU Emacs 25.1.50.2 (x86_64-w64-mingw32)
>  of 2016-02-21 built on LENOVO-PC

I confirm the problem with the MS-Windows build: on master, https
doesn't work; on emacs-25 it does.

First suspect is the async changes, of course.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 18:06     ` Eli Zaretskii
@ 2016-02-24 23:48       ` Lars Ingebrigtsen
  2016-02-25  0:02         ` Lars Ingebrigtsen
  2016-02-25  3:46         ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-24 23:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: José L. Doménech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> I confirm the problem with the MS-Windows build: on master, https
> doesn't work; on emacs-25 it does.
>
> First suspect is the async changes, of course.

Yup.  I'll try do do a build without getaddrinfo_a support and see
whether I can reproduce the https error here...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 23:48       ` Lars Ingebrigtsen
@ 2016-02-25  0:02         ` Lars Ingebrigtsen
  2016-02-25  1:09           ` Lars Ingebrigtsen
                             ` (2 more replies)
  2016-02-25  3:46         ` Eli Zaretskii
  1 sibling, 3 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-25  0:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: José L. Doménech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> I confirm the problem with the MS-Windows build: on master, https
>> doesn't work; on emacs-25 it does.
>>
>> First suspect is the async changes, of course.
>
> Yup.  I'll try do do a build without getaddrinfo_a support and see
> whether I can reproduce the https error here...

I'm unable to reproduce this bug on Ubuntu, even if I compile without
getaddrinfo{,_a} support.

If you eval the following, does anything show up in the "*foo*" buffer?

(setq proc
(make-network-process :name "foo"
		      :buffer (get-buffer-create "*foo*")
		      :host "imap.gmail.com"
		      :service 993
		      :nowait t
		      :tls-parameters
		      (cons 'gnutls-x509pki
			    (gnutls-boot-parameters
			     :type 'gnutls-x509pki
			     :hostname "imap.gmail.com"))))

* OK Gimap ready for requests from 60.225.211.161 qr7mb410250987iec

should appear.  Also, after evaling that, what does

(process-status proc)

say?  It should say "connect" for a little bit, and then "open"...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-25  0:02         ` Lars Ingebrigtsen
@ 2016-02-25  1:09           ` Lars Ingebrigtsen
  2016-02-25 16:41           ` Eli Zaretskii
  2016-02-27 18:05           ` Alain Schneble
  2 siblings, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-25  1:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: José L. Doménech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I'm unable to reproduce this bug on Ubuntu, even if I compile without
> getaddrinfo{,_a} support.

I've read through the WINDOWSNT parts of process.c to see if there's
anything strikingly obviously wrong, but a first read through didn't
really show anything...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-25  0:02         ` Lars Ingebrigtsen
  2016-02-25  1:09           ` Lars Ingebrigtsen
@ 2016-02-25 16:41           ` Eli Zaretskii
  2016-02-26  2:29             ` Lars Ingebrigtsen
  2016-02-27 18:05           ` Alain Schneble
  2 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-25 16:41 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: José L. Doménech <j_l_domenech@yahoo.com>,
>   22789@debbugs.gnu.org
> Date: Thu, 25 Feb 2016 11:02:51 +1100
> 
> If you eval the following, does anything show up in the "*foo*" buffer?
> 
> (setq proc
> (make-network-process :name "foo"
> 		      :buffer (get-buffer-create "*foo*")
> 		      :host "imap.gmail.com"
> 		      :service 993
> 		      :nowait t
> 		      :tls-parameters
> 		      (cons 'gnutls-x509pki
> 			    (gnutls-boot-parameters
> 			     :type 'gnutls-x509pki
> 			     :hostname "imap.gmail.com"))))

I see there only this:

  Process foo connect

> * OK Gimap ready for requests from 60.225.211.161 qr7mb410250987iec
> 
> should appear.  Also, after evaling that, what does
> 
> (process-status proc)
> 
> say?  It should say "connect" for a little bit, and then "open"...

It stays "connect" forever.  But "netstat" doesn't show any
connections to that server, AFAICT.  I think the connection simply
doesn't begin.

Btw, one other difference of the Windows build, this time wrt GnuTLS,
is that on Windows we instruct GnuTLS to use our own pull and push
functions, see gnutls.c around line 450.  The functions themselves are
defined in w32.c, at the end.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-25 16:41           ` Eli Zaretskii
@ 2016-02-26  2:29             ` Lars Ingebrigtsen
  2016-02-26  9:36               ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-26  2:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> It stays "connect" forever.  But "netstat" doesn't show any
> connections to that server, AFAICT.  I think the connection simply
> doesn't begin.

Hm...  when you mentioned that the Windows GnuTLS functions used our own
functions for the actual pull/push, I thought that perhaps the problem
was there (and was about to suggest the patch below), but this must mean
that the socket isn't even created.

`make-network-socket' ends with

#ifdef HAVE_GETADDRINFO_A
  /* If we're doing async address resolution, the list of addresses
     here will be nil, so we postpone connecting to the server. */
  if (!p->is_server && NILP (ip_addresses))
    {
      p->dns_request = dns_request;
      p->status = Qconnect;
    }
  else
    {
      connect_network_socket (proc, ip_addresses);
    }
#else /* HAVE_GETADDRINFO_A */
  connect_network_socket (proc, ip_addresses);
#endif

so that should happen unconditionally on Windows.

Let's see...

Oh!  This code in connect_network_socket looks suspect, perhaps.  If it
fails, then the socket will never actually be created...  hm...  but it
may be caught later...  and it doesn't explain why non-blocking non-TLS
sockets still work...  so it can't be that...

#ifdef NON_BLOCKING_CONNECT
      if (p->is_non_blocking_client)
	{
	  ret = fcntl (s, F_SETFL, O_NONBLOCK);
	  if (ret < 0)
	    {
	      xerrno = errno;
	      emacs_close (s);
	      s = -1;
	      continue;
	    }
	}
#endif

So perhaps it's in the TLS code anyway.  Could you try the following
code?  It'll make TLS negotiation blocking on WINDOWSNT again.

> Btw, one other difference of the Windows build, this time wrt GnuTLS,
> is that on Windows we instruct GnuTLS to use our own pull and push
> functions, see gnutls.c around line 450.  The functions themselves are
> defined in w32.c, at the end.

diff --git a/src/gnutls.c b/src/gnutls.c
index d1b34c5..00d0e56 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -410,12 +410,17 @@ gnutls_try_handshake (struct Lisp_Process *proc)
       QUIT;
     }
   while (ret < 0 && gnutls_error_is_fatal (ret) == 0
-	 && ! proc->is_non_blocking_client);
+#ifndef WINDOWSNT
+	 && ! proc->is_non_blocking_client
+#endif
+	 );
 
   proc->gnutls_initstage = GNUTLS_STAGE_HANDSHAKE_TRIED;
 
+#ifndef WINDOWSNT
   if (proc->is_non_blocking_client)
     proc->gnutls_p = true;
+#endif
 
   if (ret == GNUTLS_E_SUCCESS)
     {

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-26  2:29             ` Lars Ingebrigtsen
@ 2016-02-26  9:36               ` Eli Zaretskii
  2016-02-27  2:30                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-26  9:36 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Fri, 26 Feb 2016 12:59:39 +1030
> 
> So perhaps it's in the TLS code anyway.  Could you try the following
> code?  It'll make TLS negotiation blocking on WINDOWSNT again.

Now, when I evaluate the same form you posted earlier, I get a lot of
binary garbage in *foo*, and (process-status proc) yields "failed".

And contacting https://www.fsf.org still doesn't work.

Can you post a summary of the changes that you've done, including the
files and functions where they were made?  That will help Someone™
look into this problem on Windows.

Thanks.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-26  9:36               ` Eli Zaretskii
@ 2016-02-27  2:30                 ` Lars Ingebrigtsen
  2016-02-27  2:43                   ` John Wiegley
                                     ` (2 more replies)
  0 siblings, 3 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-27  2:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> So perhaps it's in the TLS code anyway.  Could you try the following
>> code?  It'll make TLS negotiation blocking on WINDOWSNT again.
>
> Now, when I evaluate the same form you posted earlier, I get a lot of
> binary garbage in *foo*, and (process-status proc) yields "failed".

Hm...  interesting...  I think this might point towards
emacs_gnutls_push/pull needing to be tweaked somehow.  (In particular,
the bytes from the stream do not seem to be delivered to the GnuTLS
library, but instead consumed by Emacs (and output into the buffer).)

But it shows that the problem definitely is in the TLS handling itself,
and not in the DNS part of the changes.

Let's see...  Oh, I think I see the problem with the patch I asked you
to test:

#ifdef HAVE_GNUTLS
      if (p->gnutls_p && p->gnutls_state)
	nbytes = emacs_gnutls_read (p, chars + carryover + buffered,
				    readmax - buffered);
      else
#endif
	nbytes = emacs_read (channel, chars + carryover + buffered,
			     readmax - buffered);

So deferring setting g->gnutls_p is not a good idea.  I'll try to debug
this further, but may not have time today...

> Can you post a summary of the changes that you've done, including the
> files and functions where they were made?  That will help Someone™
> look into this problem on Windows.

Uhm.  Well, it was a 3200 line patch that affected process.c and
gnutls.c...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  2:30                 ` Lars Ingebrigtsen
@ 2016-02-27  2:43                   ` John Wiegley
  2016-02-27  3:50                     ` Lars Ingebrigtsen
  2016-02-27  3:49                   ` Lars Ingebrigtsen
  2016-02-27  8:13                   ` Eli Zaretskii
  2 siblings, 1 reply; 124+ messages in thread
From: John Wiegley @ 2016-02-27  2:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

>>>>> Lars Ingebrigtsen <larsi@gnus.org> writes:

>> Can you post a summary of the changes that you've done, including the files
>> and functions where they were made? That will help Someone™ look into this
>> problem on Windows.

> Uhm. Well, it was a 3200 line patch that affected process.c and gnutls.c...

Yes, but you have a lot more knowledge of what that patch contained that
someone who will be looking at it afresh. I guess he's asking you to do a
little work to save him a lot of work, if you'd be willing.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  2:43                   ` John Wiegley
@ 2016-02-27  3:50                     ` Lars Ingebrigtsen
  2016-02-27  8:14                       ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-27  3:50 UTC (permalink / raw)
  To: John Wiegley; +Cc: John Wiegley, 22789, j_l_domenech

John Wiegley <jwiegley@gmail.com> writes:

>>>>>> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>>> Can you post a summary of the changes that you've done, including the files
>>> and functions where they were made? That will help Someone™ look into this
>>> problem on Windows.
>
>> Uhm. Well, it was a 3200 line patch that affected process.c and gnutls.c...
>
> Yes, but you have a lot more knowledge of what that patch contained that
> someone who will be looking at it afresh. I guess he's asking you to do a
> little work to save him a lot of work, if you'd be willing.

Sure, but...  that's a lot of typing.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  3:50                     ` Lars Ingebrigtsen
@ 2016-02-27  8:14                       ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-27  8:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, johnw, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22789@debbugs.gnu.org,  John Wiegley <johnw@gnu.org>,  Eli Zaretskii <eliz@gnu.org>,  j_l_domenech@yahoo.com
> Date: Sat, 27 Feb 2016 14:20:00 +1030
> 
> John Wiegley <jwiegley@gmail.com> writes:
> 
> >>>>>> Lars Ingebrigtsen <larsi@gnus.org> writes:
> >
> >>> Can you post a summary of the changes that you've done, including the files
> >>> and functions where they were made? That will help Someone™ look into this
> >>> problem on Windows.
> >
> >> Uhm. Well, it was a 3200 line patch that affected process.c and gnutls.c...
> >
> > Yes, but you have a lot more knowledge of what that patch contained that
> > someone who will be looking at it afresh. I guess he's asking you to do a
> > little work to save him a lot of work, if you'd be willing.
> 
> Sure, but...  that's a lot of typing.  :-)

It is not needed if you are keen on continuing to debug this problem.
If someone else would take over, having such a description would help.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  2:30                 ` Lars Ingebrigtsen
  2016-02-27  2:43                   ` John Wiegley
@ 2016-02-27  3:49                   ` Lars Ingebrigtsen
  2016-02-27  8:10                     ` Eli Zaretskii
  2016-02-27  8:13                   ` Eli Zaretskii
  2 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-27  3:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> So deferring setting g->gnutls_p is not a good idea.  I'll try to debug
> this further, but may not have time today...

Hey!  Time!

The following patch should make the rest of Emacs leave the TLS socket
alone until we've done the handshake, I think.  It works for me under
Linux (and makes the negotiation blocking).  Could you test this under
Windows?

diff --git a/src/gnutls.c b/src/gnutls.c
index d1b34c5..002e7b4 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -403,6 +403,9 @@ gnutls_try_handshake (struct Lisp_Process *proc)
   gnutls_session_t state = proc->gnutls_state;
   int ret;
 
+  if (proc->is_non_blocking_client)
+    proc->gnutls_p = true;
+
   do
     {
       ret = gnutls_handshake (state);
@@ -410,13 +413,13 @@ gnutls_try_handshake (struct Lisp_Process *proc)
       QUIT;
     }
   while (ret < 0 && gnutls_error_is_fatal (ret) == 0
-	 && ! proc->is_non_blocking_client);
+#if 0
+	 && ! proc->is_non_blocking_client
+#endif
+	 );
 
   proc->gnutls_initstage = GNUTLS_STAGE_HANDSHAKE_TRIED;
 
-  if (proc->is_non_blocking_client)
-    proc->gnutls_p = true;
-
   if (ret == GNUTLS_E_SUCCESS)
     {
       /* Here we're finally done.  */


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  3:49                   ` Lars Ingebrigtsen
@ 2016-02-27  8:10                     ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-27  8:10 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 22789@debbugs.gnu.org,  j_l_domenech@yahoo.com
> Date: Sat, 27 Feb 2016 14:19:08 +1030
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > So deferring setting g->gnutls_p is not a good idea.  I'll try to debug
> > this further, but may not have time today...
> 
> Hey!  Time!
> 
> The following patch should make the rest of Emacs leave the TLS socket
> alone until we've done the handshake, I think.  It works for me under
> Linux (and makes the negotiation blocking).  Could you test this under
> Windows?

I'm afraid it doesn't help.  The connection still doesn't happen.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27  2:30                 ` Lars Ingebrigtsen
  2016-02-27  2:43                   ` John Wiegley
  2016-02-27  3:49                   ` Lars Ingebrigtsen
@ 2016-02-27  8:13                   ` Eli Zaretskii
  2 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-27  8:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Sat, 27 Feb 2016 13:00:44 +1030
> 
> > Can you post a summary of the changes that you've done, including the
> > files and functions where they were made?  That will help Someone™
> > look into this problem on Windows.
> 
> Uhm.  Well, it was a 3200 line patch that affected process.c and
> gnutls.c...

I can easily generate the patch, of course.  What I was asking for is
some description of the changes, which will make it easier to decide
where the problem might be.  Otherwise, that Someone™ will have to
review the entire patch to make that decision.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-25  0:02         ` Lars Ingebrigtsen
  2016-02-25  1:09           ` Lars Ingebrigtsen
  2016-02-25 16:41           ` Eli Zaretskii
@ 2016-02-27 18:05           ` Alain Schneble
  2016-02-27 22:38             ` Lars Ingebrigtsen
  2016-02-28 16:47             ` Eli Zaretskii
  2 siblings, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-02-27 18:05 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: José L. Doménech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> (setq proc
> (make-network-process :name "foo"
> 		      :buffer (get-buffer-create "*foo*")
> 		      :host "imap.gmail.com"
> 		      :service 993
> 		      :nowait t
> 		      :tls-parameters
> 		      (cons 'gnutls-x509pki
> 			    (gnutls-boot-parameters
> 			     :type 'gnutls-x509pki
> 			     :hostname "imap.gmail.com"))))
>
> * OK Gimap ready for requests from 60.225.211.161 qr7mb410250987iec
>
> should appear.  Also, after evaling that, what does

It seems to be a timing issue.  If I set gnutls-log-level to 5, this
works also on Windows (i.e i get OK Gimap...).

What I found out is that it runs into the following branch in
wait_reading_process_output:

...
else
{
  /* Preserve status of processes already terminated.  */
  XPROCESS (proc)->tick = ++process_tick;
  deactivate_process (proc);
  if (XPROCESS (proc)->raw_status_new)
    update_status (XPROCESS (proc));
  if (EQ (XPROCESS (proc)->status, Qrun))
    pset_status (XPROCESS (proc),
                 list2 (Qexit, make_number (256)));
}

Here it deactivates the process, but as its status is "connect", it
won't change it.  That's the reason why it remains in "connect" state.

I guess that it enters this path because the socket is not ready yet.
But why?  I will try to figure it out later...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 18:05           ` Alain Schneble
@ 2016-02-27 22:38             ` Lars Ingebrigtsen
  2016-02-27 23:06               ` Alain Schneble
  2016-02-28 16:47             ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-27 22:38 UTC (permalink / raw)
  To: Alain Schneble; +Cc: José L. Doménech, 22789

Alain Schneble <a.s@realize.ch> writes:

> What I found out is that it runs into the following branch in
> wait_reading_process_output:
>
> ...
> else
> {
>   /* Preserve status of processes already terminated.  */
>   XPROCESS (proc)->tick = ++process_tick;
>   deactivate_process (proc);
>   if (XPROCESS (proc)->raw_status_new)
>     update_status (XPROCESS (proc));
>   if (EQ (XPROCESS (proc)->status, Qrun))
>     pset_status (XPROCESS (proc),
>                  list2 (Qexit, make_number (256)));
> }
>
> Here it deactivates the process, but as its status is "connect", it
> won't change it.  That's the reason why it remains in "connect" state.
>
> I guess that it enters this path because the socket is not ready yet.
> But why?  I will try to figure it out later...

I think you're on to something!

The thing starts with

	      nread = read_process_output (proc, channel);

and for un-setup TLS sockets, it'll now get back -1, and it should
ideally end up in the

	      else if (nread == -1 && errno == EAGAIN)
		;

thing, so that it tries again later.  But errno is not EAGAIN here
(usually)...

Does the following patch make things work on Windows?

diff --git a/src/gnutls.c b/src/gnutls.c
index d1b34c5..a6b1294 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -403,6 +403,9 @@ gnutls_try_handshake (struct Lisp_Process *proc)
   gnutls_session_t state = proc->gnutls_state;
   int ret;
 
+  if (proc->is_non_blocking_client)
+    proc->gnutls_p = true;
+
   do
     {
       ret = gnutls_handshake (state);
@@ -410,13 +413,13 @@ gnutls_try_handshake (struct Lisp_Process *proc)
       QUIT;
     }
   while (ret < 0 && gnutls_error_is_fatal (ret) == 0
-	 && ! proc->is_non_blocking_client);
+#if 0
+	 && ! proc->is_non_blocking_client
+#endif
+	 );
 
   proc->gnutls_initstage = GNUTLS_STAGE_HANDSHAKE_TRIED;
 
-  if (proc->is_non_blocking_client)
-    proc->gnutls_p = true;
-
   if (ret == GNUTLS_E_SUCCESS)
     {
       /* Here we're finally done.  */
@@ -541,7 +544,10 @@ emacs_gnutls_read (struct Lisp_Process *proc, char *buf, ptrdiff_t nbyte)
   gnutls_session_t state = proc->gnutls_state;
 
   if (proc->gnutls_initstage != GNUTLS_STAGE_READY)
-    return -1;
+    {
+      errno = EAGAIN;
+      return -1;
+    }
 
   rtnval = gnutls_record_recv (state, buf, nbyte);
   if (rtnval >= 0)


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 22:38             ` Lars Ingebrigtsen
@ 2016-02-27 23:06               ` Alain Schneble
  2016-02-27 23:49                 ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-02-27 23:06 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: José L. Doménech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I think you're on to something!
>
> The thing starts with
>
> 	      nread = read_process_output (proc, channel);
>
> and for un-setup TLS sockets, it'll now get back -1, and it should
> ideally end up in the
>
> 	      else if (nread == -1 && errno == EAGAIN)
> 		;
>
> thing, so that it tries again later.  But errno is not EAGAIN here
> (usually)...

I suspected this as well.

> Does the following patch make things work on Windows?

But unfortunately, it does not work, also with this patch applied.  The
problem seems to happen earlier.

In w32.c (emacs_gnutls_push) I see that sys_write returns with 0.  But
the buffer to write contains sz=255 bytes.  And here errno is 0 after
the write.  This is strange.  I guess that here errno should be set to
EAGAIN... I mean in sys_write...

After this broken emacs_gnutls_push call, gnutls_handshake returns:

-53 GNUTLS_E_PUSH_ERROR
And later...
-10 GNUTLS_E_INVALID_SESSION.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 23:06               ` Alain Schneble
@ 2016-02-27 23:49                 ` Alain Schneble
  2016-02-28  3:31                   ` Lars Ingebrigtsen
  2016-02-28  3:43                   ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-02-27 23:49 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: José L. Doménech, 22789

Alain Schneble <a.s@realize.ch> writes:

> In w32.c (emacs_gnutls_push) I see that sys_write returns with 0.  But
> the buffer to write contains sz=255 bytes.  And here errno is 0 after
> the write.  This is strange.  I guess that here errno should be set to
> EAGAIN... I mean in sys_write...
>
> After this broken emacs_gnutls_push call, gnutls_handshake returns:
>
> -53 GNUTLS_E_PUSH_ERROR
> And later...
> -10 GNUTLS_E_INVALID_SESSION.

Here we go.  I think we are getting closer to the root cause of the
problem.  In w32.c (sys_write), it runs into the following error:

      if (nchars == SOCKET_ERROR)
        {
	  DebPrint (("sys_write.send failed with error %d on socket %ld\n",
		     pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
	  set_errno ();
	}

Strange thing: set_errno returns with errno == 0.  This because
pfn_WSAGetLastError returns 0 as well.

Now, if I do...

	  if (errno == 0)
	    errno = EAGAIN;

...just after the call to set_errno above, guess what: It seems to work!

At least for me, it will be an exercise for tomorrow to find the reason
why pfn_WSAGetLastError returns 0 in this case.  *snore*

Do you agree it shouldn't return 0?

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 23:49                 ` Alain Schneble
@ 2016-02-28  3:31                   ` Lars Ingebrigtsen
  2016-02-28  9:58                     ` Alain Schneble
  2016-02-28 16:53                     ` Eli Zaretskii
  2016-02-28  3:43                   ` Eli Zaretskii
  1 sibling, 2 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-28  3:31 UTC (permalink / raw)
  To: Alain Schneble; +Cc: José L. Doménech, 22789

Alain Schneble <a.s@realize.ch> writes:

> Here we go.  I think we are getting closer to the root cause of the
> problem.  In w32.c (sys_write), it runs into the following error:
>
>       if (nchars == SOCKET_ERROR)
>         {
> 	  DebPrint (("sys_write.send failed with error %d on socket %ld\n",
> 		     pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
> 	  set_errno ();
> 	}
>
> Strange thing: set_errno returns with errno == 0.  This because
> pfn_WSAGetLastError returns 0 as well.
>
> Now, if I do...
>
> 	  if (errno == 0)
> 	    errno = EAGAIN;
>
> ...just after the call to set_errno above, guess what: It seems to work!

Aha!  Good sleuthing.  :-)

> At least for me, it will be an exercise for tomorrow to find the reason
> why pfn_WSAGetLastError returns 0 in this case.  *snore*
>
> Do you agree it shouldn't return 0?

Yes.  That would make more sense.

Both I don't think that code path (sys_write) has ever been called
before on a nonblocking socket.  (Because we've always opened the
sockets before without O_NONBLOCK, since we've never called
`make-network-process' with :nowait t before from `open-gnutls-stream'.)

So ... is it possible that these functions that w32.c calls just
don't...  quite work with nonblocking sockets?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28  3:31                   ` Lars Ingebrigtsen
@ 2016-02-28  9:58                     ` Alain Schneble
  2016-02-28 16:53                     ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-02-28  9:58 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: José L. Doménech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> At least for me, it will be an exercise for tomorrow to find the reason
>> why pfn_WSAGetLastError returns 0 in this case.  *snore*
>>
>> Do you agree it shouldn't return 0?
>
> Yes.  That would make more sense.

See my other message in this thread.

> Both I don't think that code path (sys_write) has ever been called
> before on a nonblocking socket.  (Because we've always opened the
> sockets before without O_NONBLOCK, since we've never called
> `make-network-process' with :nowait t before from `open-gnutls-stream'.)

I agree (at least for the gnutls case...).

> So ... is it possible that these functions that w32.c calls just
> don't...  quite work with nonblocking sockets?

It seems so, yes :(  I'll try to find some more time to dive into this
further...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28  3:31                   ` Lars Ingebrigtsen
  2016-02-28  9:58                     ` Alain Schneble
@ 2016-02-28 16:53                     ` Eli Zaretskii
  2016-02-29  2:37                       ` Lars Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-28 16:53 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 28 Feb 2016 14:01:19 +1030
> Cc: "José L. Doménech"
> 	<j_l_domenech@yahoo.com>, 22789@debbugs.gnu.org
> 
> Both I don't think that code path (sys_write) has ever been called
> before on a nonblocking socket.  (Because we've always opened the
> sockets before without O_NONBLOCK, since we've never called
> `make-network-process' with :nowait t before from `open-gnutls-stream'.)
> 
> So ... is it possible that these functions that w32.c calls just
> don't...  quite work with nonblocking sockets?

That's not true, non-blocking sockets are supported on Windows since a
year ago.

And the above aren't the right questions anyway: the problem is not
with sockets per se, the problem is with a TLS connection
specifically.  And I think it is caused by the fact that now we
proceed to GnuTLS handshaking right after the call to 'connect', which
takes some time to complete, when the socket is non-blocking.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28 16:53                     ` Eli Zaretskii
@ 2016-02-29  2:37                       ` Lars Ingebrigtsen
  0 siblings, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29  2:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> That's not true, non-blocking sockets are supported on Windows since a
> year ago.

I grepped for uses of sys_write, and the only one I noticed was the one
from the TLS push functions.  Because I missed this:

#ifdef WINDOWSNT
#define read sys_read
#define write sys_write

D'oh.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 23:49                 ` Alain Schneble
  2016-02-28  3:31                   ` Lars Ingebrigtsen
@ 2016-02-28  3:43                   ` Eli Zaretskii
  2016-02-28  9:48                     ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-28  3:43 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> Date: Sun, 28 Feb 2016 00:49:25 +0100
> Cc: "José L. Doménech"
> 	<j_l_domenech@yahoo.com>, 22789@debbugs.gnu.org
> 
> Now, if I do...
> 
> 	  if (errno == 0)
> 	    errno = EAGAIN;
> 
> ...just after the call to set_errno above, guess what: It seems to work!

This must be conditioned on something that requires EAGAIN.  Otherwise
overriding the errno of zero sounds like a bad idea to me.

Why does the nchars == SOCKET_ERROR happen here at all, if winsock
returns with an error of zero?  Isn't that strange?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28  3:43                   ` Eli Zaretskii
@ 2016-02-28  9:48                     ` Alain Schneble
  2016-02-28 17:00                       ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-02-28  9:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> Date: Sun, 28 Feb 2016 00:49:25 +0100
>> Cc: "José L. Doménech"
>> 	<j_l_domenech@yahoo.com>, 22789@debbugs.gnu.org
>> 
>> Now, if I do...
>> 
>> 	  if (errno == 0)
>> 	    errno = EAGAIN;
>> 
>> ...just after the call to set_errno above, guess what: It seems to work!
>
> This must be conditioned on something that requires EAGAIN.  Otherwise
> overriding the errno of zero sounds like a bad idea to me.

This was just a quick try, to better understand the behavior.  Not a
proposed solution.  Excuse me for not being precise.

> Why does the nchars == SOCKET_ERROR happen here at all, if winsock
> returns with an error of zero?  Isn't that strange?

Exactly, this was what I meant in the part you elided:

Alain Schneble <a.s@realize.ch> writes:

> At least for me, it will be an exercise for tomorrow to find the reason
> why pfn_WSAGetLastError returns 0 in this case.

I just had some time to investigate it further.  The WSAGetLastError
gets overridden in the call to pfn_ioctlsocket.  That's why errno is 0.

If I swap the order of the if statements in sys_write to look as
follows, then the reason for the SOCKET_ERROR is revealed:

      if (nchars == SOCKET_ERROR)
        {
	  DebPrint (("sys_write.send failed with error %d on socket %ld\n",
		     pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
	  set_errno ();
	}

      /* Set the socket back to non-blocking if it was before,
	 for other operations that support it.  */
      if (fd_info[fd].flags & FILE_NDELAY)
	{
	  printf ("reset file_ndelay");
	  nblock = 1;
	  pfn_ioctlsocket (SOCK_HANDLE (fd), FIONBIO, &nblock);
	}

=> WSAENOTCONN (10057): Socket is not connected.  So that's the prove it
accesses the socket too early.

Alas, even though it seems to help at least for the test code I tried,
turning WSAENOTCONN into EAGAIN seems wrong after all.  It shouldn't try
to write to the socket before it is connected at all...(?)  Also the
code "wraps" pfn_send and turns it into a blocking call.  Not sure what
the implications are...

Nevertheless, don't you think the error handling in this code section is
not very elaborate and switching the order as shown above might be
better anyway?  sys_write is primarily about writing, not about
switching from non-blocking to blocking and back again...  Or shall it
somehow aggregate possible errors of both calls (pfn_send and
pfn_ioctlsocket)?

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28  9:48                     ` Alain Schneble
@ 2016-02-28 17:00                       ` Eli Zaretskii
  2016-02-29  2:49                         ` Lars Ingebrigtsen
  2016-02-29  9:55                         ` Alain Schneble
  0 siblings, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-28 17:00 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Sun, 28 Feb 2016 10:48:37 +0100
> 
> If I swap the order of the if statements in sys_write to look as
> follows, then the reason for the SOCKET_ERROR is revealed:
> 
>       if (nchars == SOCKET_ERROR)
>         {
> 	  DebPrint (("sys_write.send failed with error %d on socket %ld\n",
> 		     pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
> 	  set_errno ();
> 	}
> 
>       /* Set the socket back to non-blocking if it was before,
> 	 for other operations that support it.  */
>       if (fd_info[fd].flags & FILE_NDELAY)
> 	{
> 	  printf ("reset file_ndelay");
> 	  nblock = 1;
> 	  pfn_ioctlsocket (SOCK_HANDLE (fd), FIONBIO, &nblock);
> 	}
> 
> => WSAENOTCONN (10057): Socket is not connected.  So that's the prove it
> accesses the socket too early.

Yes.

> Alas, even though it seems to help at least for the test code I tried,
> turning WSAENOTCONN into EAGAIN seems wrong after all.

It does here, although this needs to be done only if the socket is in
the process of connecting, and the return value needs to be negative,
not zero.  I installed a fix along these lines, and it seems to work
for me: https://www.fsf.org is displayed OK.

> It shouldn't try to write to the socket before it is connected at
> all...(?)

No, I think it should: that write comes from GnuTLS, when it attempts
a handshake.  Returning EWOULDBLOCK tells GnuTLS to spin waiting until
the connection is complete.  How else could this work, since we now
proceed with GnuTLS handshake immediately after the call to 'connect'
returns, when the connection is not yet complete, this being a
non-blocking socket?

> Also the code "wraps" pfn_send and turns it into a blocking call.
> Not sure what the implications are...

The only implication is that we get ENOTCONN instead of EWOULDBLOCK.
But that's easy to handle.

> Nevertheless, don't you think the error handling in this code section is
> not very elaborate and switching the order as shown above might be
> better anyway?  sys_write is primarily about writing, not about
> switching from non-blocking to blocking and back again...  Or shall it
> somehow aggregate possible errors of both calls (pfn_send and
> pfn_ioctlsocket)?

Yes, you are right.  I did that.

The only problem left is that not all images on www.fsf.org's page are
downloaded; they are if I use http instead of https.  I guess this is
some eww thing?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28 17:00                       ` Eli Zaretskii
@ 2016-02-29  2:49                         ` Lars Ingebrigtsen
  2016-02-29  3:43                           ` Eli Zaretskii
  2016-02-29  9:55                         ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29  2:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, Alain Schneble, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> No, I think it should: that write comes from GnuTLS, when it attempts
> a handshake.  Returning EWOULDBLOCK tells GnuTLS to spin waiting until
> the connection is complete.

No, it doesn't spin, it just tries again later (from the event loop).
But returning EWOULDBLOCK seems correct.

> The only problem left is that not all images on www.fsf.org's page are
> downloaded; they are if I use http instead of https.  I guess this is
> some eww thing?

No, you should get the images in the https version, too, unless it's
unable to verify the TLS certificate.  Which I guess is not a problem,
since it's displaying the web page itself...

What happens if you load the image directly with eww?

https://static.fsf.org/nosvn/logos/fsf30-logo/fsf30-header-fsf.png

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29  2:49                         ` Lars Ingebrigtsen
@ 2016-02-29  3:43                           ` Eli Zaretskii
  2016-02-29  4:38                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-29  3:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Alain Schneble <a.s@realize.ch>,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Mon, 29 Feb 2016 13:49:36 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > No, I think it should: that write comes from GnuTLS, when it attempts
> > a handshake.  Returning EWOULDBLOCK tells GnuTLS to spin waiting until
> > the connection is complete.
> 
> No, it doesn't spin, it just tries again later (from the event loop).

That's what I meant by "spin" (what "event loop"?)

> But returning EWOULDBLOCK seems correct.
> 
> > The only problem left is that not all images on www.fsf.org's page are
> > downloaded; they are if I use http instead of https.  I guess this is
> > some eww thing?
> 
> No, you should get the images in the https version, too, unless it's
> unable to verify the TLS certificate.  Which I guess is not a problem,
> since it's displaying the web page itself...

Some of the images get downloaded, some don't.  And it isn't 100%
repeatable.  Strange...

> What happens if you load the image directly with eww?
> 
> https://static.fsf.org/nosvn/logos/fsf30-logo/fsf30-header-fsf.png

It's downloaded, of course.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29  3:43                           ` Eli Zaretskii
@ 2016-02-29  4:38                             ` Lars Ingebrigtsen
  0 siblings, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29  4:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> No, you should get the images in the https version, too, unless it's
>> unable to verify the TLS certificate.  Which I guess is not a problem,
>> since it's displaying the web page itself...
>
> Some of the images get downloaded, some don't.  And it isn't 100%
> repeatable.  Strange...

Huh.  There's another timing issue, perhaps?

Or...  shr uses url-queue to download images, and it gives up on images
that take more than five seconds to download (by default).  Could that
be the case here?

Also, url caches images, so for greater repeatabilty it's often useful
to nuke the cache when testing:

$ rm -rf ~/.emacs.d/url/cache/

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-28 17:00                       ` Eli Zaretskii
  2016-02-29  2:49                         ` Lars Ingebrigtsen
@ 2016-02-29  9:55                         ` Alain Schneble
  2016-02-29 10:03                           ` Lars Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-02-29  9:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Sun, 28 Feb 2016 10:48:37 +0100
>> 
>> Alas, even though it seems to help at least for the test code I tried,
>> turning WSAENOTCONN into EAGAIN seems wrong after all.
>
> It does here, although this needs to be done only if the socket is in
> the process of connecting, and the return value needs to be negative,
> not zero.  I installed a fix along these lines, and it seems to work
> for me: https://www.fsf.org is displayed OK.

Thanks!

>> It shouldn't try to write to the socket before it is connected at
>> all...(?)
>
> No, I think it should: that write comes from GnuTLS, when it attempts
> a handshake.  Returning EWOULDBLOCK tells GnuTLS to spin waiting until
> the connection is complete.  How else could this work, since we now
> proceed with GnuTLS handshake immediately after the call to 'connect'
> returns, when the connection is not yet complete, this being a
> non-blocking socket?

What I had in mind was to start the GnuTLS handshake (or even
gnutls_boot) only after the async socket has properly been connected.  I
just consulted the GnuTLS documentation and I understand now that what
you write above is indeed a supported GnuTLS scenario.  But I think it
is not an optimal one, because the number of TLS handshake retries will
then depend on the time it takes to setup the socket connection, IIUC
(see process.c: abort if p->gnutls_handshakes_tried >
GNUTLS_EMACS_HANDSHAKES_LIMIT).

>> Also the code "wraps" pfn_send and turns it into a blocking call.
>> Not sure what the implications are...
>
> The only implication is that we get ENOTCONN instead of EWOULDBLOCK.
> But that's easy to handle.

Ok, I see.  Thanks.

>> Nevertheless, don't you think the error handling in this code section is
>> not very elaborate and switching the order as shown above might be
>> better anyway?  sys_write is primarily about writing, not about
>> switching from non-blocking to blocking and back again...  Or shall it
>> somehow aggregate possible errors of both calls (pfn_send and
>> pfn_ioctlsocket)?
>
> Yes, you are right.  I did that.

Thanks.

> The only problem left is that not all images on www.fsf.org's page are
> downloaded; they are if I use http instead of https.  I guess this is
> some eww thing?

I guess it's not.  There are still some issues with the GnuTLS code
paths, I think.  I tried out the approach I proposed above, and it seems
to resolve this issue as well.  I'll try to arrange and propose a patch
to discuss in a follow up message.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29  9:55                         ` Alain Schneble
@ 2016-02-29 10:03                           ` Lars Ingebrigtsen
  2016-02-29 17:57                             ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29 10:03 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> What I had in mind was to start the GnuTLS handshake (or even
> gnutls_boot) only after the async socket has properly been connected.  I
> just consulted the GnuTLS documentation and I understand now that what
> you write above is indeed a supported GnuTLS scenario.  But I think it
> is not an optimal one, because the number of TLS handshake retries will
> then depend on the time it takes to setup the socket connection, IIUC
> (see process.c: abort if p->gnutls_handshakes_tried >
> GNUTLS_EMACS_HANDSHAKES_LIMIT).

We could just increase that limit.  It's currently set to 100, which is
a number that's taken from thin air, I think?  It should probably be a
time-based handshake limit instead -- try handshaking for, say, ten
seconds before giving up...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 10:03                           ` Lars Ingebrigtsen
@ 2016-02-29 17:57                             ` Alain Schneble
  2016-02-29 18:45                               ` Eli Zaretskii
  2016-02-29 21:18                               ` Lars Ingebrigtsen
  0 siblings, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-02-29 17:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> What I had in mind was to start the GnuTLS handshake (or even
>> gnutls_boot) only after the async socket has properly been connected.  I
>> just consulted the GnuTLS documentation and I understand now that what
>> you write above is indeed a supported GnuTLS scenario.  But I think it
>> is not an optimal one, because the number of TLS handshake retries will
>> then depend on the time it takes to setup the socket connection, IIUC
>> (see process.c: abort if p->gnutls_handshakes_tried >
>> GNUTLS_EMACS_HANDSHAKES_LIMIT).
>
> We could just increase that limit.  It's currently set to 100, which is
> a number that's taken from thin air, I think?  It should probably be a
> time-based handshake limit instead -- try handshaking for, say, ten
> seconds before giving up...

A time-based limit sounds like a good idea to me.  It could even be
combined with a min-number-of-tries approach, like this:

if (TimeElapsed > Timeout && NumberOfTries > MinNumberOfTries) {
   // give up...
}

But the point I tried to address is the following: /When/ shall we start
with the handshake "series" and start counting the number of tries (or
stopwatch)?  Don't you agree that with async sockets, it doesn't make
much sense to start it before the socket is connected?  So we could just
postpone it until then...  Otherwise, the number of handshake tries (or
time elapsed) durnig the "socket not yet connected" are subtracted from
the max number of tries (or timeout) granted.  Which I think is, well,
at least imprecise...

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 17:57                             ` Alain Schneble
@ 2016-02-29 18:45                               ` Eli Zaretskii
  2016-02-29 21:22                                 ` Lars Ingebrigtsen
  2016-02-29 23:13                                 ` Alain Schneble
  2016-02-29 21:18                               ` Lars Ingebrigtsen
  1 sibling, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-29 18:45 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> Date: Mon, 29 Feb 2016 18:57:28 +0100
> Cc: j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > Alain Schneble <a.s@realize.ch> writes:
> >
> >> What I had in mind was to start the GnuTLS handshake (or even
> >> gnutls_boot) only after the async socket has properly been connected.  I
> >> just consulted the GnuTLS documentation and I understand now that what
> >> you write above is indeed a supported GnuTLS scenario.  But I think it
> >> is not an optimal one, because the number of TLS handshake retries will
> >> then depend on the time it takes to setup the socket connection, IIUC
> >> (see process.c: abort if p->gnutls_handshakes_tried >
> >> GNUTLS_EMACS_HANDSHAKES_LIMIT).
> >
> > We could just increase that limit.  It's currently set to 100, which is
> > a number that's taken from thin air, I think?  It should probably be a
> > time-based handshake limit instead -- try handshaking for, say, ten
> > seconds before giving up...
> 
> A time-based limit sounds like a good idea to me.  It could even be
> combined with a min-number-of-tries approach, like this:
> 
> if (TimeElapsed > Timeout && NumberOfTries > MinNumberOfTries) {
>    // give up...
> }

I already tried increasing the limit, it doesn't help: the new limit
is reached.  Interestingly, when the initial connection is made, for
the page itself, the handshake completes within 10 attempts.  But the
subsequent connections, presumably for images, don't succeed, for some
reason.

> But the point I tried to address is the following: /When/ shall we start
> with the handshake "series" and start counting the number of tries (or
> stopwatch)?  Don't you agree that with async sockets, it doesn't make
> much sense to start it before the socket is connected?  So we could just
> postpone it until then...  Otherwise, the number of handshake tries (or
> time elapsed) durnig the "socket not yet connected" are subtracted from
> the max number of tries (or timeout) granted.  Which I think is, well,
> at least imprecise...

I think we are looking in the wrong direction.  We need first to
understand why the connection(s) to download the images don't work.
Does anyone already have an idea why this happens?  If so, please
describe that.

Failing that, I came to a conclusion that I don't have a clear and
complete picture of what should happen when eww receives the page and
proceeds to downloading the images.  Lars, can you please describe
what eww does at this point, and how these downloads are expected to
work asynchronously?  You can describe what happens on GNU/Linux, if
that makes it easier.  In particular, what are the differences between
the initial connection to get the page (which works) and the
connections made to get the images (which don't work)?

There is also some disturbing signs in retrying GnuTLS handshake from
wait_reading_process_output -- I'm not sure the way that function
works, at least on Windows, is according to you expectations.  The
while loop there doesn't really spin all the time, did you know that?
Can you describe what you think should happen there for a connection
whose GnuTLS handshake is not yet complete?

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 18:45                               ` Eli Zaretskii
@ 2016-02-29 21:22                                 ` Lars Ingebrigtsen
  2016-03-01  3:35                                   ` Eli Zaretskii
  2016-02-29 23:13                                 ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29 21:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, Alain Schneble, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> I already tried increasing the limit, it doesn't help: the new limit
> is reached.  Interestingly, when the initial connection is made, for
> the page itself, the handshake completes within 10 attempts.  But the
> subsequent connections, presumably for images, don't succeed, for some
> reason.

For the images, it tried handshaking 100 times and then marked the
connections as failed?

> Failing that, I came to a conclusion that I don't have a clear and
> complete picture of what should happen when eww receives the page and
> proceeds to downloading the images.  Lars, can you please describe
> what eww does at this point, and how these downloads are expected to
> work asynchronously?  You can describe what happens on GNU/Linux, if
> that makes it easier.  In particular, what are the differences between
> the initial connection to get the page (which works) and the
> connections made to get the images (which don't work)?

As I said before, the images are fetched from `url-queue', which gives
up if the image hasn't downloaded within five seconds.  Could that be
the case for you?

> There is also some disturbing signs in retrying GnuTLS handshake from
> wait_reading_process_output -- I'm not sure the way that function
> works, at least on Windows, is according to you expectations.  The
> while loop there doesn't really spin all the time, did you know that?

It runs every time something is available from poll, doesn't it?  Which
is what we care about.  I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 21:22                                 ` Lars Ingebrigtsen
@ 2016-03-01  3:35                                   ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01  3:35 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Alain Schneble <a.s@realize.ch>,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Tue, 01 Mar 2016 08:22:35 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I already tried increasing the limit, it doesn't help: the new limit
> > is reached.  Interestingly, when the initial connection is made, for
> > the page itself, the handshake completes within 10 attempts.  But the
> > subsequent connections, presumably for images, don't succeed, for some
> > reason.
> 
> For the images, it tried handshaking 100 times and then marked the
> connections as failed?

It tried as many times as I gave it (I even tried 1000), yes.  I
didn't check that it marks the connection as failed, but that's
deterministic, no?

> > Failing that, I came to a conclusion that I don't have a clear and
> > complete picture of what should happen when eww receives the page and
> > proceeds to downloading the images.  Lars, can you please describe
> > what eww does at this point, and how these downloads are expected to
> > work asynchronously?  You can describe what happens on GNU/Linux, if
> > that makes it easier.  In particular, what are the differences between
> > the initial connection to get the page (which works) and the
> > connections made to get the images (which don't work)?
> 
> As I said before, the images are fetched from `url-queue', which gives
> up if the image hasn't downloaded within five seconds.  Could that be
> the case for you?

Unlikely: I tried enlarging the limit to 120 sec, and the problem
wasn't fixed.

I feel that we don't really understand the problem, so we are trying
random things "just because they are there".  I think we should try to
understand what's not working first.

> > There is also some disturbing signs in retrying GnuTLS handshake from
> > wait_reading_process_output -- I'm not sure the way that function
> > works, at least on Windows, is according to you expectations.  The
> > while loop there doesn't really spin all the time, did you know that?
> 
> It runs every time something is available from poll, doesn't it?  Which
> is what we care about.  I think.

But something might not become available from the poll for prolonged
periods of time.  Why would we rely on the loop there to crank
frequently enough for the purposes of completing the TLS handshake?
E.g., does it work well for you if you disable cursor blinking before
invoking eww?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 18:45                               ` Eli Zaretskii
  2016-02-29 21:22                                 ` Lars Ingebrigtsen
@ 2016-02-29 23:13                                 ` Alain Schneble
  2016-03-01  0:41                                   ` Lars Ingebrigtsen
  2016-03-01  3:41                                   ` Eli Zaretskii
  1 sibling, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-02-29 23:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> I already tried increasing the limit, it doesn't help: the new limit
> is reached.  Interestingly, when the initial connection is made, for
> the page itself, the handshake completes within 10 attempts.  But the
> subsequent connections, presumably for images, don't succeed, for some
> reason.

Yes that's what I observed as well.  But also that GnuTLS returns -10
GNUTLS_E_INVALID_SESSION for some of the connections quite early.

Interestingly, I had the impression that it behaves better if the
subsequent handshakes are triggered only after the socket is connected.
But that may be by chance.  And could therefore be a red herring.

See patch:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 962 bytes --]

From 565ac4ce483faf65f2005b23bf806fff636f5cb1 Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Mon, 29 Feb 2016 23:37:41 +0100
Subject: [PATCH] Optimize GnuTLS handshake on async sockets

* src/process.c (wait_reading_process_output): start retrying GnuTLS
handshake only after async socket has properly connected.
---
 src/process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/process.c b/src/process.c
index 85a4885..8aad8d3 100644
--- a/src/process.c
+++ b/src/process.c
@@ -4940,7 +4940,8 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 #ifdef HAVE_GNUTLS
 		/* Continue TLS negotiation. */
 		if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
-		    && p->is_non_blocking_client)
+		    && p->is_non_blocking_client
+		    && (fd_info[p->infd].flags & FILE_CONNECT) == 0)
 		  {
 		    gnutls_try_handshake (p);
 		    p->gnutls_handshakes_tried++;
-- 
2.6.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 1648 bytes --]


>> But the point I tried to address is the following: /When/ shall we start
>> with the handshake "series" and start counting the number of tries (or
>> stopwatch)?  Don't you agree that with async sockets, it doesn't make
>> much sense to start it before the socket is connected?  So we could just
>> postpone it until then...  Otherwise, the number of handshake tries (or
>> time elapsed) durnig the "socket not yet connected" are subtracted from
>> the max number of tries (or timeout) granted.  Which I think is, well,
>> at least imprecise...
>
> I think we are looking in the wrong direction.  We need first to
> understand why the connection(s) to download the images don't work.
> Does anyone already have an idea why this happens?  If so, please
> describe that.

I must admit that I have holes in my mental model, and I'm still
observing flows at runtime which seem strange to me.  So yes, it may be
the wrong direction regarding the /issue/.  But I was not only referring
to the issue, but also to an optimization of the new async paths...

> There is also some disturbing signs in retrying GnuTLS handshake from
> wait_reading_process_output -- I'm not sure the way that function
> works, at least on Windows, is according to you expectations.  The
> while loop there doesn't really spin all the time, did you know that?
> Can you describe what you think should happen there for a connection
> whose GnuTLS handshake is not yet complete?

Hmm.  What I observed is that it stops if the Emacs window looses its
focus (and no other window messages are dispatched to the window).  If
it has focus, it gets called at least once per second.

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 23:13                                 ` Alain Schneble
@ 2016-03-01  0:41                                   ` Lars Ingebrigtsen
  2016-03-01  3:41                                   ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01  0:41 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> Yes that's what I observed as well.  But also that GnuTLS returns -10
> GNUTLS_E_INVALID_SESSION for some of the connections quite early.

In my experience working with this stuff under Linux, if the session
goes invalid, it's because other parts of Emacs has been doing something
with the socket (either reading or (re-)writing bytes that should have
gone to the GnuTLS library).

Could you strace Emacs while it's doing a failed negotiation, and add
some printfs to the handshaking?  If I remember correctly, I discovered
one instance where the trace looked something like

Trying handshake...
write( ... )
EAGAIN...
Done trying
..
write( ... )
Trying handshake...
GNUTLS_E_INVALID_SESSION

And the reason was that Emacs in the polling code decided to try to
retransmit the data (until I made it stop doing that for TLS sockets).

Could there be something similar in the Windows code paths?

>  #ifdef HAVE_GNUTLS
>  		/* Continue TLS negotiation. */
>  		if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
> -		    && p->is_non_blocking_client)
> +		    && p->is_non_blocking_client
> +		    && (fd_info[p->infd].flags & FILE_CONNECT) == 0)
>  		  {
>  		    gnutls_try_handshake (p);
>  		    p->gnutls_handshakes_tried++;

I think this change makes sense.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 23:13                                 ` Alain Schneble
  2016-03-01  0:41                                   ` Lars Ingebrigtsen
@ 2016-03-01  3:41                                   ` Eli Zaretskii
  2016-03-01  4:29                                     ` Lars Ingebrigtsen
                                                       ` (2 more replies)
  1 sibling, 3 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01  3:41 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 00:13:04 +0100
> 
> > There is also some disturbing signs in retrying GnuTLS handshake from
> > wait_reading_process_output -- I'm not sure the way that function
> > works, at least on Windows, is according to you expectations.  The
> > while loop there doesn't really spin all the time, did you know that?
> > Can you describe what you think should happen there for a connection
> > whose GnuTLS handshake is not yet complete?
> 
> Hmm.  What I observed is that it stops if the Emacs window looses its
> focus (and no other window messages are dispatched to the window).  If
> it has focus, it gets called at least once per second.

Disable cursor blinking and try again.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  3:41                                   ` Eli Zaretskii
@ 2016-03-01  4:29                                     ` Lars Ingebrigtsen
  2016-03-01  4:30                                     ` Lars Ingebrigtsen
  2016-03-01 15:36                                     ` Alain Schneble
  2 siblings, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01  4:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, Alain Schneble, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> Disable cursor blinking and try again.

You are completely correct!  Without a blinking cursor, it doesn't
complete the negotiation for (some) of the images on
https://www.fsf.org!

Oops.

Urmn...  Well, we have to add some other mechanism to keep the forward
progress of the TLS negotiations going, then.  Hm...  A timer of some
kind?  That will not be running unless something is negotiated?  Hm...
Ideas?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  3:41                                   ` Eli Zaretskii
  2016-03-01  4:29                                     ` Lars Ingebrigtsen
@ 2016-03-01  4:30                                     ` Lars Ingebrigtsen
  2016-03-01  9:00                                       ` Andreas Schwab
  2016-03-01 15:36                                     ` Alain Schneble
  2 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01  4:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, Alain Schneble, 22789

And I would guess that the same is true for the async DNS resolution --
those processes also need some progress, probably?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  4:30                                     ` Lars Ingebrigtsen
@ 2016-03-01  9:00                                       ` Andreas Schwab
  2016-03-01 14:12                                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Andreas Schwab @ 2016-03-01  9:00 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Alain Schneble, j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> And I would guess that the same is true for the async DNS resolution --
> those processes also need some progress, probably?  

getaddrinfo_a provides async notification (SIGEV_SIGNAL).

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  9:00                                       ` Andreas Schwab
@ 2016-03-01 14:12                                         ` Lars Ingebrigtsen
  2016-03-01 14:25                                           ` Alain Schneble
  2016-03-01 15:53                                           ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01 14:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Alain Schneble, j_l_domenech, 22789

Andreas Schwab <schwab@suse.de> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> And I would guess that the same is true for the async DNS resolution --
>> those processes also need some progress, probably?  
>
> getaddrinfo_a provides async notification (SIGEV_SIGNAL).

But we're not using that.

Anyway, all this async DNS/TLS stuff can probably be taken out of
wait_reading_process_output completely, and just be run from the
proposed new timer thing...  That'll be cleaner, too.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 14:12                                         ` Lars Ingebrigtsen
@ 2016-03-01 14:25                                           ` Alain Schneble
  2016-03-01 14:43                                             ` Lars Ingebrigtsen
  2016-03-01 15:59                                             ` Eli Zaretskii
  2016-03-01 15:53                                           ` Eli Zaretskii
  1 sibling, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 14:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Andreas Schwab, j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Andreas Schwab <schwab@suse.de> writes:
>
>> Lars Ingebrigtsen <larsi@gnus.org> writes:
>>
>>> And I would guess that the same is true for the async DNS resolution --
>>> those processes also need some progress, probably?  
>>
>> getaddrinfo_a provides async notification (SIGEV_SIGNAL).
>
> But we're not using that.
>
> Anyway, all this async DNS/TLS stuff can probably be taken out of
> wait_reading_process_output completely, and just be run from the
> proposed new timer thing...  That'll be cleaner, too.

But we could probably make use of it and it would not require the timer
at least for the async DNS resolution.  It would not solve the TLS issue
though.  But maybe there is a similar notification for async sockets
when they get connected?  If there exists an approach that does not
require timers, then I guess this would be the preferred one...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 14:25                                           ` Alain Schneble
@ 2016-03-01 14:43                                             ` Lars Ingebrigtsen
  2016-03-01 15:59                                             ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01 14:43 UTC (permalink / raw)
  To: Alain Schneble; +Cc: Andreas Schwab, j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> But we could probably make use of it and it would not require the timer
> at least for the async DNS resolution.

It seemed to me (after Googling the issue for a few minutes) before
implementing the getaddrinfo_a stuff that getting signal delivery to
work across all architectures would be challenging.  If I was wrong
about that, then we should perhaps look into doing this with signals.

(Remember, we're going to drop the actual glibc getaddrinfo_a usage and
use a clone from gnulib that's going to work on all platforms...)

> It would not solve the TLS issue though.  But maybe there is a similar
> notification for async sockets when they get connected?  If there
> exists an approach that does not require timers, then I guess this
> would be the preferred one...

I haven't looked into that.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 14:25                                           ` Alain Schneble
  2016-03-01 14:43                                             ` Lars Ingebrigtsen
@ 2016-03-01 15:59                                             ` Eli Zaretskii
  2016-03-01 16:19                                               ` Alain Schneble
  2016-03-01 16:33                                               ` Andreas Schwab
  1 sibling, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 15:59 UTC (permalink / raw)
  To: Alain Schneble; +Cc: schwab, larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> Date: Tue, 1 Mar 2016 15:25:53 +0100
> Cc: Andreas Schwab <schwab@suse.de>, j_l_domenech@yahoo.com,
> 	22789@debbugs.gnu.org
> 
> > Andreas Schwab <schwab@suse.de> writes:
> >
> >> Lars Ingebrigtsen <larsi@gnus.org> writes:
> >>
> >>> And I would guess that the same is true for the async DNS resolution --
> >>> those processes also need some progress, probably?  
> >>
> >> getaddrinfo_a provides async notification (SIGEV_SIGNAL).
> >
> > But we're not using that.
> >
> > Anyway, all this async DNS/TLS stuff can probably be taken out of
> > wait_reading_process_output completely, and just be run from the
> > proposed new timer thing...  That'll be cleaner, too.
> 
> But we could probably make use of it and it would not require the timer
> at least for the async DNS resolution.  It would not solve the TLS issue
> though.  But maybe there is a similar notification for async sockets
> when they get connected?

How do you envision we should make use of these notifications through
signals?  We try very hard not to do anything non-trivial in a signal
handler, except setting a flag that is then tested in due time.  If
that is what you had in mind, then you will face the same problems
with testing the flag as we face now: if the loop in
wait_reading_process_output is stuck in a call to 'pselect' with a
large timeout, Emacs might not be able to test the flag until the
timeout ends.

> If there exists an approach that does not require timers, then I
> guess this would be the preferred one...

I think I suggested one such way.  In a nutshell, it does the same to
the loop as a timer would, but without actually running a timer.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 15:59                                             ` Eli Zaretskii
@ 2016-03-01 16:19                                               ` Alain Schneble
  2016-03-01 17:00                                                 ` Eli Zaretskii
  2016-03-01 16:33                                               ` Andreas Schwab
  1 sibling, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 16:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> Date: Tue, 1 Mar 2016 15:25:53 +0100
>> Cc: Andreas Schwab <schwab@suse.de>, j_l_domenech@yahoo.com,
>> 	22789@debbugs.gnu.org
>> 
>> But we could probably make use of it and it would not require the timer
>> at least for the async DNS resolution.  It would not solve the TLS issue
>> though.  But maybe there is a similar notification for async sockets
>> when they get connected?
>
> How do you envision we should make use of these notifications through
> signals?  We try very hard not to do anything non-trivial in a signal
> handler, except setting a flag that is then tested in due time.  If
> that is what you had in mind, then you will face the same problems
> with testing the flag as we face now: if the loop in
> wait_reading_process_output is stuck in a call to 'pselect' with a
> large timeout, Emacs might not be able to test the flag until the
> timeout ends.

What you describe above is what I had thought of.  Regarding large
timeout in pselect: isn't that what the sigmask to pselect is for?  To
wait for a signal in addition to a file descriptor?  Or am I
misunderstanding something?

>> If there exists an approach that does not require timers, then I
>> guess this would be the preferred one...
>
> I think I suggested one such way.  In a nutshell, it does the same to
> the loop as a timer would, but without actually running a timer.

That sounds like a better idea than having a separate timer...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 16:19                                               ` Alain Schneble
@ 2016-03-01 17:00                                                 ` Eli Zaretskii
  2016-03-01 17:09                                                   ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 17:00 UTC (permalink / raw)
  To: Alain Schneble; +Cc: schwab, larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <schwab@suse.de>, <larsi@gnus.org>, <j_l_domenech@yahoo.com>,
> 	<22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 17:19:00 +0100
> 
> >> But we could probably make use of it and it would not require the timer
> >> at least for the async DNS resolution.  It would not solve the TLS issue
> >> though.  But maybe there is a similar notification for async sockets
> >> when they get connected?
> >
> > How do you envision we should make use of these notifications through
> > signals?  We try very hard not to do anything non-trivial in a signal
> > handler, except setting a flag that is then tested in due time.  If
> > that is what you had in mind, then you will face the same problems
> > with testing the flag as we face now: if the loop in
> > wait_reading_process_output is stuck in a call to 'pselect' with a
> > large timeout, Emacs might not be able to test the flag until the
> > timeout ends.
> 
> What you describe above is what I had thought of.  Regarding large
> timeout in pselect: isn't that what the sigmask to pselect is for?  To
> wait for a signal in addition to a file descriptor?  Or am I
> misunderstanding something?

No, it's me: I forgot that a signal will interrupt 'pselect', which
will return with EINTR, and that is already handled.

So yes, if we have a signal that is delivered from one of these
handshakes, it will cause the loop to run again.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 17:00                                                 ` Eli Zaretskii
@ 2016-03-01 17:09                                                   ` Alain Schneble
  2016-03-01 17:22                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 17:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> So yes, if we have a signal that is delivered from one of these
> handshakes, it will cause the loop to run again.

I guess if we do a fcntl(socket, F_SETFL, O_ASYNC), we may get a
notification if the socket has been connected.  This could be the
trigger to do the first TLS handshake try.  For subsequent tries, if
needed, we somehow have to rescedule them.  For example with just
another round, triggered by a short pselect timeout, like you proposed.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 17:09                                                   ` Alain Schneble
@ 2016-03-01 17:22                                                     ` Eli Zaretskii
  2016-03-01 17:55                                                       ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 17:22 UTC (permalink / raw)
  To: Alain Schneble; +Cc: schwab, larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <schwab@suse.de>, <larsi@gnus.org>, <j_l_domenech@yahoo.com>,
> 	<22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 18:09:40 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So yes, if we have a signal that is delivered from one of these
> > handshakes, it will cause the loop to run again.
> 
> I guess if we do a fcntl(socket, F_SETFL, O_ASYNC), we may get a
> notification if the socket has been connected.

Doesn't the socket become read-ready in 'pselect' once it is
connected?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 17:22                                                     ` Eli Zaretskii
@ 2016-03-01 17:55                                                       ` Alain Schneble
  2016-03-01 18:13                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 17:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <schwab@suse.de>, <larsi@gnus.org>, <j_l_domenech@yahoo.com>,
>> 	<22789@debbugs.gnu.org>
>> Date: Tue, 1 Mar 2016 18:09:40 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > So yes, if we have a signal that is delivered from one of these
>> > handshakes, it will cause the loop to run again.
>> 
>> I guess if we do a fcntl(socket, F_SETFL, O_ASYNC), we may get a
>> notification if the socket has been connected.
>
> Doesn't the socket become read-ready in 'pselect' once it is
> connected?

True, that may indeed be the case (or write-ready?  I'm confused ;( ).






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 17:55                                                       ` Alain Schneble
@ 2016-03-01 18:13                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 18:13 UTC (permalink / raw)
  To: Alain Schneble; +Cc: schwab, larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <schwab@suse.de>, <larsi@gnus.org>, <j_l_domenech@yahoo.com>,
> 	<22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 18:55:11 +0100
> 
> > Doesn't the socket become read-ready in 'pselect' once it is
> > connected?
> 
> True, that may indeed be the case (or write-ready?  I'm confused ;( ).

I think you are right, and it's write-ready.  At least that's what the
w32 emulation does.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 15:59                                             ` Eli Zaretskii
  2016-03-01 16:19                                               ` Alain Schneble
@ 2016-03-01 16:33                                               ` Andreas Schwab
  1 sibling, 0 replies; 124+ messages in thread
From: Andreas Schwab @ 2016-03-01 16:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, Alain Schneble, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> How do you envision we should make use of these notifications through
> signals?

I think that could be as simple as interrupting the pselect call by the
signal.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 14:12                                         ` Lars Ingebrigtsen
  2016-03-01 14:25                                           ` Alain Schneble
@ 2016-03-01 15:53                                           ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 15:53 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: schwab, j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  j_l_domenech@yahoo.com,  Alain Schneble <a.s@realize.ch>,  22789@debbugs.gnu.org
> Date: Wed, 02 Mar 2016 01:12:21 +1100
> 
> Anyway, all this async DNS/TLS stuff can probably be taken out of
> wait_reading_process_output completely, and just be run from the
> proposed new timer thing...  That'll be cleaner, too.

Not sure I understand: you want to launch a timer from C?  And you
want to have its function to do nothing but sit-for, or something
similar?  That doesn't sound like very clean to me, but maybe I'm
missing something.

I think adding simple logic in the wait_reading_process_output loop to
reduce the timeout for 'pselect', that I suggested in my previous
message to this thread is simpler and cleaner.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  3:41                                   ` Eli Zaretskii
  2016-03-01  4:29                                     ` Lars Ingebrigtsen
  2016-03-01  4:30                                     ` Lars Ingebrigtsen
@ 2016-03-01 15:36                                     ` Alain Schneble
  2016-03-01 16:05                                       ` Eli Zaretskii
  2 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 15:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Tue, 1 Mar 2016 00:13:04 +0100
>> 
>> > There is also some disturbing signs in retrying GnuTLS handshake from
>> > wait_reading_process_output -- I'm not sure the way that function
>> > works, at least on Windows, is according to you expectations.  The
>> > while loop there doesn't really spin all the time, did you know that?
>> > Can you describe what you think should happen there for a connection
>> > whose GnuTLS handshake is not yet complete?
>> 
>> Hmm.  What I observed is that it stops if the Emacs window looses its
>> focus (and no other window messages are dispatched to the window).  If
>> it has focus, it gets called at least once per second.
>
> Disable cursor blinking and try again.

Yes, you are right.  The frequency it enters the loop drops radically
once the cursor doesn't blink.

There is one strange thing I'm observing though.  If I constantly hover
over the Window with the mouse, also then, not all images get downloaded
most of the time.  The reason then is, as mentioned in a previous post,
some of the gnutls_try_handshake calls return -10
GNUTLS_E_INVALID_SESSION.  So there must still be another issue, because
with the mouse hovering, it seems to me that it enters the loop enough
frequently to do TRT.

(strangely enough, if I postpone gnutls_try_handshake to after the
socket is connected, it seems to work reliably in conjunction with the
mouse hovering...)






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 15:36                                     ` Alain Schneble
@ 2016-03-01 16:05                                       ` Eli Zaretskii
  2016-03-01 16:25                                         ` Alain Schneble
  2016-03-04  8:56                                         ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 16:05 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 16:36:03 +0100
> 
> There is one strange thing I'm observing though.  If I constantly hover
> over the Window with the mouse, also then, not all images get downloaded
> most of the time.  The reason then is, as mentioned in a previous post,
> some of the gnutls_try_handshake calls return -10
> GNUTLS_E_INVALID_SESSION.  So there must still be another issue, because
> with the mouse hovering, it seems to me that it enters the loop enough
> frequently to do TRT.

I have no doubt there are w32 aspects to this problem.  But I see no
reason to try debugging that as long as this machinery doesn't work
well on Posix hosts, because we will bump into issues that have
nothing to do with how w32 emulates Posix functionality.  Especially
if we are now seriously discussing radical changes in the design (like
running the handshake from a timer or such likes), which, if
implemented will change the code significantly, and will no doubt
affect what the w32 port needs (or doesn't meed) to do to support
that.

IOW, let's return to the w32-specific issues when the dust settles on
the Posix code.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 16:05                                       ` Eli Zaretskii
@ 2016-03-01 16:25                                         ` Alain Schneble
  2016-03-04  8:56                                         ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 16:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Tue, 1 Mar 2016 16:36:03 +0100
>> 
>> There is one strange thing I'm observing though.  If I constantly hover
>> over the Window with the mouse, also then, not all images get downloaded
>> most of the time.  The reason then is, as mentioned in a previous post,
>> some of the gnutls_try_handshake calls return -10
>> GNUTLS_E_INVALID_SESSION.  So there must still be another issue, because
>> with the mouse hovering, it seems to me that it enters the loop enough
>> frequently to do TRT.
>
> I have no doubt there are w32 aspects to this problem.  But I see no
> reason to try debugging that as long as this machinery doesn't work
> well on Posix hosts, because we will bump into issues that have
> nothing to do with how w32 emulates Posix functionality.  Especially
> if we are now seriously discussing radical changes in the design (like
> running the handshake from a timer or such likes), which, if
> implemented will change the code significantly, and will no doubt
> affect what the w32 port needs (or doesn't meed) to do to support
> that.
>
> IOW, let's return to the w32-specific issues when the dust settles on
> the Posix code.

Ok, that sounds wise.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 16:05                                       ` Eli Zaretskii
  2016-03-01 16:25                                         ` Alain Schneble
@ 2016-03-04  8:56                                         ` Eli Zaretskii
  2016-03-04 16:55                                           ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04  8:56 UTC (permalink / raw)
  To: a.s, larsi; +Cc: j_l_domenech, 22789

> Date: Tue, 01 Mar 2016 18:05:36 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: larsi@gnus.org, j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
> 
> IOW, let's return to the w32-specific issues when the dust settles on
> the Posix code.

It sounds like that part happened already, so do you still see any
w32-specific issues with this?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04  8:56                                         ` Eli Zaretskii
@ 2016-03-04 16:55                                           ` Alain Schneble
  2016-03-04 21:36                                             ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-04 16:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Tue, 01 Mar 2016 18:05:36 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: larsi@gnus.org, j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
>> 
>> IOW, let's return to the w32-specific issues when the dust settles on
>> the Posix code.
>
> It sounds like that part happened already, so do you still see any
> w32-specific issues with this?

Sorry for the delay.  It seems like there are still some issues, at
least on my system and even without any debugger attached.  I'm
currently trying to find the cause...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 16:55                                           ` Alain Schneble
@ 2016-03-04 21:36                                             ` Alain Schneble
  2016-03-04 22:33                                               ` Alain Schneble
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-04 21:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 1876 bytes --]

Alain Schneble <a.s@realize.ch> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> Date: Tue, 01 Mar 2016 18:05:36 +0200
>>> From: Eli Zaretskii <eliz@gnu.org>
>>> Cc: larsi@gnus.org, j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
>>> 
>>> IOW, let's return to the w32-specific issues when the dust settles on
>>> the Posix code.
>>
>> It sounds like that part happened already, so do you still see any
>> w32-specific issues with this?
>
> Sorry for the delay.  It seems like there are still some issues, at
> least on my system and even without any debugger attached.  I'm
> currently trying to find the cause...

I have the impression that GnuTLS doesn't like it too much if we start
retrying the handshake many times before the socket is connected.  At
least on MS-Windows.  In nearly all of the cases of loading websites
with around 20 images, I observe arbitrary failures of
gnutls_try_handshake which usually end up with -10
GNUTLS_E_INVALID_SESSION.

I believe this because the following patch solves the issue on my
MS-Windows system: Postponing the handshake until after the socket is
connected.  Still, I must be honest: I'm in a kind of a trial-and-error
mode.  I do not really understand all the aspects of the current
implementation.  Anyway, I think a change in that direction would
probably be a good thing.  Do you agree?  It eliminates all the
handshake-retries that would otherwise happen before the socket is
connected.

I wonder if the issues that you observed with gdb attached would go away
with this patch as well...  You had these issues under GNU/Linux, right?
It's a bit embarrassing, but I did not yet have time to learn how to use
gdb to debug Emacs.  (But its on my todo list.)  Otherwise I would have
tried it out quickly.

BTW, `libgnutls-version' evaluates to 30408 on my MS-Windows.

And here is the intermediate-and-not-cleaned-up patch:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 2796 bytes --]

From cefaba56a33046b588eab81b3ca58224830a44f9 Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Fri, 4 Mar 2016 21:51:31 +0100
Subject: [PATCH] Wait for GnuTLS handshake until socket is connected

* src/gnutls.c (emacs_gnutls_handshake, gnutls_try_handshake): Skip
GnuTLS handshake when gnutls_boot is called on async socket (aka non
blocking client).

* src/process.c (connect_network_socket, wait_reading_process_output):
Proceed with GnuTLS handshake only after async socket has been
connected.
---
 src/gnutls.c  | 11 ++++++++---
 src/process.c |  9 ++++++---
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/gnutls.c b/src/gnutls.c
index 988c010..ddf4648 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -403,9 +403,6 @@ gnutls_try_handshake (struct Lisp_Process *proc)
   gnutls_session_t state = proc->gnutls_state;
   int ret;
 
-  if (proc->is_non_blocking_client)
-    proc->gnutls_p = true;
-
   do
     {
       ret = gnutls_handshake (state);
@@ -426,6 +423,8 @@ gnutls_try_handshake (struct Lisp_Process *proc)
     {
       /* check_memory_full (gnutls_alert_send_appropriate (state, ret));  */
     }
+
+  //printf ("gnutls_try_handshake: proc fd=%d, ret=%d\n", proc->infd, ret);
   return ret;
 }
 
@@ -474,6 +473,12 @@ emacs_gnutls_handshake (struct Lisp_Process *proc)
       proc->gnutls_initstage = GNUTLS_STAGE_TRANSPORT_POINTERS_SET;
     }
 
+  if (proc->is_non_blocking_client)
+    {
+      proc->gnutls_p = true;      
+      return GNUTLS_E_AGAIN;
+    }
+
   return gnutls_try_handshake (proc);
 }
 
diff --git a/src/process.c b/src/process.c
index 4359f68..bd1c45f 100644
--- a/src/process.c
+++ b/src/process.c
@@ -3415,7 +3415,8 @@ connect_network_socket (Lisp_Object proc, Lisp_Object ip_addresses)
       if (p->gnutls_initstage == GNUTLS_STAGE_READY)
 	/* Run sentinels, etc. */
 	finish_after_tls_connection (proc);
-      else if (p->gnutls_initstage != GNUTLS_STAGE_HANDSHAKE_TRIED)
+      else if ((! p->is_non_blocking_client && p->gnutls_initstage != GNUTLS_STAGE_HANDSHAKE_TRIED) ||
+	       (p->is_non_blocking_client && p->gnutls_initstage != GNUTLS_STAGE_TRANSPORT_POINTERS_SET))
 	{
 	  deactivate_process (proc);
 	  if (NILP (boot))
@@ -4950,8 +4951,10 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 #endif
 #ifdef HAVE_GNUTLS
 		/* Continue TLS negotiation. */
-		if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
-		    && p->is_non_blocking_client)
+		if ((p->gnutls_initstage == GNUTLS_STAGE_TRANSPORT_POINTERS_SET ||
+		     p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED)
+		    && p->is_non_blocking_client
+		    && (! FD_ISSET (p->outfd, &connect_wait_mask)))
 		  {
 		    gnutls_try_handshake (p);
 		    p->gnutls_handshakes_tried++;
-- 
2.6.2.windows.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 21:36                                             ` Alain Schneble
@ 2016-03-04 22:33                                               ` Alain Schneble
  2016-03-05  8:23                                               ` Eli Zaretskii
  2016-03-05  8:46                                               ` Lars Magne Ingebrigtsen
  2 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-04 22:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

Alain Schneble <a.s@realize.ch> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
> And here is the intermediate-and-not-cleaned-up patch:

Sorry, the former patch did break the new retry_for_async flag.  Here is
a corrected one (but still experimental...):


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 2649 bytes --]

From eedbcea0224febf579bedb44a9bb2b280abcdc4a Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Fri, 4 Mar 2016 23:17:07 +0100
Subject: [PATCH] Wait for GnuTLS handshake until socket is connected

* src/gnutls.c (emacs_gnutls_handshake, gnutls_try_handshake): Skip
GnuTLS handshake when gnutls_boot is called on async socket (aka non
blocking client).

* src/process.c (connect_network_socket, wait_reading_process_output):
Proceed with GnuTLS handshake only after async socket has been
connected.
---
 src/gnutls.c  |  9 ++++++---
 src/process.c | 13 +++++++++----
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/gnutls.c b/src/gnutls.c
index 988c010..4962ecc 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -403,9 +403,6 @@ gnutls_try_handshake (struct Lisp_Process *proc)
   gnutls_session_t state = proc->gnutls_state;
   int ret;
 
-  if (proc->is_non_blocking_client)
-    proc->gnutls_p = true;
-
   do
     {
       ret = gnutls_handshake (state);
@@ -474,6 +471,12 @@ emacs_gnutls_handshake (struct Lisp_Process *proc)
       proc->gnutls_initstage = GNUTLS_STAGE_TRANSPORT_POINTERS_SET;
     }
 
+  if (proc->is_non_blocking_client)
+    {
+      proc->gnutls_p = true;
+      return GNUTLS_E_AGAIN;
+    }
+
   return gnutls_try_handshake (proc);
 }
 
diff --git a/src/process.c b/src/process.c
index 4359f68..d73586c 100644
--- a/src/process.c
+++ b/src/process.c
@@ -3415,7 +3415,8 @@ connect_network_socket (Lisp_Object proc, Lisp_Object ip_addresses)
       if (p->gnutls_initstage == GNUTLS_STAGE_READY)
 	/* Run sentinels, etc. */
 	finish_after_tls_connection (proc);
-      else if (p->gnutls_initstage != GNUTLS_STAGE_HANDSHAKE_TRIED)
+      else if ((! p->is_non_blocking_client && p->gnutls_initstage != GNUTLS_STAGE_HANDSHAKE_TRIED) ||
+	       (p->is_non_blocking_client && p->gnutls_initstage != GNUTLS_STAGE_TRANSPORT_POINTERS_SET))
 	{
 	  deactivate_process (proc);
 	  if (NILP (boot))
@@ -4950,11 +4951,15 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 #endif
 #ifdef HAVE_GNUTLS
 		/* Continue TLS negotiation. */
-		if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
+		if ((p->gnutls_initstage == GNUTLS_STAGE_TRANSPORT_POINTERS_SET ||
+		     p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED)
 		    && p->is_non_blocking_client)
 		  {
-		    gnutls_try_handshake (p);
-		    p->gnutls_handshakes_tried++;
+		    if (! FD_ISSET (p->outfd, &connect_wait_mask))
+		      {
+			gnutls_try_handshake (p);
+			p->gnutls_handshakes_tried++;
+		      }
 
 		    if (p->gnutls_initstage == GNUTLS_STAGE_READY)
 		      {
-- 
2.6.2.windows.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 21:36                                             ` Alain Schneble
  2016-03-04 22:33                                               ` Alain Schneble
@ 2016-03-05  8:23                                               ` Eli Zaretskii
  2016-03-05 18:27                                                 ` Alain Schneble
  2016-03-05  8:46                                               ` Lars Magne Ingebrigtsen
  2 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-05  8:23 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Fri, 4 Mar 2016 22:36:56 +0100
> 
> I have the impression that GnuTLS doesn't like it too much if we start
> retrying the handshake many times before the socket is connected.  At
> least on MS-Windows.  In nearly all of the cases of loading websites
> with around 20 images, I observe arbitrary failures of
> gnutls_try_handshake which usually end up with -10
> GNUTLS_E_INVALID_SESSION.

I think this warrants a question to GnuTLS developers.  We need to
understand this before we devise a solution.  What bothers me is the
"many times" part -- how many is "too much", and why?  Do you see any
difference in behavior of sys_write during those many attempts as
opposed to the first few?

Also, what URL do you use for testing this?

> I believe this because the following patch solves the issue on my
> MS-Windows system: Postponing the handshake until after the socket is
> connected.  Still, I must be honest: I'm in a kind of a trial-and-error
> mode.  I do not really understand all the aspects of the current
> implementation.

Feel free to ask, I think I can answer any question about the Emacs
part of this, but probably not about the GnuTLS part -- those we
should ask on the GnuTLS mailing list.

> Anyway, I think a change in that direction would
> probably be a good thing.  Do you agree?  It eliminates all the
> handshake-retries that would otherwise happen before the socket is
> connected.

Why is it needed only on Windows?  Why does it matter what reason
causes the failure of a handshake?  We need to understand these
aspects before we consider the solutions.

> BTW, `libgnutls-version' evaluates to 30408 on my MS-Windows.

It's 30311 here, but I'm not sure this is a factor.  We are talking
about basic functionality here.

Thanks.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05  8:23                                               ` Eli Zaretskii
@ 2016-03-05 18:27                                                 ` Alain Schneble
  2016-03-05 19:21                                                   ` Eli Zaretskii
  2016-03-06  9:31                                                   ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-05 18:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Fri, 4 Mar 2016 22:36:56 +0100
>> 
>> I have the impression that GnuTLS doesn't like it too much if we start
>> retrying the handshake many times before the socket is connected.  At
>> least on MS-Windows.  In nearly all of the cases of loading websites
>> with around 20 images, I observe arbitrary failures of
>> gnutls_try_handshake which usually end up with -10
>> GNUTLS_E_INVALID_SESSION.
>
> I think this warrants a question to GnuTLS developers.  We need to
> understand this before we devise a solution.  What bothers me is the
> "many times" part -- how many is "too much", and why?

Yes, of course, I agree we have to understand it.  I just thought that
maybe the patch would help us in narrowing down the issue further...
And maybe we should wait until somebody else sees the same effects I
described on MS-Windows before asking GnuTLS developers?  Did you see
these problems as well?

I should have written "multiple times" not "many times".  I just had a
case where for four network processes (get image requests)
gnutls_try_handshake returned for each of these processes -28
GNUTLS_E_AGAIN three times in a row, followed by a single -110
GNUTLS_E_PREMATURE_TERMINATION and then repeatedly by -10
GNUTLS_E_INVALID_SESSION.

>                                                        Do you see any
> difference in behavior of sys_write during those many attempts as
> opposed to the first few?

Good point.  I'll have to analyze this.

> Also, what URL do you use for testing this?

Because I'm on MS-Windows, I used https://www.microsoft.com :)  (it gets
redirected to https://www.microsoft.com/de-ch for me).  I currently only
have access to a WLAN not under my control.  But the same Website loads
reliably and pretty fast in other web browsers with the same connection.

>> I believe this because the following patch solves the issue on my
>> MS-Windows system: Postponing the handshake until after the socket is
>> connected.  Still, I must be honest: I'm in a kind of a trial-and-error
>> mode.  I do not really understand all the aspects of the current
>> implementation.
>
> Feel free to ask, I think I can answer any question about the Emacs
> part of this, but probably not about the GnuTLS part -- those we
> should ask on the GnuTLS mailing list.

Ok, thank you!  I'll be happy to ask, but I would like to spend some
more time to re-read the code once again and clean up my mind first.

>> Anyway, I think a change in that direction would
>> probably be a good thing.  Do you agree?  It eliminates all the
>> handshake-retries that would otherwise happen before the socket is
>> connected.
>
> Why is it needed only on Windows?  Why does it matter what reason
> causes the failure of a handshake?  We need to understand these
> aspects before we consider the solutions.

I'm currently unable to answer these questions.  I see that there are
many differencies in Emacs' "platform adaption layer" for w32 vs the
paths for GNU/Linux.  And I cannot see if and how the patch I sent could
be related to those differencies.  So I follow your advice and will try
to understand the /issue/ first.

>> BTW, `libgnutls-version' evaluates to 30408 on my MS-Windows.
>
> It's 30311 here, but I'm not sure this is a factor.  We are talking
> about basic functionality here.

Thanks.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05 18:27                                                 ` Alain Schneble
@ 2016-03-05 19:21                                                   ` Eli Zaretskii
  2016-03-06 22:45                                                     ` Alain Schneble
  2016-03-06  9:31                                                   ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-05 19:21 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Sat, 5 Mar 2016 19:27:09 +0100
> 
> I should have written "multiple times" not "many times".  I just had a
> case where for four network processes (get image requests)
> gnutls_try_handshake returned for each of these processes -28
> GNUTLS_E_AGAIN three times in a row, followed by a single -110
> GNUTLS_E_PREMATURE_TERMINATION and then repeatedly by -10
> GNUTLS_E_INVALID_SESSION.

Sounds like we are terminating the connection while the handshake is
still in progress?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05 19:21                                                   ` Eli Zaretskii
@ 2016-03-06 22:45                                                     ` Alain Schneble
  2016-03-06 23:24                                                       ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-06 22:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Sat, 5 Mar 2016 19:27:09 +0100
>> 
>> I should have written "multiple times" not "many times".  I just had a
>> case where for four network processes (get image requests)
>> gnutls_try_handshake returned for each of these processes -28
>> GNUTLS_E_AGAIN three times in a row, followed by a single -110
>> GNUTLS_E_PREMATURE_TERMINATION and then repeatedly by -10
>> GNUTLS_E_INVALID_SESSION.
>
> Sounds like we are terminating the connection while the handshake is
> still in progress?

I'm pretty sure we are not terminating the connection prematurely.

After some more debugging, I found a way to reliably solve the issue.
It seems like switching the socket to blocking before calling send and
back to non-blocking while the socket is not connected in sys_write
somehow makes the socket unreliable.

I'll send a patch for further discussions shortly.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06 22:45                                                     ` Alain Schneble
@ 2016-03-06 23:24                                                       ` Alain Schneble
  2016-03-07  8:49                                                         ` Alain Schneble
  2016-03-07 16:07                                                         ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-06 23:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

Alain Schneble <a.s@realize.ch> writes:

> I'll send a patch for further discussions shortly.

And here it is.  The fix is quite simple.  It ensures that sys_write
exits before touching the socket if it is not connected yet.
Unfortunately I didn't find any documentation on winsock ioctlsocket
that would prove that this is indeed required.  But it seems not wrong
to me anyway.  (I'll try to search the wisock documentation tomorrow to
find some hints that lead in this direction, or maybe you know?)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 1140 bytes --]

From 01a475ea41265929951e6d14f6dd216671b63331 Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Mon, 7 Mar 2016 00:00:57 +0100
Subject: [PATCH] Solve async GnuTLS handshake issue on w32

* src/w32.c (sys_write): For non-blocking sockets, return immediately
with EWOULDBLOCK.  This ensures we do not temporarily turn the socket
into blocking mode for the pfn_send call if the socket is not (yet)
connected.  It turned out that doing so causes arbitrary GnuTLS
handshake failures on MS-Windows.  (bug#22789)
---
 src/w32.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/w32.c b/src/w32.c
index 998f696..ee8cf6c 100644
--- a/src/w32.c
+++ b/src/w32.c
@@ -8647,6 +8647,12 @@ sys_write (int fd, const void * buffer, unsigned int count)
       unsigned long nblock = 0;
       if (winsock_lib == NULL) emacs_abort ();
 
+      if ((fd_info[fd].flags & FILE_CONNECT) != 0)
+	{
+	  errno = EWOULDBLOCK;
+	  return -1;
+	}
+
       /* TODO: implement select() properly so non-blocking I/O works. */
       /* For now, make sure the write blocks.  */
       if (fd_info[fd].flags & FILE_NDELAY)
-- 
2.6.2.windows.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06 23:24                                                       ` Alain Schneble
@ 2016-03-07  8:49                                                         ` Alain Schneble
  2016-03-07 16:08                                                           ` Eli Zaretskii
  2016-03-07 16:07                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-07  8:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

>                (I'll try to search the wisock documentation tomorrow to
> find some hints that lead in this direction, or maybe you know?)

IIUC, this could be the reason why it fails:

- The reader_thread in w32proc.c calls _sys_wait_connect which in turn
  does a WSAEventSelect.

- The documentation says:
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms738573(v=vs.85).aspx

  The WSAAsyncSelect and WSAEventSelect functions automatically set a
  socket to nonblocking mode.  If WSAAsyncSelect or WSAEventSelect has
  been issued on a socket, then any attempt to use ioctlsocket to set
  the socket back to blocking mode will fail with WSAEINVAL.

- So there /is/ a dependency between these calls.  Unfortunately, I
  couldn't see that ioctlsocket returns with WSAEINVAL in the scenarios
  I tried.  Could it be a multi-threading issue then?  Multiple threads
  accessing the same socket...  I don't see how both threads are
  synchronized.  The patch I sent would synchronize them through the
  FILE_CONNECT flag, I think.  Did I miss something?







^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07  8:49                                                         ` Alain Schneble
@ 2016-03-07 16:08                                                           ` Eli Zaretskii
  2016-03-07 17:20                                                             ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-07 16:08 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 09:49:00 +0100
> 
> >                (I'll try to search the wisock documentation tomorrow to
> > find some hints that lead in this direction, or maybe you know?)
> 
> IIUC, this could be the reason why it fails:
> 
> - The reader_thread in w32proc.c calls _sys_wait_connect which in turn
>   does a WSAEventSelect.
> 
> - The documentation says:
> 
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms738573(v=vs.85).aspx
> 
>   The WSAAsyncSelect and WSAEventSelect functions automatically set a
>   socket to nonblocking mode.  If WSAAsyncSelect or WSAEventSelect has
>   been issued on a socket, then any attempt to use ioctlsocket to set
>   the socket back to blocking mode will fail with WSAEINVAL.
> 
> - So there /is/ a dependency between these calls.  Unfortunately, I
>   couldn't see that ioctlsocket returns with WSAEINVAL in the scenarios
>   I tried.

Yes, I know about this gotcha.  It's just that it never produced any
problems, and I never was able to see the ioctlsocket call fail with
WSAEINVAL.

>   Could it be a multi-threading issue then?  Multiple threads
>   accessing the same socket...

Not sure I follow -- are you trying to explain why ioctlsocket doesn't
fail as expected, or are you trying to explain some other phenomenon?

>   I don't see how both threads are synchronized.

The synchronization is between reader_thread and sys_select.  The
latter runs in the main (a.k.a. "Lisp") thread, the same thread where
sys_write is called.

>   The patch I sent would synchronize them through the FILE_CONNECT
>   flag, I think.  Did I miss something?

Well, that's not really "thread synchronization", but see my comments
and questions there.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 16:08                                                           ` Eli Zaretskii
@ 2016-03-07 17:20                                                             ` Alain Schneble
  2016-03-07 17:33                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-07 17:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 09:49:00 +0100
>>
>> - So there /is/ a dependency between these calls.  Unfortunately, I
>>   couldn't see that ioctlsocket returns with WSAEINVAL in the scenarios
>>   I tried.
>
> Yes, I know about this gotcha.  It's just that it never produced any
> problems, and I never was able to see the ioctlsocket call fail with
> WSAEINVAL.

As said, I didn't see WSAEINVAL as well with the current implementation.

>>   Could it be a multi-threading issue then?  Multiple threads
>>   accessing the same socket...
>
> Not sure I follow -- are you trying to explain why ioctlsocket doesn't
> fail as expected, or are you trying to explain some other phenomenon?

Both.  1) I expected it to (sometime) fail but didn't see it.  2) The
arbitrary gnutls_handshake failures I observed seem like the socket's
state is getting corrupted.

Do you agree that WSAEventSelect is called from the reader thread and
the ioctlsocket is called from the main thread?  And AFAIK, these
functions are not thread safe...  So I suspect that we ran into a
multi-threading issue here, which corrupted the socket.  And may have
lead to the above mentioned issues.

>>   I don't see how both threads are synchronized.
>
> The synchronization is between reader_thread and sys_select.  The
> latter runs in the main (a.k.a. "Lisp") thread, the same thread where
> sys_write is called.

This is how I understood it as well...

>>   The patch I sent would synchronize them through the FILE_CONNECT
>>   flag, I think.  Did I miss something?
>
> Well, that's not really "thread synchronization", but see my comments
> and questions there.

Yes, you are right.  I just meant that both threads no longer will call
WSAEventSelect and ioctlsocket at the same time with the proposed patch.
And this is a kind of "synchronization".

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 17:20                                                             ` Alain Schneble
@ 2016-03-07 17:33                                                               ` Eli Zaretskii
  2016-03-07 18:03                                                                 ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-07 17:33 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 18:20:20 +0100
> 
> >>   Could it be a multi-threading issue then?  Multiple threads
> >>   accessing the same socket...
> >
> > Not sure I follow -- are you trying to explain why ioctlsocket doesn't
> > fail as expected, or are you trying to explain some other phenomenon?
> 
> Both.  1) I expected it to (sometime) fail but didn't see it.  2) The
> arbitrary gnutls_handshake failures I observed seem like the socket's
> state is getting corrupted.

I cannot argue with facts.  I'm not sure the ioctlsocket calls are
what causes the failures, but since you say the failures disappear
when we don't, it's a fact that we should accept.

> Do you agree that WSAEventSelect is called from the reader thread and
> the ioctlsocket is called from the main thread?

Yes, of course.

> And AFAIK, these functions are not thread safe...

I don't think this is true.  AFAIK, the WSA* functions are all
thread-safe.

> So I suspect that we ran into a multi-threading issue here, which
> corrupted the socket.  And may have lead to the above mentioned
> issues.

It's possible.  But if you see the problems solved after the change, I
see no reason to continue arguing.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 17:33                                                               ` Eli Zaretskii
@ 2016-03-07 18:03                                                                 ` Alain Schneble
  2016-03-07 18:10                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-07 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 18:20:20 +0100
>> 
>> >>   Could it be a multi-threading issue then?  Multiple threads
>> >>   accessing the same socket...
>> >
>> > Not sure I follow -- are you trying to explain why ioctlsocket doesn't
>> > fail as expected, or are you trying to explain some other phenomenon?
>> 
>> Both.  1) I expected it to (sometime) fail but didn't see it.  2) The
>> arbitrary gnutls_handshake failures I observed seem like the socket's
>> state is getting corrupted.
>
> I cannot argue with facts.  I'm not sure the ioctlsocket calls are
> what causes the failures, but since you say the failures disappear
> when we don't, it's a fact that we should accept.

Neither do I.  Yes, the failures disappear completely.

>> Do you agree that WSAEventSelect is called from the reader thread and
>> the ioctlsocket is called from the main thread?
>
> Yes, of course.
>
>> And AFAIK, these functions are not thread safe...
>
> I don't think this is true.  AFAIK, the WSA* functions are all
> thread-safe.

Hm, it seems you are right, but I was not able to quickly find a *clear*
statement about this on MSDN...

>> So I suspect that we ran into a multi-threading issue here, which
>> corrupted the socket.  And may have lead to the above mentioned
>> issues.
>
> It's possible.  But if you see the problems solved after the change, I
> see no reason to continue arguing.

Yes, it solves the problems.  (Nevertheless, it would have been nice if
somebody else were able to reproduce the failures...)

Thanks.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 18:03                                                                 ` Alain Schneble
@ 2016-03-07 18:10                                                                   ` Eli Zaretskii
  2016-03-07 18:26                                                                     ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-07 18:10 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 19:03:58 +0100
> 
> Yes, it solves the problems.  (Nevertheless, it would have been nice if
> somebody else were able to reproduce the failures...)

If you can describe the recipe for reproducing them, I will do that
before and after the change.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 18:10                                                                   ` Eli Zaretskii
@ 2016-03-07 18:26                                                                     ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-07 18:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 19:03:58 +0100
>> 
>> Yes, it solves the problems.  (Nevertheless, it would have been nice if
>> somebody else were able to reproduce the failures...)
>
> If you can describe the recipe for reproducing them, I will do that
> before and after the change.

Thank you!

emacs -Q

(require 'shr)
(require 'url-queue)
(blink-cursor-mode -1)
(setq url-queue-timeout 120)
(setq shr-ignore-cache t)

M-x eww https://www.microsoft.com

=> Before the patch, several images were not displayed, even after
waiting 120s.  After the patch, all images should be displayed after a
short while (~6s on my machine).






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06 23:24                                                       ` Alain Schneble
  2016-03-07  8:49                                                         ` Alain Schneble
@ 2016-03-07 16:07                                                         ` Eli Zaretskii
  2016-03-07 16:47                                                           ` Alain Schneble
  2016-03-07 22:21                                                           ` Alain Schneble
  1 sibling, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-07 16:07 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 00:24:35 +0100
> 
> > I'll send a patch for further discussions shortly.
> 
> And here it is.  The fix is quite simple.  It ensures that sys_write
> exits before touching the socket if it is not connected yet.
> Unfortunately I didn't find any documentation on winsock ioctlsocket
> that would prove that this is indeed required.  But it seems not wrong
> to me anyway.  (I'll try to search the wisock documentation tomorrow to
> find some hints that lead in this direction, or maybe you know?)

I think this change should be installed regardless, as it fixes an
oversight.  However, I think it needs to be augmented, because the
fact that FILE_CONNECT flag is set doesn't necessarily mean the
connection is in progress: it could have failed already.  We need to
look at the status as well.

The possible states of the FILE_CONNECT flag and the cp->status values
are:

  flag    status                      description
  ----------------------------------------------------------------------------
  ON      STATUS_READ_READY           reader thread is about to try connect
  ON      STATUS_READ_FAILED          reader thread waits in _sys_wait_connect
  ON      STATUS_READ_SUCCEEDED       reader thread successfully connected
  ON      STATUS_CONNECT_FAILED       reader thread failed to connect
  OFF     STATUS_READ_ACKNOWLEDGED    sys_select acknowledged successful connect
  OFF     STATUS_READ_READY           reader thread is about to read
  OFF     STATUS_READ_IN_PROGRESS     reader thread waits in _sys_read_ahead
  OFF     STATUS_READ_SUCCEEDED       reader thread succeeded in reading
  OFF     STATUS_READ_FAILED          reader thread failed to read

So we should only return EWOULDBLOCK when FILE_CONNECT is set _and_
the status is not STATUS_CONNECT_FAILED.  If FILE_CONNECT is set, but
the status is STATUS_CONNECT_FAILED, we should instead return the
value computed from cp->errcode (if it is non-zero).  There's an
example of that in sys_read.

Other than that, what specific problem does your change try or is
known to solve?  IOW, what didn't work before the change, and works
after it?

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 16:07                                                         ` Eli Zaretskii
@ 2016-03-07 16:47                                                           ` Alain Schneble
  2016-03-07 22:21                                                           ` Alain Schneble
  1 sibling, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-07 16:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 00:24:35 +0100
>> 
>> > I'll send a patch for further discussions shortly.
>> 
>> And here it is.  The fix is quite simple.  It ensures that sys_write
>> exits before touching the socket if it is not connected yet.
>> Unfortunately I didn't find any documentation on winsock ioctlsocket
>> that would prove that this is indeed required.  But it seems not wrong
>> to me anyway.  (I'll try to search the wisock documentation tomorrow to
>> find some hints that lead in this direction, or maybe you know?)
>
> I think this change should be installed regardless, as it fixes an
> oversight.  However, I think it needs to be augmented, because the
> fact that FILE_CONNECT flag is set doesn't necessarily mean the
> connection is in progress: it could have failed already.  We need to
> look at the status as well.

Thank you.  I'll study the state table later...

> So we should only return EWOULDBLOCK when FILE_CONNECT is set _and_
> the status is not STATUS_CONNECT_FAILED.  If FILE_CONNECT is set, but
> the status is STATUS_CONNECT_FAILED, we should instead return the
> value computed from cp->errcode (if it is non-zero).  There's an
> example of that in sys_read.

Ok, thanks for this information.  I'll read through that code once
again...

> Other than that, what specific problem does your change try or is
> known to solve?  IOW, what didn't work before the change, and works
> after it?

Aha.  Sorry, I was not clear about that.

It fixes all the reproducible issues I had when "asynchronously" loading
a website with images in eww on MS-Windows
(e.g. https://www.microsoft.com).  gnutls_handshake returned with
arbitrary failures when loading the images.  It returned with errors -15
GNUTLS_E_UNEXPECTED_PACKET or -110 GNUTLS_E_PREMATURE_TERMINATION,
followed by -10 GNUTLS_E_INVALID_SESSION.  It happend all the time, but
arbitrarily only for some of the images.  The affected images were not
downloaded and displayed in eww at all because the GnuTLS session could
not be established.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 16:07                                                         ` Eli Zaretskii
  2016-03-07 16:47                                                           ` Alain Schneble
@ 2016-03-07 22:21                                                           ` Alain Schneble
  2016-03-08 16:40                                                             ` Eli Zaretskii
  2016-03-10 14:45                                                             ` Eli Zaretskii
  1 sibling, 2 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-07 22:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 1817 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> I think this change should be installed regardless, as it fixes an
> oversight.  However, I think it needs to be augmented, because the
> fact that FILE_CONNECT flag is set doesn't necessarily mean the
> connection is in progress: it could have failed already.  We need to
> look at the status as well.
>
> The possible states of the FILE_CONNECT flag and the cp->status values
> are:
>
>   flag    status                      description
>   ----------------------------------------------------------------------------
>   ON      STATUS_READ_READY           reader thread is about to try connect
>   ON      STATUS_READ_FAILED          reader thread waits in _sys_wait_connect
>   ON      STATUS_READ_SUCCEEDED       reader thread successfully connected
>   ON      STATUS_CONNECT_FAILED       reader thread failed to connect
>   OFF     STATUS_READ_ACKNOWLEDGED    sys_select acknowledged successful connect
>   OFF     STATUS_READ_READY           reader thread is about to read
>   OFF     STATUS_READ_IN_PROGRESS     reader thread waits in _sys_read_ahead
>   OFF     STATUS_READ_SUCCEEDED       reader thread succeeded in reading
>   OFF     STATUS_READ_FAILED          reader thread failed to read
>
> So we should only return EWOULDBLOCK when FILE_CONNECT is set _and_
> the status is not STATUS_CONNECT_FAILED.  If FILE_CONNECT is set, but
> the status is STATUS_CONNECT_FAILED, we should instead return the
> value computed from cp->errcode (if it is non-zero).  There's an
> example of that in sys_read.

Thank you very much for these inputs.  I rearranged the patch to include
these two cases and removed another special case that should no longer
be needed as it is covered by the first one.

Is this what you had in mind?  Do you agree with the change?

Thanks.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 2727 bytes --]

From 81295b036eb0a43dee968e8aa3f031030589cddd Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Mon, 7 Mar 2016 23:05:40 +0100
Subject: [PATCH] Resolve non-blocking socket connection issue on w32

* src/w32.c (sys_write): For non-blocking sockets, return immediately
with EWOULDBLOCK if connection is still in progress.  If connection
attempt has failed already, return proper code stashed in cp->errcode.
BTW, this ensures we do not temporarily turn the socket into blocking
mode for the pfn_send call if the connection is in progress.  It turned
out that doing so causes arbitrary GnuTLS handshake failures on
MS-Windows.  (bug#22789)
---
 src/w32.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/w32.c b/src/w32.c
index ccf7cc3..c553152 100644
--- a/src/w32.c
+++ b/src/w32.c
@@ -8772,6 +8772,30 @@ sys_write (int fd, const void * buffer, unsigned int count)
       unsigned long nblock = 0;
       if (winsock_lib == NULL) emacs_abort ();
 
+      child_process *cp = fd_info[fd].cp;
+
+      /* If this is a non-blocking socket whose connection is in
+	 progress or terminated with an error already, return the
+	 proper error code to the caller. */
+      if (cp != NULL && (fd_info[fd].flags & FILE_CONNECT) != 0)
+	{
+	  /* In case connection is in progress, ENOTCONN that would
+	     result from calling pfn_send is not what callers expect. */
+	  if (cp->status != STATUS_CONNECT_FAILED)
+	    {
+	      errno = EWOULDBLOCK;
+	      return -1;
+	    }
+	  /* In case connection failed, use the actual error code
+	     stashed by '_sys_wait_connect' in cp->errcode. */
+	  else if (cp->errcode != 0)
+	    {
+	      pfn_WSASetLastError (cp->errcode);
+	      set_errno ();
+	      return -1;
+	    }
+	}
+
       /* TODO: implement select() properly so non-blocking I/O works. */
       /* For now, make sure the write blocks.  */
       if (fd_info[fd].flags & FILE_NDELAY)
@@ -8782,14 +8806,8 @@ sys_write (int fd, const void * buffer, unsigned int count)
       if (nchars == SOCKET_ERROR)
         {
 	  set_errno ();
-	  /* If this is a non-blocking socket whose connection is in
-	     progress, return the proper error code to the caller;
-	     ENOTCONN is not what they expect . */
-	  if (errno == ENOTCONN && (fd_info[fd].flags & FILE_CONNECT) != 0)
-	    errno = EWOULDBLOCK;
-	  else
-	    DebPrint (("sys_write.send failed with error %d on socket %ld\n",
-		       pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
+	  DebPrint (("sys_write.send failed with error %d on socket %ld\n",
+		     pfn_WSAGetLastError (), SOCK_HANDLE (fd)));
 	}
 
       /* Set the socket back to non-blocking if it was before,
-- 
2.6.2.windows.1


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 22:21                                                           ` Alain Schneble
@ 2016-03-08 16:40                                                             ` Eli Zaretskii
  2016-03-08 16:43                                                               ` Alain Schneble
  2016-03-10 14:45                                                             ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-08 16:40 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 23:21:56 +0100
> 
> > So we should only return EWOULDBLOCK when FILE_CONNECT is set _and_
> > the status is not STATUS_CONNECT_FAILED.  If FILE_CONNECT is set, but
> > the status is STATUS_CONNECT_FAILED, we should instead return the
> > value computed from cp->errcode (if it is non-zero).  There's an
> > example of that in sys_read.
> 
> Thank you very much for these inputs.  I rearranged the patch to include
> these two cases and removed another special case that should no longer
> be needed as it is covered by the first one.
> 
> Is this what you had in mind?  Do you agree with the change?

Yes, it looks good to me.  I will test it in a couple of days.

Thanks.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-08 16:40                                                             ` Eli Zaretskii
@ 2016-03-08 16:43                                                               ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-08 16:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 23:21:56 +0100
>> 
>> > So we should only return EWOULDBLOCK when FILE_CONNECT is set _and_
>> > the status is not STATUS_CONNECT_FAILED.  If FILE_CONNECT is set, but
>> > the status is STATUS_CONNECT_FAILED, we should instead return the
>> > value computed from cp->errcode (if it is non-zero).  There's an
>> > example of that in sys_read.
>> 
>> Thank you very much for these inputs.  I rearranged the patch to include
>> these two cases and removed another special case that should no longer
>> be needed as it is covered by the first one.
>> 
>> Is this what you had in mind?  Do you agree with the change?
>
> Yes, it looks good to me.  I will test it in a couple of days.

Thank you, Eli.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-07 22:21                                                           ` Alain Schneble
  2016-03-08 16:40                                                             ` Eli Zaretskii
@ 2016-03-10 14:45                                                             ` Eli Zaretskii
  2016-03-10 14:59                                                               ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-10 14:45 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789-done

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Mon, 7 Mar 2016 23:21:56 +0100
> 
> Thank you very much for these inputs.  I rearranged the patch to include
> these two cases and removed another special case that should no longer
> be needed as it is covered by the first one.
> 
> Is this what you had in mind?  Do you agree with the change?

Thanks, I pushed this to master.

I'm marking the bug done.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-10 14:45                                                             ` Eli Zaretskii
@ 2016-03-10 14:59                                                               ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-10 14:59 UTC (permalink / raw)
  To: 22789; +Cc: j_l_domenech

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Mon, 7 Mar 2016 23:21:56 +0100
>> 
>> Thank you very much for these inputs.  I rearranged the patch to include
>> these two cases and removed another special case that should no longer
>> be needed as it is covered by the first one.
>> 
>> Is this what you had in mind?  Do you agree with the change?
>
> Thanks, I pushed this to master.
>
> I'm marking the bug done.

Thanks.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05 18:27                                                 ` Alain Schneble
  2016-03-05 19:21                                                   ` Eli Zaretskii
@ 2016-03-06  9:31                                                   ` Lars Magne Ingebrigtsen
  2016-03-06 15:24                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Magne Ingebrigtsen @ 2016-03-06  9:31 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> I should have written "multiple times" not "many times".  I just had a
> case where for four network processes (get image requests)
> gnutls_try_handshake returned for each of these processes -28
> GNUTLS_E_AGAIN three times in a row, followed by a single -110
> GNUTLS_E_PREMATURE_TERMINATION and then repeatedly by -10
> GNUTLS_E_INVALID_SESSION.

Hm!  That GNUTLS_E_PREMATURE_TERMINATION is interesting.  Does that
mean that Emacs has closed the socket, or that the peer has closed the
connection, I wonder?

I've had a look at the gnutls source code, and if I'm reading the code
correctly, that error code doesn't really distinguish the two cases...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06  9:31                                                   ` Lars Magne Ingebrigtsen
@ 2016-03-06 15:24                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-06 15:24 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  <j_l_domenech@yahoo.com>,  <22789@debbugs.gnu.org>
> Date: Sun, 06 Mar 2016 10:31:43 +0100
> 
> Alain Schneble <a.s@realize.ch> writes:
> 
> > I should have written "multiple times" not "many times".  I just had a
> > case where for four network processes (get image requests)
> > gnutls_try_handshake returned for each of these processes -28
> > GNUTLS_E_AGAIN three times in a row, followed by a single -110
> > GNUTLS_E_PREMATURE_TERMINATION and then repeatedly by -10
> > GNUTLS_E_INVALID_SESSION.
> 
> Hm!  That GNUTLS_E_PREMATURE_TERMINATION is interesting.  Does that
> mean that Emacs has closed the socket, or that the peer has closed the
> connection, I wonder?

I cannot imagine it's the other side: why would they?  I think it's
us, when we give up (too early, perhaps) and delete the process
object.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 21:36                                             ` Alain Schneble
  2016-03-04 22:33                                               ` Alain Schneble
  2016-03-05  8:23                                               ` Eli Zaretskii
@ 2016-03-05  8:46                                               ` Lars Magne Ingebrigtsen
  2016-03-05 18:32                                                 ` Alain Schneble
  2 siblings, 1 reply; 124+ messages in thread
From: Lars Magne Ingebrigtsen @ 2016-03-05  8:46 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> I have the impression that GnuTLS doesn't like it too much if we start
> retrying the handshake many times before the socket is connected.  At
> least on MS-Windows.  In nearly all of the cases of loading websites
> with around 20 images, I observe arbitrary failures of
> gnutls_try_handshake which usually end up with -10
> GNUTLS_E_INVALID_SESSION.

Try to add some printfs around the handshaking, and then strace Emacs
while it's doing all this.  GNUTLS_E_INVALID_SESSION is usually the
result of libgnutls losing control of the socket -- something else is
writing or reading from the socket, and that makes the libgnutls state
machine become unsynchronised.

If you see any reads/writes to the sockets outside the handshake section
of the code, you'll have found the culprit.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05  8:46                                               ` Lars Magne Ingebrigtsen
@ 2016-03-05 18:32                                                 ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-05 18:32 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> I have the impression that GnuTLS doesn't like it too much if we start
>> retrying the handshake many times before the socket is connected.  At
>> least on MS-Windows.  In nearly all of the cases of loading websites
>> with around 20 images, I observe arbitrary failures of
>> gnutls_try_handshake which usually end up with -10
>> GNUTLS_E_INVALID_SESSION.
>
> Try to add some printfs around the handshaking, and then strace Emacs
> while it's doing all this.  GNUTLS_E_INVALID_SESSION is usually the
> result of libgnutls losing control of the socket -- something else is
> writing or reading from the socket, and that makes the libgnutls state
> machine become unsynchronised.

Thank you.  I will try to do that.  I'll have to learn how to strace
first, though.  Sorry, that was the reason why I did not already strace
it (you suggested it already in a previous message, thanks!  Sorry for
not having followed your advice in the first place).

> If you see any reads/writes to the sockets outside the handshake section
> of the code, you'll have found the culprit.

Ok, thanks.  Hopefully, I'll have time to do it tomorrow...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 17:57                             ` Alain Schneble
  2016-02-29 18:45                               ` Eli Zaretskii
@ 2016-02-29 21:18                               ` Lars Ingebrigtsen
  2016-02-29 23:20                                 ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-29 21:18 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> But the point I tried to address is the following: /When/ shall we start
> with the handshake "series" and start counting the number of tries (or
> stopwatch)?  Don't you agree that with async sockets, it doesn't make
> much sense to start it before the socket is connected?

Yeah, it probably doesn't make any sense to start trying until it's
connected.  What's the incantation to check whether a socket has
finished its three-way TCP handshake?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 21:18                               ` Lars Ingebrigtsen
@ 2016-02-29 23:20                                 ` Alain Schneble
  2016-03-01  3:43                                   ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-02-29 23:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> But the point I tried to address is the following: /When/ shall we start
>> with the handshake "series" and start counting the number of tries (or
>> stopwatch)?  Don't you agree that with async sockets, it doesn't make
>> much sense to start it before the socket is connected?
>
> Yeah, it probably doesn't make any sense to start trying until it's
> connected.  What's the incantation to check whether a socket has
> finished its three-way TCP handshake?  

I'm not quite sure, but my understanding was that this expression does
answer this question:

(fd_info[p->infd].flags & FILE_CONNECT) == 0

Or am I wrong?

See also patch in the previous message.  If we decide to go this
direction, then I think we should also inhibit the first call to
gnutls_try_handshake from gnutls_boot, for async sockets.  This call is
currently implicit and will probably fail (return with EAGAIN) in all
practical cases.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-29 23:20                                 ` Alain Schneble
@ 2016-03-01  3:43                                   ` Eli Zaretskii
  2016-03-01  5:17                                     ` Lars Ingebrigtsen
  2016-03-01 15:43                                     ` Alain Schneble
  0 siblings, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01  3:43 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> Date: Tue, 1 Mar 2016 00:20:45 +0100
> Cc: j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
> 
> See also patch in the previous message.  If we decide to go this
> direction, then I think we should also inhibit the first call to
> gnutls_try_handshake from gnutls_boot, for async sockets.  This call is
> currently implicit and will probably fail (return with EAGAIN) in all
> practical cases.

That patch exposes a w32-specific stuff to process.c, so it cannot be
right.

That's why I asked for a description of how this works on GNU/Linux.
With that information in hand, we can reason what is different on w32.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  3:43                                   ` Eli Zaretskii
@ 2016-03-01  5:17                                     ` Lars Ingebrigtsen
  2016-03-01 15:46                                       ` Eli Zaretskii
  2016-03-01 15:43                                     ` Alain Schneble
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01  5:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, Alain Schneble, 22789

All the immediate ideas I have to ensure we have a timer that triggers
`wait_reading_process_output' now and then when we have processes
waiting for DNS or to complete TLS negotiation remind me too much of
reference counting.  There's always an off by one error or a race
condition when counting.  :-)

But, basically, we have to have a way to say "I'm starting this stuff
now, and the timer should continue to trigger every 50ms until I'm
done".  And then stop when they're all done...

Perhaps it isn't as difficult as all that, since Emacs is pretty single
threaded.  That is, when calling getaddrinfo_a or try_negotiate, we'd
have a function that would start the timer unless it's already running.
And the timer itself could just look through the process list and see if
any such processes remain, and then just commit sudoku if there are no
such processes remaining.

Now, if Emacs were multithreaded in many dimensions, this would be
pretty error prone, but perhaps it's not the way Emacs is now?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  5:17                                     ` Lars Ingebrigtsen
@ 2016-03-01 15:46                                       ` Eli Zaretskii
  2016-03-02 18:03                                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 15:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Alain Schneble <a.s@realize.ch>,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Tue, 01 Mar 2016 16:17:13 +1100
> 
> All the immediate ideas I have to ensure we have a timer that triggers
> `wait_reading_process_output' now and then when we have processes
> waiting for DNS or to complete TLS negotiation remind me too much of
> reference counting.  There's always an off by one error or a race
> condition when counting.  :-)
> 
> But, basically, we have to have a way to say "I'm starting this stuff
> now, and the timer should continue to trigger every 50ms until I'm
> done".  And then stop when they're all done...

You don't actually need a timer, since there's nothing you'd need the
timer function do.  What you want is a way to ensure the loop in
wait_reading_process_output continues to run, without becoming stuck
for too long in a call to 'pselect', for as long as there are process
objects waiting for this async stuff to complete; and you want to make
sure this happens even if Emacs doesn't get any events from the
window-system, the keyboard, the subprocesses, whatever.

Timers solve this problem for you because that loop computes the
timeout for the 'pselect' call such that the timeout ends when the
next timer expires.  For example, the blink-cursor timer causes the
timeout to be at most 0.5 sec.  Indirectly, this causes the loop to
crank one more iteration, which helps you, because Emacs then gets an
opportunity to check on the status of the TLS negotiation etc.  The
timer itself is redundant; its effect on the loop is what you want.

So what you can do instead of launching a timer is this: as long as
some process waits for some sync stuff to complete, reduce the timeout
with which we call 'pselect' to some reasonably small value, like half
a second or maybe 0.25 sec.  This will ensure the loop doesn't stop as
long as we wait for at least one such connection.  (This will need a
simple logic to not exit the loop too early; see the variable
timeout_reduced_for_timers for a similar logic we employ already for
timers.)

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 15:46                                       ` Eli Zaretskii
@ 2016-03-02 18:03                                         ` Lars Ingebrigtsen
  2016-03-02 19:07                                           ` Alain Schneble
  2016-03-02 19:43                                           ` Eli Zaretskii
  0 siblings, 2 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-02 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> So what you can do instead of launching a timer is this: as long as
> some process waits for some sync stuff to complete, reduce the timeout
> with which we call 'pselect' to some reasonably small value, like half
> a second or maybe 0.25 sec.  This will ensure the loop doesn't stop as
> long as we wait for at least one such connection.  (This will need a
> simple logic to not exit the loop too early; see the variable
> timeout_reduced_for_timers for a similar logic we employ already for
> timers.)

Aha!  With the following (for debugging purposes only) patch, it looks
like I'm getting progress on my https connections even if I don't have a
blinking cursor.  (I chose 50ms as my timeout, if I counted my zeroes
correctly...)  Were you thinking about something along these lines?  If
so, I can clean the patch up...

diff --git a/src/process.c b/src/process.c
index 85a4885..5376492 100644
--- a/src/process.c
+++ b/src/process.c
@@ -4870,6 +4870,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
   struct timespec got_output_end_time = invalid_timespec ();
   enum { MINIMUM = -1, TIMEOUT, INFINITY } wait;
   int got_some_output = -1;
+  bool retry_for_async;
   ptrdiff_t count = SPECPDL_INDEX ();
 
   /* Close to the current time if known, an invalid timespec otherwise.  */
@@ -4922,6 +4923,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	Lisp_Object process_list_head, aproc;
 	struct Lisp_Process *p;
 
+	retry_for_async = false;
 	FOR_EACH_PROCESS(process_list_head, aproc)
 	  {
 	    p = XPROCESS (aproc);
@@ -4935,6 +4937,8 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 		    Lisp_Object ip_addresses = check_for_dns (aproc);
 		    if (!NILP (ip_addresses) && !EQ (ip_addresses, Qt))
 		      connect_network_socket (aproc, ip_addresses);
+		    else
+		      retry_for_async = true;
 		  }
 #endif
 #ifdef HAVE_GNUTLS
@@ -4950,12 +4954,16 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 			gnutls_verify_boot (aproc, Qnil);
 			finish_after_tls_connection (aproc);
 		      }
-		    else if (p->gnutls_handshakes_tried
-			     > GNUTLS_EMACS_HANDSHAKES_LIMIT)
+		    else
 		      {
-			deactivate_process (aproc);
-			pset_status (p, list2 (Qfailed,
-					       build_string ("TLS negotiation failed")));
+			retry_for_async = true;
+			if (p->gnutls_handshakes_tried
+			    > GNUTLS_EMACS_HANDSHAKES_LIMIT)
+			  {
+			    deactivate_process (aproc);
+			    pset_status (p, list2 (Qfailed,
+						   build_string ("TLS negotiation failed")));
+			  }
 		      }
 		  }
 #endif
@@ -5044,6 +5052,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  Ctemp = write_mask;
 
 	  timeout = make_timespec (0, 0);
+	  printf("Timeout is %lu\n", timeout.tv_sec);
 	  if ((pselect (max (max_process_desc, max_input_desc) + 1,
 			&Atemp,
 #ifdef NON_BLOCKING_CONNECT
@@ -5222,6 +5231,15 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  if (timeout.tv_sec > 0 || timeout.tv_nsec > 0)
 	    now = invalid_timespec ();
 
+	  if (retry_for_async
+	      && (timeout.tv_sec > 0 || timeout.tv_nsec > 50000000))
+	    {
+	      timeout.tv_sec = 0;
+	      timeout.tv_nsec = 50000000;
+	    }
+
+	  printf("Here timeout is %lu/%lu\n", timeout.tv_sec, timeout.tv_nsec);
+
 #if defined (HAVE_NS)
           nfds = ns_select
 #elif defined (HAVE_GLIB)


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 18:03                                         ` Lars Ingebrigtsen
@ 2016-03-02 19:07                                           ` Alain Schneble
  2016-03-02 19:15                                             ` Lars Ingebrigtsen
  2016-03-02 19:43                                           ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-02 19:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> So what you can do instead of launching a timer is this: as long as
>> some process waits for some sync stuff to complete, reduce the timeout
>> with which we call 'pselect' to some reasonably small value, like half
>> a second or maybe 0.25 sec.  This will ensure the loop doesn't stop as
>> long as we wait for at least one such connection.  (This will need a
>> simple logic to not exit the loop too early; see the variable
>> timeout_reduced_for_timers for a similar logic we employ already for
>> timers.)
>
> Aha!  With the following (for debugging purposes only) patch, it looks
> like I'm getting progress on my https connections even if I don't have a
> blinking cursor.  (I chose 50ms as my timeout, if I counted my zeroes
> correctly...)  Were you thinking about something along these lines?  If
> so, I can clean the patch up...

I was debugging it further and AFAICS, there must also be an issue in
shr with the processing of images.  Because without any patch, if all
the images were loaded properly, i.e. the shr-image-fetched callback in
shr.el was invoked for each of the images in a page, the images are
sometimes not displayed.

In shr.el shr-image-fetched, if I comment the line...
(url-store-in-cache image-buffer)
...then images are never shown.  Strange, not?  Or is this expected?

And now, even with ordinary http connections, if I delete
~/.emacs.d/url/cache, no images are displayed the first time I load the
page.

Did I mess up something with my build/installation?

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 19:07                                           ` Alain Schneble
@ 2016-03-02 19:15                                             ` Lars Ingebrigtsen
  2016-03-02 19:38                                               ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-02 19:15 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> And now, even with ordinary http connections, if I delete
> ~/.emacs.d/url/cache, no images are displayed the first time I load the
> page.

Perhaps your Emacs isn't completing the DNS resolutions?  There's
nothing there to ensure their progress (unless you have a blinking
cursor).  

> Did I mess up something with my build/installation?

That's also possible.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 19:15                                             ` Lars Ingebrigtsen
@ 2016-03-02 19:38                                               ` Alain Schneble
  2016-03-02 20:46                                                 ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-02 19:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> And now, even with ordinary http connections, if I delete
>> ~/.emacs.d/url/cache, no images are displayed the first time I load the
>> page.
>
> Perhaps your Emacs isn't completing the DNS resolutions?  There's
> nothing there to ensure their progress (unless you have a blinking
> cursor).  

I don't think so, because I see that shr-image-fetched is called the
correct number of times and the status doesn't indicate an error.  But
the insert-image calls do not update the *eww* buffer and the image data
looks, well, not empty.  Or could it be an encoding issue, that the
images get displayed properly only after being saved as binary to disk
and re-read from there?  I'll try to find out...

^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 19:38                                               ` Alain Schneble
@ 2016-03-02 20:46                                                 ` Alain Schneble
  2016-03-02 22:02                                                   ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-02 20:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> Alain Schneble <a.s@realize.ch> writes:
>>
>>> And now, even with ordinary http connections, if I delete
>>> ~/.emacs.d/url/cache, no images are displayed the first time I load the
>>> page.
>>
>> Perhaps your Emacs isn't completing the DNS resolutions?  There's
>> nothing there to ensure their progress (unless you have a blinking
>> cursor).  
>
> I don't think so, because I see that shr-image-fetched is called the
> correct number of times and the status doesn't indicate an error.  But
> the insert-image calls do not update the *eww* buffer and the image data
> looks, well, not empty.  Or could it be an encoding issue, that the
> images get displayed properly only after being saved as binary to disk
> and re-read from there?  I'll try to find out...

That's funny:

- In `shr-tag-img', if the image has not yet been cached, the following
  code is evaluated:

  (url-queue-retrieve
	   (shr-encode-url url) 'shr-image-fetched
	   (list (current-buffer) start (set-marker (make-marker) (1- (point)))
                 (list :width width :height height))

- When the `callback shr-image-fetched' is invoked, start and end will
  form an empty range.

- The image inserted will not be displayed because the range (= alt tag
  value) is empty.

- If I add (setq alt "[empty alt]") on top of `shr-put-image', then
  loading of images works quite well.  Also with https connections.  (I
  do not say that the issue with the loop progress is not there...)






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 20:46                                                 ` Alain Schneble
@ 2016-03-02 22:02                                                   ` Alain Schneble
  2016-03-02 22:22                                                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-02 22:02 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

Alain Schneble <a.s@realize.ch> writes:

> - If I add (setq alt "[empty alt]") on top of `shr-put-image', then
>   loading of images works quite well.  Also with https connections.  (I
>   do not say that the issue with the loop progress is not there...)

Here is a patch to fix the empty-range issue when inserting non-cached
images in eww:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Patch --]
[-- Type: text/x-patch, Size: 1086 bytes --]

From ebf4b78ff53b92d99ef3ca771ff0d592d80b1c3d Mon Sep 17 00:00:00 2001
From: Alain Schneble <a.s@realize.ch>
Date: Wed, 2 Mar 2016 22:49:32 +0100
Subject: [PATCH] Fix insertion of non-cached images in eww

* lisp/net/shr.el (shr-tag-img): Construct a non-empty range to pass to
shr-image-fetched, to indicate where to insert the image.  Fixes the
issue introduced with commit 80852f843e69b81618f29cfb9aa4b074946cb3c4.
---
 lisp/net/shr.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index c469e69..e463c7e 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1499,7 +1499,7 @@ The preference is a float determined from `shr-prefer-media-type'."
           (insert " ")
 	  (url-queue-retrieve
 	   (shr-encode-url url) 'shr-image-fetched
-	   (list (current-buffer) start (set-marker (make-marker) (1- (point)))
+	   (list (current-buffer) start (set-marker (make-marker) (point))
                  (list :width width :height height))
 	   t t)))
 	(when (zerop shr-table-depth) ;; We are not in a table.
-- 
2.6.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 172 bytes --]


And by the way, with this fix, disabling (url-store-in-cache
image-buffer) in `shr-image-fetched' does no longer break loading of
images, as I would have expected before.

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 22:02                                                   ` Alain Schneble
@ 2016-03-02 22:22                                                     ` Lars Ingebrigtsen
  2016-03-02 22:38                                                       ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-02 22:22 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> * lisp/net/shr.el (shr-tag-img): Construct a non-empty range to pass to
> shr-image-fetched, to indicate where to insert the image.  Fixes the
> issue introduced with commit 80852f843e69b81618f29cfb9aa4b074946cb3c4.
> ---
>  lisp/net/shr.el | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lisp/net/shr.el b/lisp/net/shr.el
> index c469e69..e463c7e 100644
> --- a/lisp/net/shr.el
> +++ b/lisp/net/shr.el
> @@ -1499,7 +1499,7 @@ The preference is a float determined from `shr-prefer-media-type'."
>            (insert " ")
>  	  (url-queue-retrieve
>  	   (shr-encode-url url) 'shr-image-fetched
> -	   (list (current-buffer) start (set-marker (make-marker) (1- (point)))
> +	   (list (current-buffer) start (set-marker (make-marker) (point))
>                   (list :width width :height height))
>  	   t t)))
>  	(when (zerop shr-table-depth) ;; We are not in a table.

I'm not seeing this issue.  And that patch is two weeks old...  did you
just start seeing this today?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 22:22                                                     ` Lars Ingebrigtsen
@ 2016-03-02 22:38                                                       ` Alain Schneble
  2016-03-03  0:07                                                         ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-02 22:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I'm not seeing this issue.  And that patch is two weeks old...  did you
> just start seeing this today?

Yes, I saw it today.  The last week or so I was only working on MS
Windows.  Today I switched to GNU/Linux again.  I'll try to do a fresh
checkout to see if it vanishes...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 22:38                                                       ` Alain Schneble
@ 2016-03-03  0:07                                                         ` Alain Schneble
  2016-03-03  5:32                                                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-03  0:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> I'm not seeing this issue.  And that patch is two weeks old...  did you
>> just start seeing this today?
>
> Yes, I saw it today.  The last week or so I was only working on MS
> Windows.  Today I switched to GNU/Linux again.  I'll try to do a fresh
> checkout to see if it vanishes...

Fresh build, still the same issue here...  hmm...






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-03  0:07                                                         ` Alain Schneble
@ 2016-03-03  5:32                                                           ` Lars Ingebrigtsen
  2016-03-03  9:03                                                             ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-03  5:32 UTC (permalink / raw)
  To: Alain Schneble; +Cc: j_l_domenech, 22789

Alain Schneble <a.s@realize.ch> writes:

> Alain Schneble <a.s@realize.ch> writes:
>
>> Lars Ingebrigtsen <larsi@gnus.org> writes:
>>
>>> I'm not seeing this issue.  And that patch is two weeks old...  did you
>>> just start seeing this today?
>>
>> Yes, I saw it today.  The last week or so I was only working on MS
>> Windows.  Today I switched to GNU/Linux again.  I'll try to do a fresh
>> checkout to see if it vanishes...
>
> Fresh build, still the same issue here...  hmm...

Odd that I'm not seeing this, too...  What GNU/Linux system are you
using?  I'm on Ubuntu, but I can't really see why that would change
anything...  Does your Emacs have SVG support?

Anyway, I think your patch is correct, so I've applied it.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-03  5:32                                                           ` Lars Ingebrigtsen
@ 2016-03-03  9:03                                                             ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-03  9:03 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Odd that I'm not seeing this, too...  What GNU/Linux system are you
> using?  I'm on Ubuntu, but I can't really see why that would change
> anything...  Does your Emacs have SVG support?
>
> Anyway, I think your patch is correct, so I've applied it.

Thank you, Lars.  You are right!  It didn't happen if compiled with SVG
support.  I had /no/ SVG support before, so I ran into this issue.  With
SVG support it takes another path, where insert-image is not called with
an empty string.  That's why it didn't happen there.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 18:03                                         ` Lars Ingebrigtsen
  2016-03-02 19:07                                           ` Alain Schneble
@ 2016-03-02 19:43                                           ` Eli Zaretskii
  2016-03-03  5:23                                             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-02 19:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Wed, 02 Mar 2016 18:03:57 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So what you can do instead of launching a timer is this: as long as
> > some process waits for some sync stuff to complete, reduce the timeout
> > with which we call 'pselect' to some reasonably small value, like half
> > a second or maybe 0.25 sec.  This will ensure the loop doesn't stop as
> > long as we wait for at least one such connection.  (This will need a
> > simple logic to not exit the loop too early; see the variable
> > timeout_reduced_for_timers for a similar logic we employ already for
> > timers.)
> 
> Aha!  With the following (for debugging purposes only) patch, it looks
> like I'm getting progress on my https connections even if I don't have a
> blinking cursor.  (I chose 50ms as my timeout, if I counted my zeroes
> correctly...)  Were you thinking about something along these lines?

Yes.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-02 19:43                                           ` Eli Zaretskii
@ 2016-03-03  5:23                                             ` Lars Ingebrigtsen
  2016-03-04  8:51                                               ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-03  5:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> Yes.

Ok, I've now committed a version that leads to

rm -rf ~/.emacs.d/url/cache/; ./src/emacs -Q

(progn
  (blink-cursor-mode -1)
  (eww "https://www.fsf.org"))

loading the images reliably.  From the hotel wifi they load fine
without the patch, but using the LTE card in the laptop, that form
always fails to load the images without the patch.  I guess that from
the hotel wifi the negotiation ends before Emacs backs off the pselect
timeout too much, so it finishes anyway.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-03  5:23                                             ` Lars Ingebrigtsen
@ 2016-03-04  8:51                                               ` Eli Zaretskii
  2016-03-04 11:33                                                 ` Lars Ingebrigtsen
  2016-03-04 11:37                                                 ` Lars Ingebrigtsen
  0 siblings, 2 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04  8:51 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Thu, 03 Mar 2016 05:23:51 +0000
> 
> Ok, I've now committed a version that leads to
> 
> rm -rf ~/.emacs.d/url/cache/; ./src/emacs -Q
> 
> (progn
>   (blink-cursor-mode -1)
>   (eww "https://www.fsf.org"))
> 
> loading the images reliably.

Same here on MS-Windows, thanks.

Interestingly, if I run the same under GDB, the images load with about
50% reliability.  So I guess there are still some subtle timing issues
somewhere.  Can you try this under GDB and see if there's some
difference?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04  8:51                                               ` Eli Zaretskii
@ 2016-03-04 11:33                                                 ` Lars Ingebrigtsen
  2016-03-04 14:48                                                   ` Eli Zaretskii
  2016-03-04 11:37                                                 ` Lars Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-04 11:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> Interestingly, if I run the same under GDB, the images load with about
> 50% reliability.  So I guess there are still some subtle timing issues
> somewhere.  Can you try this under GDB and see if there's some
> difference?

It this with just running under gdb?  No special breakpoints or anything
that would pause Emacs for a long time?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 11:33                                                 ` Lars Ingebrigtsen
@ 2016-03-04 14:48                                                   ` Eli Zaretskii
  2016-03-05 12:26                                                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04 14:48 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Fri, 04 Mar 2016 11:33:58 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Interestingly, if I run the same under GDB, the images load with about
> > 50% reliability.  So I guess there are still some subtle timing issues
> > somewhere.  Can you try this under GDB and see if there's some
> > difference?
> 
> It this with just running under gdb?

Yes.

> No special breakpoints or anything that would pause Emacs for a long
> time?

Not user breakpoints, no.  But GDB always inserts various hooks into
the program it runs, and in Emacs (if you run GDB from the src
directory), it also sets breakpoints in a few strategic places.  It
also announces new threads birth and demise etc.  All that evidently
does have some effect.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 14:48                                                   ` Eli Zaretskii
@ 2016-03-05 12:26                                                     ` Lars Magne Ingebrigtsen
  2016-03-05 13:24                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Magne Ingebrigtsen @ 2016-03-05 12:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

This is probably totally unrelated, but I saw a "hang" in Emacs for the
first time in weeks.  shr was downloading an image over https, and Emacs
became unresponsive.  strace showed the following:

[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
[pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})

This went on for perhaps 30 seconds?  (Hm.  Which is what I have
`url-queue-timeout' set to...  Hm...) Then it stopped without
downloading the image.

Now, the pselect6 call has a timeout of {0, 0}?  If I'm reading the man
page right.

I think it's probably this code?

      /* If status of something has changed, and no input is
	 available, notify the user of the change right away.  After
	 this explicit check, we'll let the SIGCHLD handler zap
	 timeout to get our attention.  */
      if (update_tick != process_tick)
	{
	  fd_set Atemp;
	  fd_set Ctemp;

          if (kbd_on_hold_p ())
            FD_ZERO (&Atemp);
          else
            Atemp = input_wait_mask;
	  Ctemp = write_mask;

	  timeout = make_timespec (0, 0);
	  if ((pselect (max (max_process_desc, max_input_desc) + 1,
			&Atemp,
#ifdef NON_BLOCKING_CONNECT
			(num_pending_connects > 0 ? &Ctemp : NULL),
#else
			NULL,
#endif
			NULL, &timeout, NULL)
	       <= 0))

I'm not quite sure what it's trying to do...  Hm...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05 12:26                                                     ` Lars Magne Ingebrigtsen
@ 2016-03-05 13:24                                                       ` Eli Zaretskii
  2016-03-06  9:33                                                         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-05 13:24 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Sat, 05 Mar 2016 13:26:50 +0100
> 
> This is probably totally unrelated, but I saw a "hang" in Emacs for the
> first time in weeks.  shr was downloading an image over https, and Emacs
> became unresponsive.  strace showed the following:
> 
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> [pid 12890] pselect6(39, [3 6 7 12 13 14 15 16 17 18 19 20 38], [], NULL, {0, 0}, {NULL, 8}) = 2 (in [18 19], left {0, 0})
> 
> This went on for perhaps 30 seconds?  (Hm.  Which is what I have
> `url-queue-timeout' set to...  Hm...) Then it stopped without
> downloading the image.

Hard to say anything intelligent without knowing what are descriptors
18 and 19.

> Now, the pselect6 call has a timeout of {0, 0}?  If I'm reading the man
> page right.
> 
> I think it's probably this code?
> 
>       /* If status of something has changed, and no input is
> 	 available, notify the user of the change right away.  After
> 	 this explicit check, we'll let the SIGCHLD handler zap
> 	 timeout to get our attention.  */
>       if (update_tick != process_tick)
> 	{
> 	  fd_set Atemp;
> 	  fd_set Ctemp;
> 
>           if (kbd_on_hold_p ())
>             FD_ZERO (&Atemp);
>           else
>             Atemp = input_wait_mask;
> 	  Ctemp = write_mask;
> 
> 	  timeout = make_timespec (0, 0);
> 	  if ((pselect (max (max_process_desc, max_input_desc) + 1,
> 			&Atemp,
> #ifdef NON_BLOCKING_CONNECT
> 			(num_pending_connects > 0 ? &Ctemp : NULL),
> #else
> 			NULL,
> #endif
> 			NULL, &timeout, NULL)
> 	       <= 0))
> 
> I'm not quite sure what it's trying to do...  Hm...

The comment above seems to explain what it does, no?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-05 13:24                                                       ` Eli Zaretskii
@ 2016-03-06  9:33                                                         ` Lars Magne Ingebrigtsen
  2016-03-06 15:26                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Magne Ingebrigtsen @ 2016-03-06  9:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>>       /* If status of something has changed, and no input is
>> 	 available, notify the user of the change right away.  After
>> 	 this explicit check, we'll let the SIGCHLD handler zap
>> 	 timeout to get our attention.  */

[...]

> The comment above seems to explain what it does, no?

I don't understand it, or how it relates to calling pselect with a zero
timeout.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06  9:33                                                         ` Lars Magne Ingebrigtsen
@ 2016-03-06 15:26                                                           ` Eli Zaretskii
  2016-03-06 18:33                                                             ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-06 15:26 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Sun, 06 Mar 2016 10:33:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >>       /* If status of something has changed, and no input is
> >> 	 available, notify the user of the change right away.  After
> >> 	 this explicit check, we'll let the SIGCHLD handler zap
> >> 	 timeout to get our attention.  */
> 
> [...]
> 
> > The comment above seems to explain what it does, no?
> 
> I don't understand it, or how it relates to calling pselect with a zero
> timeout.  :-)

It wants to poll, so it sets the time-out at zero, meaning "don't wait
at all".  Or am _I_ missing something?





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06 15:26                                                           ` Eli Zaretskii
@ 2016-03-06 18:33                                                             ` Lars Magne Ingebrigtsen
  2016-03-06 18:41                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Magne Ingebrigtsen @ 2016-03-06 18:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
>> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
>> Date: Sun, 06 Mar 2016 10:33:08 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >>       /* If status of something has changed, and no input is
>> >> 	 available, notify the user of the change right away.  After
>> >> 	 this explicit check, we'll let the SIGCHLD handler zap
>> >> 	 timeout to get our attention.  */
>> 
>> [...]
>> 
>> > The comment above seems to explain what it does, no?
>> 
>> I don't understand it, or how it relates to calling pselect with a zero
>> timeout.  :-)
>
> It wants to poll, so it sets the time-out at zero, meaning "don't wait
> at all".  Or am _I_ missing something?

But why does it want to poll?  I'm wondering whether any of the
async-related timeout changes are...  provoking the "if (update_tick !=
process_tick)" bit to always be true in some cases, and making this
polling behaviour happen infinitely...  or something...

But I've been completely unable to reproduce the error, so I'm just
speculating.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-06 18:33                                                             ` Lars Magne Ingebrigtsen
@ 2016-03-06 18:41                                                               ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-06 18:41 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Sun, 06 Mar 2016 19:33:43 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Lars Magne Ingebrigtsen <larsi@gnus.org>
> >> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> >> Date: Sun, 06 Mar 2016 10:33:08 +0100
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> >>       /* If status of something has changed, and no input is
> >> >> 	 available, notify the user of the change right away.  After
> >> >> 	 this explicit check, we'll let the SIGCHLD handler zap
> >> >> 	 timeout to get our attention.  */
> >> 
> >> [...]
> >> 
> >> > The comment above seems to explain what it does, no?
> >> 
> >> I don't understand it, or how it relates to calling pselect with a zero
> >> timeout.  :-)
> >
> > It wants to poll, so it sets the time-out at zero, meaning "don't wait
> > at all".  Or am _I_ missing something?
> 
> But why does it want to poll?

To "notify the user of the change right away".





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04  8:51                                               ` Eli Zaretskii
  2016-03-04 11:33                                                 ` Lars Ingebrigtsen
@ 2016-03-04 11:37                                                 ` Lars Ingebrigtsen
  2016-03-04 11:40                                                   ` Lars Ingebrigtsen
  2016-03-04 15:40                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-04 11:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> Interestingly, if I run the same under GDB, the images load with about
> 50% reliability.  So I guess there are still some subtle timing issues
> somewhere.  Can you try this under GDB and see if there's some
> difference?

To avoid the image cache getting in the way (which may be confusing when
testing this stuff), can you try

(progn
  (setq shr-ignore-cache t)
  (blink-cursor-mode -1)
  (eww "https://www.fsf.org"))

And you need Alain's fix for the shr image insertion stuff (which I
checked in yesterday) if your Emacs doesn't have SVG support.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 11:37                                                 ` Lars Ingebrigtsen
@ 2016-03-04 11:40                                                   ` Lars Ingebrigtsen
  2016-03-04 15:41                                                     ` Eli Zaretskii
  2016-03-04 15:40                                                   ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-04 11:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

(And I'm not able to reproduce the problem when running under gdb,
either with the airport wifi or the LTE card.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 11:40                                                   ` Lars Ingebrigtsen
@ 2016-03-04 15:41                                                     ` Eli Zaretskii
  2016-03-04 15:43                                                       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04 15:41 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: j_l_domenech@yahoo.com,  a.s@realize.ch,  22789@debbugs.gnu.org
> Date: Fri, 04 Mar 2016 11:40:12 +0000
> 
> (And I'm not able to reproduce the problem when running under gdb,
> either with the airport wifi or the LTE card.)

Not sure what that means, or where should I look for the problems.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 15:41                                                     ` Eli Zaretskii
@ 2016-03-04 15:43                                                       ` Lars Ingebrigtsen
  2016-03-04 16:12                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-04 15:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, a.s, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> (And I'm not able to reproduce the problem when running under gdb,
>> either with the airport wifi or the LTE card.)
>
> Not sure what that means, or where should I look for the problems.

I was trying to say "it works for me (under gbd) both with a fast and a
slow network connection".  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 15:43                                                       ` Lars Ingebrigtsen
@ 2016-03-04 16:12                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04 16:12 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: j_l_domenech@yahoo.com,  a.s@realize.ch,  22789@debbugs.gnu.org
> Date: Fri, 04 Mar 2016 15:43:55 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> (And I'm not able to reproduce the problem when running under gdb,
> >> either with the airport wifi or the LTE card.)
> >
> > Not sure what that means, or where should I look for the problems.
> 
> I was trying to say "it works for me (under gbd) both with a fast and a
> slow network connection".  :-)

This part, I did understand ;-)





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-04 11:37                                                 ` Lars Ingebrigtsen
  2016-03-04 11:40                                                   ` Lars Ingebrigtsen
@ 2016-03-04 15:40                                                   ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-04 15:40 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, a.s, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: a.s@realize.ch,  j_l_domenech@yahoo.com,  22789@debbugs.gnu.org
> Date: Fri, 04 Mar 2016 11:37:26 +0000
> 
> To avoid the image cache getting in the way (which may be confusing when
> testing this stuff), can you try
> 
> (progn
>   (setq shr-ignore-cache t)
>   (blink-cursor-mode -1)
>   (eww "https://www.fsf.org"))

Works 100% reliably if I invoke Emacs from the shell, but only about
50% when run under GDB.  Maybe an additional factor is that my network
connection is somewhat slow.  But enlarging the url-queue-timeout to
120 didn't help.

> And you need Alain's fix for the shr image insertion stuff (which I
> checked in yesterday) if your Emacs doesn't have SVG support.

My Emacs does have SVG support (via librsvg), and my master branch is
current anyway.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01  3:43                                   ` Eli Zaretskii
  2016-03-01  5:17                                     ` Lars Ingebrigtsen
@ 2016-03-01 15:43                                     ` Alain Schneble
  2016-03-01 16:07                                       ` Eli Zaretskii
  1 sibling, 1 reply; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 15:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> Date: Tue, 1 Mar 2016 00:20:45 +0100
>> Cc: j_l_domenech@yahoo.com, 22789@debbugs.gnu.org
>> 
>> See also patch in the previous message.  If we decide to go this
>> direction, then I think we should also inhibit the first call to
>> gnutls_try_handshake from gnutls_boot, for async sockets.  This call is
>> currently implicit and will probably fail (return with EAGAIN) in all
>> practical cases.
>
> That patch exposes a w32-specific stuff to process.c, so it cannot be
> right.

Thanks.  I was unaware of that.  I should have done a grep on the usage
of FILE_CONNECT first before proposing this patch.  That reveals clearly
that it is w32-specific.  I'm sorry.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 15:43                                     ` Alain Schneble
@ 2016-03-01 16:07                                       ` Eli Zaretskii
  2016-03-01 16:26                                         ` Alain Schneble
  0 siblings, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-03-01 16:07 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Tue, 1 Mar 2016 16:43:50 +0100
> 
> > That patch exposes a w32-specific stuff to process.c, so it cannot be
> > right.
> 
> Thanks.  I was unaware of that.  I should have done a grep on the usage
> of FILE_CONNECT first before proposing this patch.  That reveals clearly
> that it is w32-specific.  I'm sorry.

No need to apologize, you couldn't know this.  (You may wish reading
the large comment around line 800 in w32proc.c, which describes how
this works on Windows.)





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-03-01 16:07                                       ` Eli Zaretskii
@ 2016-03-01 16:26                                         ` Alain Schneble
  0 siblings, 0 replies; 124+ messages in thread
From: Alain Schneble @ 2016-03-01 16:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <larsi@gnus.org>, <j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
>> Date: Tue, 1 Mar 2016 16:43:50 +0100
>> 
>> > That patch exposes a w32-specific stuff to process.c, so it cannot be
>> > right.
>> 
>> Thanks.  I was unaware of that.  I should have done a grep on the usage
>> of FILE_CONNECT first before proposing this patch.  That reveals clearly
>> that it is w32-specific.  I'm sorry.
>
> No need to apologize, you couldn't know this.  (You may wish reading
> the large comment around line 800 in w32proc.c, which describes how
> this works on Windows.)

Thank you, Eli.






^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-27 18:05           ` Alain Schneble
  2016-02-27 22:38             ` Lars Ingebrigtsen
@ 2016-02-28 16:47             ` Eli Zaretskii
  1 sibling, 0 replies; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-28 16:47 UTC (permalink / raw)
  To: Alain Schneble; +Cc: larsi, j_l_domenech, 22789

> From: Alain Schneble <a.s@realize.ch>
> CC: Eli Zaretskii <eliz@gnu.org>, José L. Doménech
> 	<j_l_domenech@yahoo.com>, <22789@debbugs.gnu.org>
> Date: Sat, 27 Feb 2016 19:05:20 +0100
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > (setq proc
> > (make-network-process :name "foo"
> > 		      :buffer (get-buffer-create "*foo*")
> > 		      :host "imap.gmail.com"
> > 		      :service 993
> > 		      :nowait t
> > 		      :tls-parameters
> > 		      (cons 'gnutls-x509pki
> > 			    (gnutls-boot-parameters
> > 			     :type 'gnutls-x509pki
> > 			     :hostname "imap.gmail.com"))))
> >
> > * OK Gimap ready for requests from 60.225.211.161 qr7mb410250987iec
> >
> > should appear.  Also, after evaling that, what does
> 
> It seems to be a timing issue.  If I set gnutls-log-level to 5, this
> works also on Windows (i.e i get OK Gimap...).

Actually, with the latest trunk this works even without increasing the
log level.  I guess the latest changes in gnutls.c did it.

> I guess that it enters this path because the socket is not ready yet.
> But why?

Because the connection didn't complete yet, and we are already trying
the GnuTLS handshake.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-24 23:48       ` Lars Ingebrigtsen
  2016-02-25  0:02         ` Lars Ingebrigtsen
@ 2016-02-25  3:46         ` Eli Zaretskii
  2016-02-25  5:00           ` Lars Ingebrigtsen
  1 sibling, 1 reply; 124+ messages in thread
From: Eli Zaretskii @ 2016-02-25  3:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: j_l_domenech, 22789

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: José L. Doménech
>  <j_l_domenech@yahoo.com>,  22789@debbugs.gnu.org
> Date: Thu, 25 Feb 2016 10:48:26 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I confirm the problem with the MS-Windows build: on master, https
> > doesn't work; on emacs-25 it does.
> >
> > First suspect is the async changes, of course.
> 
> Yup.  I'll try do do a build without getaddrinfo_a support and see
> whether I can reproduce the https error here...

Note that the Windows build doesn't use getaddrinfo, either, it uses
gethostbyname etc.





^ permalink raw reply	[flat|nested] 124+ messages in thread

* bug#22789: 25.1.50; In last master build https connections stop working
  2016-02-25  3:46         ` Eli Zaretskii
@ 2016-02-25  5:00           ` Lars Ingebrigtsen
  0 siblings, 0 replies; 124+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-25  5:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: j_l_domenech, 22789

Eli Zaretskii <eliz@gnu.org> writes:

> Note that the Windows build doesn't use getaddrinfo, either, it uses
> gethostbyname etc.

Yeah, I undefined HAVE_GETADDRINFO, too, but wasn't able to reproduce.
Are there other defines that differ between Linux and Windows (other
than the obvious WINDOWSNT/GNU_LINUX ones) that are relevant for
process.c?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 124+ messages in thread

end of thread, other threads:[~2016-03-10 14:59 UTC | newest]

Thread overview: 124+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-24 10:26 bug#22789: 25.1.50; In last master build https connections stop working José L. Doménech
2016-02-24 14:00 ` Lars Ingebrigtsen
2016-02-24 16:09   ` José L. Doménech
2016-02-24 18:06     ` Eli Zaretskii
2016-02-24 23:48       ` Lars Ingebrigtsen
2016-02-25  0:02         ` Lars Ingebrigtsen
2016-02-25  1:09           ` Lars Ingebrigtsen
2016-02-25 16:41           ` Eli Zaretskii
2016-02-26  2:29             ` Lars Ingebrigtsen
2016-02-26  9:36               ` Eli Zaretskii
2016-02-27  2:30                 ` Lars Ingebrigtsen
2016-02-27  2:43                   ` John Wiegley
2016-02-27  3:50                     ` Lars Ingebrigtsen
2016-02-27  8:14                       ` Eli Zaretskii
2016-02-27  3:49                   ` Lars Ingebrigtsen
2016-02-27  8:10                     ` Eli Zaretskii
2016-02-27  8:13                   ` Eli Zaretskii
2016-02-27 18:05           ` Alain Schneble
2016-02-27 22:38             ` Lars Ingebrigtsen
2016-02-27 23:06               ` Alain Schneble
2016-02-27 23:49                 ` Alain Schneble
2016-02-28  3:31                   ` Lars Ingebrigtsen
2016-02-28  9:58                     ` Alain Schneble
2016-02-28 16:53                     ` Eli Zaretskii
2016-02-29  2:37                       ` Lars Ingebrigtsen
2016-02-28  3:43                   ` Eli Zaretskii
2016-02-28  9:48                     ` Alain Schneble
2016-02-28 17:00                       ` Eli Zaretskii
2016-02-29  2:49                         ` Lars Ingebrigtsen
2016-02-29  3:43                           ` Eli Zaretskii
2016-02-29  4:38                             ` Lars Ingebrigtsen
2016-02-29  9:55                         ` Alain Schneble
2016-02-29 10:03                           ` Lars Ingebrigtsen
2016-02-29 17:57                             ` Alain Schneble
2016-02-29 18:45                               ` Eli Zaretskii
2016-02-29 21:22                                 ` Lars Ingebrigtsen
2016-03-01  3:35                                   ` Eli Zaretskii
2016-02-29 23:13                                 ` Alain Schneble
2016-03-01  0:41                                   ` Lars Ingebrigtsen
2016-03-01  3:41                                   ` Eli Zaretskii
2016-03-01  4:29                                     ` Lars Ingebrigtsen
2016-03-01  4:30                                     ` Lars Ingebrigtsen
2016-03-01  9:00                                       ` Andreas Schwab
2016-03-01 14:12                                         ` Lars Ingebrigtsen
2016-03-01 14:25                                           ` Alain Schneble
2016-03-01 14:43                                             ` Lars Ingebrigtsen
2016-03-01 15:59                                             ` Eli Zaretskii
2016-03-01 16:19                                               ` Alain Schneble
2016-03-01 17:00                                                 ` Eli Zaretskii
2016-03-01 17:09                                                   ` Alain Schneble
2016-03-01 17:22                                                     ` Eli Zaretskii
2016-03-01 17:55                                                       ` Alain Schneble
2016-03-01 18:13                                                         ` Eli Zaretskii
2016-03-01 16:33                                               ` Andreas Schwab
2016-03-01 15:53                                           ` Eli Zaretskii
2016-03-01 15:36                                     ` Alain Schneble
2016-03-01 16:05                                       ` Eli Zaretskii
2016-03-01 16:25                                         ` Alain Schneble
2016-03-04  8:56                                         ` Eli Zaretskii
2016-03-04 16:55                                           ` Alain Schneble
2016-03-04 21:36                                             ` Alain Schneble
2016-03-04 22:33                                               ` Alain Schneble
2016-03-05  8:23                                               ` Eli Zaretskii
2016-03-05 18:27                                                 ` Alain Schneble
2016-03-05 19:21                                                   ` Eli Zaretskii
2016-03-06 22:45                                                     ` Alain Schneble
2016-03-06 23:24                                                       ` Alain Schneble
2016-03-07  8:49                                                         ` Alain Schneble
2016-03-07 16:08                                                           ` Eli Zaretskii
2016-03-07 17:20                                                             ` Alain Schneble
2016-03-07 17:33                                                               ` Eli Zaretskii
2016-03-07 18:03                                                                 ` Alain Schneble
2016-03-07 18:10                                                                   ` Eli Zaretskii
2016-03-07 18:26                                                                     ` Alain Schneble
2016-03-07 16:07                                                         ` Eli Zaretskii
2016-03-07 16:47                                                           ` Alain Schneble
2016-03-07 22:21                                                           ` Alain Schneble
2016-03-08 16:40                                                             ` Eli Zaretskii
2016-03-08 16:43                                                               ` Alain Schneble
2016-03-10 14:45                                                             ` Eli Zaretskii
2016-03-10 14:59                                                               ` Alain Schneble
2016-03-06  9:31                                                   ` Lars Magne Ingebrigtsen
2016-03-06 15:24                                                     ` Eli Zaretskii
2016-03-05  8:46                                               ` Lars Magne Ingebrigtsen
2016-03-05 18:32                                                 ` Alain Schneble
2016-02-29 21:18                               ` Lars Ingebrigtsen
2016-02-29 23:20                                 ` Alain Schneble
2016-03-01  3:43                                   ` Eli Zaretskii
2016-03-01  5:17                                     ` Lars Ingebrigtsen
2016-03-01 15:46                                       ` Eli Zaretskii
2016-03-02 18:03                                         ` Lars Ingebrigtsen
2016-03-02 19:07                                           ` Alain Schneble
2016-03-02 19:15                                             ` Lars Ingebrigtsen
2016-03-02 19:38                                               ` Alain Schneble
2016-03-02 20:46                                                 ` Alain Schneble
2016-03-02 22:02                                                   ` Alain Schneble
2016-03-02 22:22                                                     ` Lars Ingebrigtsen
2016-03-02 22:38                                                       ` Alain Schneble
2016-03-03  0:07                                                         ` Alain Schneble
2016-03-03  5:32                                                           ` Lars Ingebrigtsen
2016-03-03  9:03                                                             ` Alain Schneble
2016-03-02 19:43                                           ` Eli Zaretskii
2016-03-03  5:23                                             ` Lars Ingebrigtsen
2016-03-04  8:51                                               ` Eli Zaretskii
2016-03-04 11:33                                                 ` Lars Ingebrigtsen
2016-03-04 14:48                                                   ` Eli Zaretskii
2016-03-05 12:26                                                     ` Lars Magne Ingebrigtsen
2016-03-05 13:24                                                       ` Eli Zaretskii
2016-03-06  9:33                                                         ` Lars Magne Ingebrigtsen
2016-03-06 15:26                                                           ` Eli Zaretskii
2016-03-06 18:33                                                             ` Lars Magne Ingebrigtsen
2016-03-06 18:41                                                               ` Eli Zaretskii
2016-03-04 11:37                                                 ` Lars Ingebrigtsen
2016-03-04 11:40                                                   ` Lars Ingebrigtsen
2016-03-04 15:41                                                     ` Eli Zaretskii
2016-03-04 15:43                                                       ` Lars Ingebrigtsen
2016-03-04 16:12                                                         ` Eli Zaretskii
2016-03-04 15:40                                                   ` Eli Zaretskii
2016-03-01 15:43                                     ` Alain Schneble
2016-03-01 16:07                                       ` Eli Zaretskii
2016-03-01 16:26                                         ` Alain Schneble
2016-02-28 16:47             ` Eli Zaretskii
2016-02-25  3:46         ` Eli Zaretskii
2016-02-25  5:00           ` Lars Ingebrigtsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).