unofficial mirror of emacs-devel@gnu.org 
* Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN
@ 2021-02-12 15:11 Tobias Bading
  2021-02-12 15:24 ` Tobias Bading
  0 siblings, 1 reply; 5+ messages in thread
From: Tobias Bading @ 2021-02-12 15:11 UTC
  To: emacs-devel

Hi.

Everyone doing alright hacking/working from home?

Yesterday I encountered a curious problem while using a self-built
Emacs 26.3.50 on my GNU/Linux machine at home:

I've set up the automounter to mount SMB shares of Windows servers in
the office through the company's VPN. Apart from poor performance, this
works fine for e.g. "ls -lR /smb/server/share/dir" in a shell. The
problems start when I try to work with the same directory from within
Emacs with dired-mode, i.e. a simple C-x C-f /smb/server/share/dir.
Quite regularly I get errors like "dired-get-file-for-visit: File no
longer exists; type ‘g’ to update Dired buffer", although nothing has
changed in the directory.

So I put Emacs under a microscope with

strace -f -e trace=%file -tt emacs 2>&1 | grep --line-buffered -A2 /smb/ >emacs.log

which revealed errors like

faccessat(AT_FDCWD, "/smb/server/share/dir", F_OK) = -1 EINTR (Interrupted system call)
--- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
[...]
stat("/smb/server/share/dir", 0x7fffe49383b0) = -1 EINTR (Interrupted system call)
--- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
--- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---

(The timestamps of the SIGIO lines suggest that these signals have
nothing to do with the EINTR errors reported beforehand; the timestamps
are often more than 0.1 seconds apart.)

So far I've only seen stat() and faccessat() failing with EINTR. The
funny thing is, the man pages of those two system calls don't mention
EINTR at all. man signal(7) also doesn't mention these functions in the
paragraph about SA_RESTART. Anyway, I've checked the source code of my
Emacs 26.3.50 build and found emacs_sigaction_flags() in src/sysdep.c,
which does return 0 (as intended by the dev(s) who wrote the code). I've
changed the implementation to "return SA_RESTART;", but that had no effect.
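For readers unfamiliar with the SA_RESTART mechanism mentioned above, here is a minimal sketch (not taken from Emacs, and not a reproduction of the CIFS problem) of how a signal caught by a handler installed with sa_flags = 0 makes a blocking read() fail with EINTR. A pipe stands in for the slow SMB share because stat() on a local file system normally returns too quickly to be interrupted; SIGALRM, the pipe and the function name errno_after_interrupted_read are assumptions chosen for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; what matters is that the signal is caught
   (not ignored), so a blocked system call gets interrupted.  */
static void
on_alarm (int sig)
{
  (void) sig;
}

/* Install a SIGALRM handler with SA_FLAGS, then block in read () on a
   pipe that nobody writes to until the alarm fires.  Returns the errno
   left by the failed read (), 0 if read () succeeded, or -1 on setup
   failure.  Only call this with flags lacking SA_RESTART: with
   SA_RESTART the read () is restarted and, since nothing ever writes
   to the pipe, never returns.  */
static int
errno_after_interrupted_read (int sa_flags)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = sa_flags;
  if (sigaction (SIGALRM, &sa, NULL) != 0)
    return -1;

  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  alarm (1);                          /* SIGALRM in about one second */
  char c;
  ssize_t n = read (fds[0], &c, 1);   /* blocks: the pipe stays empty */
  int result = n < 0 ? errno : 0;
  alarm (0);
  close (fds[0]);
  close (fds[1]);
  return result;
}
```

Calling errno_after_interrupted_read (0) should return EINTR after about a second. With SA_RESTART the kernel would normally restart such a read() instead. Whether a particular file-system operation is restartable is decided in the kernel, which may be one reason why changing emacs_sigaction_flags() alone made no difference here.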

To make sure I didn't mess up my own Emacs 26 git branch somehow, I did
a quick test with the current HEAD of origin/master and "src/emacs -Q",
which seems to have the same problem, revealed by error messages like
"apply: Setting current directory: Interrupted system call,
/smb/server/share/dir/".

I'm stumped. A (shell-command "ls -lAFNR /smb/server/share/big-dir/")
works fine, as does a "cp -a" of that directory. But when the Emacs
process itself calls stat() or faccessat(), things go sideways? Why?
What am I missing? Are stat() and faccessat() even allowed to fail with
EINTR? Is this a kernel bug, maybe somewhere in the CIFS client
implementation? But an strace of "ls -lAFNR /smb/server/share/big-dir/"
shows not a single EINTR! So why would only Emacs be affected?

Please enlighten me... ;)

Tobias




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN
  2021-02-12 15:11 Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN Tobias Bading
@ 2021-02-12 15:24 ` Tobias Bading
  2021-02-12 18:46   ` Tobias Bading
  0 siblings, 1 reply; 5+ messages in thread
From: Tobias Bading @ 2021-02-12 15:24 UTC
  To: emacs-devel

Ok, the grep -A2 wasn't that bright XD. With -C2 I just got

16:19:23.935175 --- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
16:19:23.937296 faccessat(AT_FDCWD, "/smb/server/share/dir/", F_OK) = -1 EINTR (Interrupted system call)
[...]
16:19:34.191156 --- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
16:19:34.192562 stat("/smb/server/share/dir/subdir", 0x7fff40feb5c0) = -1 EINTR (Interrupted system call)
[...]
16:19:39.358023 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=68151, si_uid=501, si_status=0, si_utime=1, si_stime=1} ---
16:19:39.358477 stat("/smb/server/share/dir", 0x7fff40fea8b0) = -1 EINTR (Interrupted system call)
[...]
16:20:01.111670 --- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
16:20:01.113070 stat("/smb/server/share/dir/subdir", 0x7fff40feb5c0) = -1 EINTR (Interrupted system call)
[...]
16:20:05.519682 --- SIGIO {si_signo=SIGIO, si_code=SI_KERNEL} ---
16:20:05.520931 stat("/smb/server/share/dir/subdir", 0x7fff40feb5c0) = -1 EINTR (Interrupted system call)

which makes a bit more sense, I guess: each EINTR now follows a SIGIO
(or, once, a SIGCHLD) by only about 0.5 to 2 milliseconds.

Tobias


* Re: Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN
  2021-02-12 15:24 ` Tobias Bading
@ 2021-02-12 18:46   ` Tobias Bading
  2021-02-14 11:23     ` Tobias Bading
  0 siblings, 1 reply; 5+ messages in thread
From: Tobias Bading @ 2021-02-12 18:46 UTC
  To: emacs-devel

I've placed 22 TEMP_FAILURE_RETRY() macros around the 14 stat() and 8
faccessat() calls in dired.c, fileio.c, filelock.c, lread.c, process.c,
and sysdep.c. So far this band-aid seems to circumvent the problem, but
I still have no idea whether those calls are even permitted to fail with
errno EINTR.
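A minimal sketch of the kind of wrapping described above, assuming glibc, where TEMP_FAILURE_RETRY is a GNU extension exposed by <unistd.h> when _GNU_SOURCE is defined. The helper names stat_retry and path_exists_retry are hypothetical; this is an illustration, not the actual patch.

```c
#define _GNU_SOURCE    /* glibc exposes TEMP_FAILURE_RETRY only with this */
#include <errno.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <sys/stat.h>  /* stat */
#include <unistd.h>    /* faccessat, F_OK, TEMP_FAILURE_RETRY */

#ifndef TEMP_FAILURE_RETRY
/* Fallback in the style of glibc's definition; it relies on the
   GCC/Clang statement-expression extension.  */
# define TEMP_FAILURE_RETRY(expression)            \
  ({ long int result_;                             \
     do                                            \
       result_ = (long int) (expression);          \
     while (result_ == -1L && errno == EINTR);     \
     result_; })
#endif

/* Hypothetical helper: stat () that is reissued while it fails with
   EINTR.  */
static int
stat_retry (const char *file, struct stat *st)
{
  return TEMP_FAILURE_RETRY (stat (file, st));
}

/* Hypothetical helper: nonzero if FILE exists, reissuing faccessat ()
   while it fails with EINTR.  */
static int
path_exists_retry (const char *file)
{
  return TEMP_FAILURE_RETRY (faccessat (AT_FDCWD, file, F_OK, 0)) == 0;
}
```

The macro re-evaluates its argument for as long as it returns -1 with errno set to EINTR, so an interrupted call is simply repeated instead of the error being reported to the caller.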

Tobias


* Re: Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN
  2021-02-12 18:46   ` Tobias Bading
@ 2021-02-14 11:23     ` Tobias Bading
  2021-02-14 17:59       ` Matt Armstrong
  0 siblings, 1 reply; 5+ messages in thread
From: Tobias Bading @ 2021-02-14 11:23 UTC
  To: emacs-devel

I've asked the GNU libc hackers as well:
https://sourceware.org/pipermail/libc-help/2021-February/005663.html

Tobias


* Re: Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN
  2021-02-14 11:23     ` Tobias Bading
@ 2021-02-14 17:59       ` Matt Armstrong
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Armstrong @ 2021-02-14 17:59 UTC
  To: Tobias Bading, emacs-devel

Tobias Bading <tbading@web.de> writes:

> I've asked the GNU libc hackers as well:
> https://sourceware.org/pipermail/libc-help/2021-February/005663.html

Tobias, I suspect that using TEMP_FAILURE_RETRY in Emacs is the right
pragmatic fix, even if the "spec" does not indicate it should be
necessary. I just wanted to encourage you to follow through with
this. If you run out of time or energy, please file an Emacs bug
about it.
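Since TEMP_FAILURE_RETRY is a glibc extension and Emacs also builds on non-glibc platforms, the same retry loop could be written as plain functions. A minimal sketch; stat_noeintr and faccessat_noeintr are hypothetical names, not existing Emacs functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <sys/stat.h>  /* stat */
#include <unistd.h>    /* faccessat, F_OK */

/* Hypothetical wrapper: call stat () again for as long as it fails
   with EINTR; any other result is returned unchanged.  */
static int
stat_noeintr (const char *file, struct stat *st)
{
  int r;
  do
    r = stat (file, st);
  while (r < 0 && errno == EINTR);
  return r;
}

/* Hypothetical wrapper: the same loop around faccessat ().  */
static int
faccessat_noeintr (int dirfd, const char *file, int mode, int flag)
{
  int r;
  do
    r = faccessat (dirfd, file, mode, flag);
  while (r < 0 && errno == EINTR);
  return r;
}
```

Plain functions avoid the GNU statement-expression extension and the _GNU_SOURCE requirement, at the cost of one wrapper per system call.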





Thread overview: 5+ messages
2021-02-12 15:11 Syscalls stat() and faccessat() sometimes fail with errno EINTR when accessing SMB share through VPN Tobias Bading
2021-02-12 15:24 ` Tobias Bading
2021-02-12 18:46   ` Tobias Bading
2021-02-14 11:23     ` Tobias Bading
2021-02-14 17:59       ` Matt Armstrong

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
