Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2/12/20 2:37 PM, David Koppelman wrote:
>> Except I don't get the
>> efficient kernel-space-to-kernel-space transfer that copy_file_range
>> uses.)
>
> It's more than just kernel-space-to-kernel-space copying. When copying
> a file within an NFS server, you don't need to ship its contents over
> the network; the server can do the copy. Also, many modern filesystems
> can copy files by fiddling with pointers rather than data and thus can
> copy much faster than read+write would do, even on local filesystems.
> So avoiding copy_file_range entirely would mean a big performance loss
> on big files.
>
>> I do not experience the problem on the version of Emacs packaged with
>> rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
>> Version 3.22.30) of 2018-09-10".
>
> Emacs 26.1 doesn't use copy_file_range, which explains why it doesn't
> encounter your problem. Emacs 27 is planned to use it, though, so we
> should see how to best fix the problem.
>
> As you say, it's a serious bug in your filesystem. It strikes me that
> it is likely to affect programs other than Emacs, so it should be high
> priority to fix regardless of what we do in Emacs.
>
> Some questions: What is the NFS fileserver (NetApp, etc.)? What's the
> blocksize on the remote file system? Does copy_file_range work
> correctly when the size is a multiple of 32*1024? If so, perhaps we
> could tweak Emacs to use copy_file_range for most of the file, and use
> read+write only for the trailing <32 KiB.
>
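
For concreteness, the split suggested above might look roughly like the
sketch below. This is not Emacs code and not the attached routine; the
names copy_bulk_then_tail, copy_read_write and TAIL_ALIGN are made up
for illustration, and it assumes Linux with glibc 2.27 or later. It
also assumes the bug only affects the final partial 32 KiB, which is
exactly what the question above is trying to find out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { TAIL_ALIGN = 32 * 1024 };  /* the 32 KiB boundary asked about above */

/* Copy REMAINING bytes from IN to OUT with plain read+write.
   Return 0 on success, -1 on error or premature EOF.  */
static int
copy_read_write (int in, int out, off_t remaining)
{
  static char buf[TAIL_ALIGN];
  while (remaining > 0)
    {
      size_t want = (remaining < (off_t) sizeof buf
                     ? (size_t) remaining : sizeof buf);
      ssize_t n = read (in, buf, want);
      if (n < 0 && errno == EINTR)
        continue;
      if (n <= 0)
        return -1;
      for (ssize_t done = 0; done < n; )
        {
          ssize_t w = write (out, buf + done, n - done);
          if (w < 0 && errno == EINTR)
            continue;
          if (w <= 0)
            return -1;
          done += w;
        }
      remaining -= n;
    }
  return 0;
}

/* Hypothetical helper: copy SRC to DST, using copy_file_range only for
   the leading multiple of 32 KiB and read+write for whatever is left.
   Return 0 on success, -1 on failure.  */
static int
copy_bulk_then_tail (char const *src, char const *dst)
{
  int in = open (src, O_RDONLY);
  if (in < 0)
    return -1;
  int out = open (dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  struct stat st;
  int ok = out >= 0 && fstat (in, &st) == 0;
  if (ok)
    {
      off_t remaining = st.st_size;
      off_t bulk = remaining - remaining % TAIL_ALIGN;
      while (bulk > 0)
        {
          /* NULL offsets: both descriptors' file positions advance.  */
          ssize_t n = copy_file_range (in, NULL, out, NULL, bulk, 0);
          if (n < 0 && errno == EINTR)
            continue;
          if (n <= 0)
            break;  /* unsupported, failed, or EOF: read+write takes over */
          bulk -= n;
          remaining -= n;
        }
      ok = copy_read_write (in, out, remaining) == 0;
    }
  if (out >= 0 && close (out) != 0)
    ok = 0;
  close (in);
  return ok ? 0 : -1;
}
```

If copy_file_range stops early for any reason, the read+write loop
simply picks up from the current file positions, so a genuine I/O
error still surfaces there. Whether this is worth doing depends on the
answer to the multiple-of-32-KiB question.
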
>> When I have time I'll try to reproduce the problem with a quick C++
>> routine using copy_file_range.
>
> To save you some time, attached is a quick C routine that attempts to
> reproduce the problem. Does it reproduce the problem for you? If so,
> you can use it in your bug report to Red Hat.
>
> Also, can you strace the failing Emacs? Something like this:
>
> strace -o trace.log emacs -Q -batch -eval '(copy-file "a" "b" t t)'
>
> and then look at the relevant part of trace.log.