unofficial mirror of emacs-devel@gnu.org 
* Dataloss copying file using copy-file on RHEL 8.
@ 2020-02-12 22:37 David Koppelman
  2020-02-13  1:32 ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: David Koppelman @ 2020-02-12 22:37 UTC (permalink / raw)
  To: emacs-devel

I'm experiencing incompletely copied files when using (copy-file). I'm
not sure if this is an Emacs problem, but because it's serious I
thought I'd report it here in case it is.

With Emacs built from a recent git pull on Red Hat Enterprise Linux 8,
I experienced file corruption when copying a file over an existing
file. I discovered it using C in dired, but the problem is reliably
reproduced by calling copy-file with the KEEP-TIME argument t:
(copy-file "porig.svg" "pcopy.svg" t t). (I recently upgraded from
RHEL 7 to RHEL 8, so the problem may have nothing to do with a recent
change to Emacs.)

I get the problem when copying a 36368-byte file to an existing file
of the same size; both files are on the same NFS-mounted filesystem.
The problem does not occur on XFS. The contents of the destination
file are correct for the first 32768 bytes, but the remainder of the
file, which is the right size, is filled with zero bytes.
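
To make the symptom concrete, here is a minimal sketch of how the damage
could be located once both files have been read into memory. The helper
names first_mismatch and tail_is_zero are hypothetical and are not part
of Emacs or of any reproducer in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte at which COPY differs from ORIG,
   or N if the first N bytes are identical.  (Hypothetical helper.)  */
size_t
first_mismatch (const unsigned char *orig, const unsigned char *copy,
                size_t n)
{
  assert (orig && copy);
  size_t i = 0;
  while (i < n && orig[i] == copy[i])
    i++;
  return i;
}

/* Return 1 if every byte in BUF[FROM..N) is zero, else 0.
   (Hypothetical helper.)  */
int
tail_is_zero (const unsigned char *buf, size_t from, size_t n)
{
  assert (buf && from <= n);
  for (size_t i = from; i < n; i++)
    if (buf[i] != 0)
      return 0;
  return 1;
}
```

For the failure described above, one would expect first_mismatch to
report 32768 and tail_is_zero to hold from that offset to the end of
the 36368-byte destination.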

Running under gdb reveals that the file is copied using
copy_file_range (in src/fileio.c). I can work around the problem by
forcing Emacs to avoid the loop using copy_file_range, in which case
it uses fallback code and everything is fine. (Except I don't get the
efficient kernel-space-to-kernel-space transfer that copy_file_range
uses.)
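
For readers unfamiliar with what such a fallback looks like, below is a
minimal sketch of a plain read+write copy loop. It is not the code in
src/fileio.c; the function name copy_readwrite and the 64 KiB buffer
size are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Copy all remaining bytes from descriptor IN to descriptor OUT using
   ordinary read and write calls.  Return 0 on success, -1 on error
   with errno set.  (Hypothetical helper, not Emacs's implementation.)  */
int
copy_readwrite (int in, int out)
{
  assert (in >= 0 && out >= 0);
  char buf[64 * 1024];
  for (;;)
    {
      ssize_t n = read (in, buf, sizeof buf);
      if (n == 0)
        return 0;               /* End of input.  */
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted; retry the read.  */
          return -1;
        }
      /* write may accept fewer bytes than requested, so loop until
         the whole chunk has been written.  */
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = write (out, buf + off, (size_t) (n - off));
          if (w < 0)
            {
              if (errno == EINTR)
                continue;       /* Interrupted; retry the write.  */
              return -1;
            }
          off += w;
        }
    }
}
```

Every byte passes through a user-space buffer here, which is exactly
the cost that copy_file_range is meant to avoid.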

I do not experience the problem on the version of Emacs packaged with
rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
Version 3.22.30) of 2018-09-10".

When I have time I'll try to reproduce the problem with a quick C++
routine using copy_file_range. If successful, I'll file a bug with Red
Hat. Even if this is a RHEL 8 problem, Emacs ought to avoid
copy_file_range when the problem does or might occur.

David Koppelman






* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-12 22:37 Dataloss copying file using copy-file on RHEL 8 David Koppelman
@ 2020-02-13  1:32 ` Paul Eggert
  2020-02-13 17:08   ` David Koppelman
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2020-02-13  1:32 UTC (permalink / raw)
  To: David Koppelman; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1935 bytes --]

On 2/12/20 2:37 PM, David Koppelman wrote:
> Except I don't get the
> efficient kernel-space-to-kernel-space transfer that copy_file_range
> uses.)

It's more than just kernel-space-to-kernel-space copying. When copying a 
file within an NFS server, you don't need to ship its contents over the 
network; the server can do the copy. Also, many modern filesystems can 
copy files by fiddling with pointers rather than data and thus can copy 
much faster than read+write would do, even on local filesystems. So 
avoiding copy_file_range entirely would mean a big performance loss on 
big files.

> I do not experience the problem on the version of Emacs packaged with
> rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
> Version 3.22.30) of 2018-09-10".

Emacs 26.1 doesn't use copy_file_range, which explains why it doesn't 
encounter your problem. Emacs 27 is planned to use it, though, so we 
should see how to best fix the problem.

As you say, it's a serious bug in your filesystem. It strikes me that it 
is likely to affect programs other than Emacs, so it should be high 
priority to fix regardless of what we do in Emacs.

Some questions: What is the NFS fileserver (NetApp, etc.)? What's the 
blocksize on the remote file system? Does copy_file_range work correctly 
when the size is a multiple of 32*1024? If so, perhaps we could tweak 
Emacs to use copy_file_range for most of the file, and use read+write 
only for the trailing <32 KiB.
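
As a rough sketch of the split being floated here, the arithmetic would
look something like the following. The names CHUNK, struct copy_plan
and plan_copy are hypothetical, and nothing at this point in the thread
establishes that such a split would actually avoid the bug.

```c
#include <assert.h>
#include <stddef.h>

/* The 32 KiB boundary mentioned in the report.  */
enum { CHUNK = 32 * 1024 };

/* How many bytes to hand to copy_file_range (BULK) and how many to
   copy with read+write (TAIL).  (Hypothetical type.)  */
struct copy_plan
{
  size_t bulk;
  size_t tail;
};

/* Split SIZE into a leading part that is a multiple of CHUNK and a
   trailing part shorter than CHUNK.  (Hypothetical helper.)  */
struct copy_plan
plan_copy (size_t size)
{
  struct copy_plan p;
  p.tail = size % CHUNK;
  p.bulk = size - p.tail;
  assert (p.bulk % CHUNK == 0 && p.tail < CHUNK);
  return p;
}
```

For the 36368-byte file in the report this gives a 32768-byte bulk part
and a 3600-byte tail.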

> When I have time I'll try to reproduce the problem with a quick C++
> routine using copy_file_range.

To save you some time, attached is a quick C routine that attempts to 
reproduce the problem. Does it reproduce the problem for you? If so, you 
can use it in your bug report to Red Hat.

Also, can you strace the failing Emacs? Something like this:

strace -o trace.log emacs -Q -batch -eval '(copy-file "a" "b" t t)'

and then look at the relevant part of trace.log.

[-- Attachment #2: cfrbug.c --]
[-- Type: text/x-csrc, Size: 1103 bytes --]

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
  char buf[36368];
  char abuf[sizeof buf]; memset (abuf, 'a', sizeof abuf);
  char bbuf[sizeof buf]; memset (bbuf, 'b', sizeof bbuf);
  int src = open ("src", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (src < 0)
    return 2;
  if (write (src, abuf, sizeof buf) != sizeof buf)
    return 3;
  if (close (src) != 0)
    return 4;
  int dst = open ("dst", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dst < 0)
    return 5;
  if (write (dst, bbuf, sizeof buf) != sizeof buf)
    return 6;
  if (close (dst) != 0)
    return 7;
  src = open ("src", O_RDONLY);
  if (src < 0)
    return 8;
  dst = open ("dst", O_WRONLY);
  if (dst < 0)
    return 9;
  if (copy_file_range (src, 0, dst, 0, sizeof buf, 0) != sizeof buf)
    return 10;
  if (close (src) != 0)
    return 11;
  if (close (dst) != 0)
    return 12;
  dst = open ("dst", O_RDONLY);
  if (dst < 0)
    return 13;
  if (read (dst, buf, sizeof buf) != sizeof buf)
    return 14;
  if (memcmp (buf, abuf, sizeof buf) != 0)
    return 15;
  return 0;
}


* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-13  1:32 ` Paul Eggert
@ 2020-02-13 17:08   ` David Koppelman
  2020-02-13 18:57     ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: David Koppelman @ 2020-02-13 17:08 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 477 bytes --]

Thank you for the reproducer! I was able to reproduce the file
corruption with a modified version of the C file in which the
destination file times were set. Otherwise there is no corruption. I'm
attaching the modified file. I'm going to file a bug with Red Hat using
Paul's modified reproducer, if that's okay.

The nfs mounted filesystem is on another Red Hat system. I'm going to
file the Red Hat bug before gathering additional information.

Thanks for your help!

David



[-- Attachment #2: Reproducer for copy file problem. --]
[-- Type: text/plain, Size: 1395 bytes --]

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>

int
mayn (void)
{
  char buf[36368];
  char abuf[sizeof buf]; memset (abuf, 'a', sizeof abuf);
  char bbuf[sizeof buf]; memset (bbuf, 'b', sizeof bbuf);
  int src = open ("src", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (src < 0)
    return 2;
  if (write (src, abuf, sizeof buf) != sizeof buf)
    return 3;

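  /* Save the source's timestamps so they can be applied to the
     destination after the copy, mirroring copy-file's KEEP-TIME.  */
  struct stat st;
  if (fstat (src, &st) != 0)
    return 21;
  struct timespec tam[2] = { st.st_atim, st.st_mtim };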

  if (close (src) != 0)
    return 4;
  int dst = open ("dst", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dst < 0)
    return 5;
  if (write (dst, bbuf, sizeof buf) != sizeof buf)
    return 6;
  if (close (dst) != 0)
    return 7;
  src = open ("src", O_RDONLY);
  if (src < 0)
    return 8;
  dst = open ("dst", O_WRONLY);
  if (dst < 0)
    return 9;
  if (copy_file_range (src, 0, dst, 0, sizeof buf, 0) != sizeof buf)
    return 10;

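  /* Setting the destination's times is what triggers the corruption;
     change the 1 to 0 to skip it, in which case the copy is intact.  */
  if ( 1 && futimens(dst,tam) )
    return 22;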

  if (close (src) != 0)
    return 11;
  if (close (dst) != 0)
    return 12;
  dst = open ("dst", O_RDONLY);
  if (dst < 0)
    return 13;
  if (read (dst, buf, sizeof buf) != sizeof buf)
    return 14;
  if (memcmp (buf, abuf, sizeof buf) != 0)
    return 15;
  return 0;
}

int
main()
{
  const int rv = mayn();
  printf("Outcome %d\n",rv);
  return rv;
}


* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-13 17:08   ` David Koppelman
@ 2020-02-13 18:57     ` Paul Eggert
  2020-02-14 15:22       ` David Koppelman
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2020-02-13 18:57 UTC (permalink / raw)
  To: David Koppelman; +Cc: emacs-devel

On 2/13/20 9:08 AM, David Koppelman wrote:
> I'm going to file a bug with Red Hat using
> Paul's modified reproducer, if that's okay.

Please do that, and please let me know the Red Hat bug number. Thanks.




* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-13 18:57     ` Paul Eggert
@ 2020-02-14 15:22       ` David Koppelman
  2020-02-14 15:58         ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: David Koppelman @ 2020-02-14 15:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

I've reported the problem in Red Hat bug 1803162.

https://bugzilla.redhat.com/show_bug.cgi?id=1803162


* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-14 15:22       ` David Koppelman
@ 2020-02-14 15:58         ` Paul Eggert
  2020-02-14 16:03           ` Dmitry Gutov
                             ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Paul Eggert @ 2020-02-14 15:58 UTC (permalink / raw)
  To: David Koppelman; +Cc: emacs-devel

On 2/14/20 7:22 AM, David Koppelman wrote:
> I've reported the problem in Red Hat bug 1803162.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1803162

Thanks, but I am not authorized to access that Bug#. I suppose I could also file 
a public (Fedora) bug report but I wouldn't like to bother them if they fix the 
bug quickly anyway.




* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-14 15:58         ` Paul Eggert
@ 2020-02-14 16:03           ` Dmitry Gutov
  2020-02-16 16:25           ` David Koppelman
  2020-02-18 16:10           ` David Koppelman
  2 siblings, 0 replies; 9+ messages in thread
From: Dmitry Gutov @ 2020-02-14 16:03 UTC (permalink / raw)
  To: Paul Eggert, David Koppelman; +Cc: emacs-devel

On 14.02.2020 17:58, Paul Eggert wrote:
> Thanks, but I am not authorized to access that Bug#. I suppose I could 
> also file a public (Fedora) bug report but I wouldn't like to bother 
> them if they fix the bug quickly anyway.

Interesting. I opened this bug URL about 30 minutes ago, and the
contents were publicly accessible. Now they are not.




* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-14 15:58         ` Paul Eggert
  2020-02-14 16:03           ` Dmitry Gutov
@ 2020-02-16 16:25           ` David Koppelman
  2020-02-18 16:10           ` David Koppelman
  2 siblings, 0 replies; 9+ messages in thread
From: David Koppelman @ 2020-02-16 16:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

Red Hat has verified the copy_file_range flaw, so it's not a quirk
unique to my system. They reproduced the flaw, using Paul Eggert's
reproducer with futimens added, on kernel 4.18.0-80.11.2.el8_0.x86_64,
and I'm seeing it on 4.18.0-147.3.1.el8_1.x86_64. The copy ran
correctly on 5.6.0-0.rc0.git5.1.fc32.x86_64.

I've asked twice that the bug,
https://bugzilla.redhat.com/show_bug.cgi?id=1803162, be made publicly
accessible.


* Re: Dataloss copying file using copy-file on RHEL 8.
  2020-02-14 15:58         ` Paul Eggert
  2020-02-14 16:03           ` Dmitry Gutov
  2020-02-16 16:25           ` David Koppelman
@ 2020-02-18 16:10           ` David Koppelman
  2 siblings, 0 replies; 9+ messages in thread
From: David Koppelman @ 2020-02-18 16:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

The bug is now open for public access:

https://bugzilla.redhat.com/show_bug.cgi?id=1803162

