* Dataloss copying file using copy-file on RHEL 8.
From: David Koppelman @ 2020-02-12 22:37 UTC
To: emacs-devel

I'm experiencing incompletely copied files when using (copy-file). I'm
not sure if this is an Emacs problem, but because it's serious I
thought I'd report it here in case it is.

On a Red Hat Enterprise Linux 8 build of recent git pulls of Emacs I
experienced file corruption when copying a file into an existing file.
I discovered it using C in dired, but the problem is reliably
reproduced by calling copy-file with the KEEP-TIME argument t:

  (copy-file "porig.svg" "pcopy.svg" t t)

(I recently upgraded from rhel 7 to rhel 8, so the problem may have
nothing to do with a recent change to Emacs.)

I get the problem when copying a 36368-byte file to an existing file
of the same size; both files are on the same NFS-mounted filesystem.
The problem does not occur on XFS. The contents of the destination
file are correct for the first 32768 bytes, then the remainder of the
file (which is the right size) is set to 0.

Running under gdb reveals that the file is copied using
copy_file_range (in src/fileio.c). I can work around the problem by
forcing Emacs to avoid the loop using copy_file_range, in which case
it uses fallback code and everything is fine. (Except I don't get the
efficient kernel-space-to-kernel-space transfer that copy_file_range
uses.)

I do not experience the problem on the version of Emacs packaged with
rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
Version 3.22.30) of 2018-09-10".

When I have time I'll try to reproduce the problem with a quick C++
routine using copy_file_range. If successful, I'll file a bug with
Red Hat. Even if this is a rhel 8 problem, Emacs ought to avoid
copy_file_range when it does or might occur.

David Koppelman
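The corruption pattern described in this report (a correct prefix followed by a zero-filled tail) can be checked mechanically once both files have been read into memory. The following is a minimal sketch, not code from the thread or from Emacs; the helper names first_mismatch and tail_is_zero are hypothetical and introduced only for illustration.

```c
#include <stddef.h>

/* Return the offset of the first byte at which GOT differs from WANT,
   or N if the two N-byte buffers are identical.  */
size_t
first_mismatch (const char *want, const char *got, size_t n)
{
  size_t i = 0;
  while (i < n && want[i] == got[i])
    i++;
  return i;
}

/* Return 1 if every byte of BUF in the range [FROM, N) is zero,
   and 0 otherwise.  An empty range counts as all-zero.  */
int
tail_is_zero (const char *buf, size_t from, size_t n)
{
  for (size_t i = from; i < n; i++)
    if (buf[i] != 0)
      return 0;
  return 1;
}
```

For the case reported here, one would expect first_mismatch to return 32768 for the 36368-byte file and tail_is_zero to hold from that offset onward, assuming the source file itself has a nonzero byte at offset 32768.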
* Re: Dataloss copying file using copy-file on RHEL 8.
From: Paul Eggert @ 2020-02-13 1:32 UTC
To: David Koppelman; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1935 bytes --]

On 2/12/20 2:37 PM, David Koppelman wrote:
> Except I don't get the
> efficient kernel-space-to-kernel-space transfer that copy_file_range
> uses.)

It's more than just kernel-space-to-kernel-space copying. When copying
a file within an NFS server, you don't need to ship its contents over
the network; the server can do the copy. Also, many modern filesystems
can copy files by fiddling with pointers rather than data and thus can
copy much faster than read+write would do, even on local filesystems.
So avoiding copy_file_range entirely would mean a big performance loss
on big files.

> I do not experience the problem on the version of Emacs packaged with
> rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
> Version 3.22.30) of 2018-09-10".

Emacs 26.1 doesn't use copy_file_range, which explains why it doesn't
encounter your problem. Emacs 27 is planned to use it, though, so we
should see how to best fix the problem.

As you say, it's a serious bug in your filesystem. It strikes me that
it is likely to affect programs other than Emacs, so it should be high
priority to fix regardless of what we do in Emacs.

Some questions: What is the NFS fileserver (NetApp, etc.)? What's the
blocksize on the remote file system? Does copy_file_range work
correctly when the size is a multiple of 32*1024? If so, perhaps we
could tweak Emacs to use copy_file_range for most of the file, and use
read+write only for the trailing <32 KiB.

> When I have time I'll try to reproduce the problem with a quick C++
> routine using copy_file_range.

To save you some time, attached is a quick C routine that attempts to
reproduce the problem. Does it reproduce the problem for you? If so,
you can use it in your bug report to Red Hat.

Also, can you strace the failing Emacs? Something like this:

  strace -o trace.log emacs -Q -batch -eval '(copy-file "a" "b" t t)'

and then look at the relevant part of trace.log.

[-- Attachment #2: cfrbug.c --]
[-- Type: text/x-csrc, Size: 1103 bytes --]

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
  char buf[36368];
  char abuf[sizeof buf]; memset (abuf, 'a', sizeof abuf);
  char bbuf[sizeof buf]; memset (bbuf, 'b', sizeof bbuf);
  int src = open ("src", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (src < 0) return 2;
  if (write (src, abuf, sizeof buf) != sizeof buf) return 3;
  if (close (src) != 0) return 4;
  int dst = open ("dst", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dst < 0) return 5;
  if (write (dst, bbuf, sizeof buf) != sizeof buf) return 6;
  if (close (dst) != 0) return 7;
  src = open ("src", O_RDONLY);
  if (src < 0) return 8;
  dst = open ("dst", O_WRONLY);
  if (dst < 0) return 9;
  if (copy_file_range (src, 0, dst, 0, sizeof buf, 0) != sizeof buf)
    return 10;
  if (close (src) != 0) return 11;
  if (close (dst) != 0) return 12;
  dst = open ("dst", O_RDONLY);
  if (dst < 0) return 13;
  if (read (dst, buf, sizeof buf) != sizeof buf) return 14;
  if (memcmp (buf, abuf, sizeof buf) != 0) return 15;
  return 0;
}
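The idea floated in this message (copy_file_range for the part of the file that is a multiple of 32 KiB, read+write for the trailing remainder) could look roughly like the sketch below. This is an illustration under stated assumptions, not Emacs's code in src/fileio.c: the names hybrid_copy, copy_read_write and CHUNK are hypothetical, and nothing in the thread confirms that such a split actually avoids the bug.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 32 * 1024 };	/* hypothetical split point, from the 32*1024 question */

/* Copy exactly LEN bytes from SRC to DST at their current file offsets
   using read+write.  Return 0 on success, -1 on error or early EOF.  */
static int
copy_read_write (int src, int dst, off_t len)
{
  char buf[CHUNK];
  while (len > 0)
    {
      size_t want = len < CHUNK ? (size_t) len : CHUNK;
      ssize_t n = read (src, buf, want);
      if (n < 0 && errno == EINTR)
	continue;
      if (n <= 0)
	return -1;
      for (ssize_t done = 0; done < n; )
	{
	  ssize_t w = write (dst, buf + done, n - done);
	  if (w < 0 && errno == EINTR)
	    continue;
	  if (w <= 0)
	    return -1;
	  done += w;
	}
      len -= n;
    }
  return 0;
}

/* Copy LEN bytes from SRC to DST at their current file offsets.
   Use copy_file_range for the largest prefix that is a multiple of
   CHUNK, then read+write for whatever remains (the tail, plus any
   prefix bytes copy_file_range could not handle).  */
int
hybrid_copy (int src, int dst, off_t len)
{
  off_t aligned = len - len % CHUNK;
  while (aligned > 0)
    {
      /* NULL offsets: the kernel uses and advances the file offsets.  */
      ssize_t n = copy_file_range (src, NULL, dst, NULL, aligned, 0);
      if (n < 0 && errno == EINTR)
	continue;
      if (n <= 0)
	break;		/* unsupported or failed: fall back below */
      aligned -= n;
      len -= n;
    }
  return copy_read_write (src, dst, len);
}
```

Because both paths work from the descriptors' current offsets, a copy_file_range failure partway through simply hands the remaining bytes to the read+write loop. The cost is that the last partial chunk always travels through user space.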
* Re: Dataloss copying file using copy-file on RHEL 8.
From: David Koppelman @ 2020-02-13 17:08 UTC
To: Paul Eggert; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 477 bytes --]

Thank you for the reproducer!

I was able to reproduce the file corruption with a modified version of
the C file in which the destination file times were set. Otherwise
there is no corruption. I'm attaching the modified file.

I'm going to file a bug with Red Hat using Paul's modified reproducer,
if that's okay.

The nfs mounted filesystem is on another Red Hat system. I'm going to
file the Red Hat bug before gathering additional information.

Thanks for your help!

David

[-- Attachment #2: Reproducer for copy file problem. --]
[-- Type: text/plain, Size: 1395 bytes --]

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>

int
mayn (void)
{
  char buf[36368];
  char abuf[sizeof buf]; memset (abuf, 'a', sizeof abuf);
  char bbuf[sizeof buf]; memset (bbuf, 'b', sizeof bbuf);
  int src = open ("src", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (src < 0) return 2;
  if (write (src, abuf, sizeof buf) != sizeof buf) return 3;
  struct stat st;
  if (fstat (src, &st) != 0) return 21;
  struct timespec tam[2] = { st.st_atim, st.st_mtim };
  if (close (src) != 0) return 4;
  int dst = open ("dst", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dst < 0) return 5;
  if (write (dst, bbuf, sizeof buf) != sizeof buf) return 6;
  if (close (dst) != 0) return 7;
  src = open ("src", O_RDONLY);
  if (src < 0) return 8;
  dst = open ("dst", O_WRONLY);
  if (dst < 0) return 9;
  if (copy_file_range (src, 0, dst, 0, sizeof buf, 0) != sizeof buf)
    return 10;
  if ( 1 && futimens(dst,tam) ) return 22;
  if (close (src) != 0) return 11;
  if (close (dst) != 0) return 12;
  dst = open ("dst", O_RDONLY);
  if (dst < 0) return 13;
  if (read (dst, buf, sizeof buf) != sizeof buf) return 14;
  if (memcmp (buf, abuf, sizeof buf) != 0) return 15;
  return 0;
}

int
main()
{
  const int rv = mayn();
  printf("Outcome %d\n",rv);
  return rv;
}

[-- Attachment #3: Type: text/plain, Size: 2031 bytes --]

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2/12/20 2:37 PM, David Koppelman wrote:
>> Except I don't get the
>> efficient kernel-space-to-kernel-space transfer that copy_file_range
>> uses.)
>
> It's more than just kernel-space-to-kernel-space copying. When copying
> a file within an NFS server, you don't need to ship its contents over
> the network; the server can do the copy. Also, many modern filesystems
> can copy files by fiddling with pointers rather than data and thus can
> copy much faster than read+write would do, even on local filesystems.
> So avoiding copy_file_range entirely would mean a big performance loss
> on big files.
>
>> I do not experience the problem on the version of Emacs packaged with
>> rhel 8, "GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+
>> Version 3.22.30) of 2018-09-10".
>
> Emacs 26.1 doesn't use copy_file_range, which explains why it doesn't
> encounter your problem. Emacs 27 is planned to use it, though, so we
> should see how to best fix the problem.
>
> As you say, it's a serious bug in your filesystem. It strikes me that
> it is likely to affect programs other than Emacs, so it should be high
> priority to fix regardless of what we do in Emacs.
>
> Some questions: What is the NFS fileserver (NetApp, etc.)? What's the
> blocksize on the remote file system? Does copy_file_range work
> correctly when the size is a multiple of 32*1024? If so, perhaps we
> could tweak Emacs to use copy_file_range for most of the file, and use
> read+write only for the trailing <32 KiB.
>
>> When I have time I'll try to reproduce the problem with a quick C++
>> routine using copy_file_range.
>
> To save you some time, attached is a quick C routine that attempts to
> reproduce the problem. Does it reproduce the problem for you? If so,
> you can use it in your bug report to Red Hat.
>
> Also, can you strace the failing Emacs? Something like this:
>
>   strace -o trace.log emacs -Q -batch -eval '(copy-file "a" "b" t t)'
>
> and then look at the relevant part of trace.log.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: Paul Eggert @ 2020-02-13 18:57 UTC
To: David Koppelman; +Cc: emacs-devel

On 2/13/20 9:08 AM, David Koppelman wrote:
> I'm going to file a bug with Red Hat using
> Paul's modified reproducer, if that's okay.

Please do that, and please let me know the Red Hat bug number. Thanks.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: David Koppelman @ 2020-02-14 15:22 UTC
To: Paul Eggert; +Cc: emacs-devel

I've reported the problem in Red Hat bug 1803162.

https://bugzilla.redhat.com/show_bug.cgi?id=1803162

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2/13/20 9:08 AM, David Koppelman wrote:
>> I'm going to file a bug with Red Hat using
>> Paul's modified reproducer, if that's okay.
>
> Please do that, and please let me know the Red Hat bug number. Thanks.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: Paul Eggert @ 2020-02-14 15:58 UTC
To: David Koppelman; +Cc: emacs-devel

On 2/14/20 7:22 AM, David Koppelman wrote:
> I've reported the problem in Red Hat bug 1803162.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1803162

Thanks, but I am not authorized to access that Bug#. I suppose I could
also file a public (Fedora) bug report but I wouldn't like to bother
them if they fix the bug quickly anyway.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: Dmitry Gutov @ 2020-02-14 16:03 UTC
To: Paul Eggert, David Koppelman; +Cc: emacs-devel

On 14.02.2020 17:58, Paul Eggert wrote:
> Thanks, but I am not authorized to access that Bug#. I suppose I could
> also file a public (Fedora) bug report but I wouldn't like to bother
> them if they fix the bug quickly anyway.

Interesting. I opened this bug URL like 30 minutes ago, and the
contents were publicly accessible. And now it's not.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: David Koppelman @ 2020-02-16 16:25 UTC
To: Paul Eggert; +Cc: emacs-devel

Red Hat has verified the copy_file_range flaw, so it's not a quirk
unique to my system. They reproduced the flaw, using Paul Eggert's
reproducer with futimens added, on kernel
4.18.0-80.11.2.el8_0.x86_64 and I'm suffering it on
4.18.0-147.3.1.el8_1.x86_64. The copy ran correctly on
5.6.0-0.rc0.git5.1.fc32.x86_64.

I've asked twice that the bug,
https://bugzilla.redhat.com/show_bug.cgi?id=1803162, be made publicly
accessible.

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2/14/20 7:22 AM, David Koppelman wrote:
>> I've reported the problem in Red Hat bug 1803162.
>> https://bugzilla.redhat.com/show_bug.cgi?id=1803162
>
> Thanks, but I am not authorized to access that Bug#. I suppose I could
> also file a public (Fedora) bug report but I wouldn't like to bother
> them if they fix the bug quickly anyway.
* Re: Dataloss copying file using copy-file on RHEL 8.
From: David Koppelman @ 2020-02-18 16:10 UTC
To: Paul Eggert; +Cc: emacs-devel

The bug is now open for public access:

https://bugzilla.redhat.com/show_bug.cgi?id=1803162

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2/14/20 7:22 AM, David Koppelman wrote:
>> I've reported the problem in Red Hat bug 1803162.
>> https://bugzilla.redhat.com/show_bug.cgi?id=1803162
>
> Thanks, but I am not authorized to access that Bug#. I suppose I could
> also file a public (Fedora) bug report but I wouldn't like to bother
> them if they fix the bug quickly anyway.