* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le @ 2018-10-27 10:35 Thomas Fitzsimmons 2018-10-27 13:16 ` Thomas Fitzsimmons 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-27 10:35 UTC (permalink / raw) To: 33174; +Cc: Paul Eggert Hi Paul, On ppc64le Debian GNU/Linux 9, I'm seeing a bootstrap failure on the master and emacs-26 branches that I didn't see the last time I built Emacs: [...] Dumping under the name emacs ************************************************** Warning: Your system has a gap between BSS and the heap (424960392 bytes). This usually means that exec-shield or something similar is in effect. The dump may fail because of this. See the section about exec-shield in etc/PROBLEMS for more information. ************************************************** 22496304 of 33554432 static heap bytes used Makefile:749: recipe for target 'bootstrap-emacs' failed make[1]: *** [bootstrap-emacs] Segmentation fault make[1]: Leaving directory '/home/fitzsim/sources/emacs/src' Makefile:421: recipe for target 'src' failed make: *** [src] Error 2 I bisected the failure to: commit 65889a6d127fcbbbdc1e74d26036e91bd24d1405 Author: Paul Eggert <eggert@cs.ucla.edu> Date: Sun Jul 8 09:04:02 2018 -0700 Fix bootstrap infloop in GNU/Linux alpha * src/emacs.c (main): Do not re-exec if EMACS_HEAP_EXEC is already set (Bug#32083). If I revert that commit's patch, the build succeeds. Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-27 10:35 bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le Thomas Fitzsimmons @ 2018-10-27 13:16 ` Thomas Fitzsimmons 2018-10-27 20:26 ` Paul Eggert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-27 13:16 UTC (permalink / raw) To: 33174; +Cc: Paul Eggert Thomas Fitzsimmons <fitzsim@fitzsim.org> writes: > Hi Paul, > > On ppc64le Debian GNU/Linux 9, I'm seeing a bootstrap failure on the > master and emacs-26 branches that I didn't see the last time I built > Emacs: > > [...] > Dumping under the name emacs > ************************************************** > Warning: Your system has a gap between BSS and the > heap (424960392 bytes). This usually means that exec-shield > or something similar is in effect. The dump may > fail because of this. See the section about > exec-shield in etc/PROBLEMS for more information. > ************************************************** > 22496304 of 33554432 static heap bytes used > Makefile:749: recipe for target 'bootstrap-emacs' failed > make[1]: *** [bootstrap-emacs] Segmentation fault > make[1]: Leaving directory '/home/fitzsim/sources/emacs/src' > Makefile:421: recipe for target 'src' failed > make: *** [src] Error 2 > > I bisected the failure to: > > commit 65889a6d127fcbbbdc1e74d26036e91bd24d1405 > Author: Paul Eggert <eggert@cs.ucla.edu> > Date: Sun Jul 8 09:04:02 2018 -0700 > > Fix bootstrap infloop in GNU/Linux alpha > > * src/emacs.c (main): Do not re-exec if EMACS_HEAP_EXEC > is already set (Bug#32083). > > If I revert that commit's patch, the build succeeds. It looks like this segmentation fault is influenced by the setting in /proc/sys/kernel/randomize_va_space. 
Running by hand in src (with the patch in question applied):

  ./temacs --batch --load loadup bootstrap

- if randomize_va_space contains 2, bootstrap fails with a segfault:

  Dumping under the name emacs
  **************************************************
  Warning: Your system has a gap between BSS and the
  heap (752378248 bytes). This usually means that exec-shield
  or something similar is in effect. The dump may
  fail because of this. See the section about
  exec-shield in etc/PROBLEMS for more information.
  **************************************************
  22496304 of 33554432 static heap bytes used
  Segmentation fault

- if randomize_va_space contains 1 or 0, bootstrap succeeds:

  Loading /home/fitzsim/sources/emacs/lisp/tooltip.el (source)...
  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name emacs
  22496304 of 33554432 static heap bytes used
  99079 pure bytes used

Thomas

^ permalink raw reply [flat|nested] 18+ messages in thread
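For readers following the randomize_va_space observation, here is a minimal C sketch of how the three values differ. It relies on the Linux kernel's documented meaning of the setting (0 is off, 1 randomizes the stack, mmap base and vDSO, 2 additionally randomizes the brk heap). The function names `brk_heap_randomized` and `read_randomize_va_space` are hypothetical, and the link between brk randomization and the "gap between BSS and the heap" warning is an assumption that fits the report rather than something stated in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if a randomize_va_space VALUE also
   randomizes the brk heap.  The kernel does this only for values
   greater than 1.  */
bool
brk_heap_randomized (int value)
{
  return value > 1;
}

/* Hypothetical helper: read the current setting from procfs.
   Return -1 if the file cannot be opened or parsed.  */
int
read_randomize_va_space (void)
{
  FILE *f = fopen ("/proc/sys/kernel/randomize_va_space", "r");
  if (!f)
    return -1;
  int value = -1;
  if (fscanf (f, "%d", &value) != 1)
    value = -1;
  fclose (f);
  return value;
}
```

Under that assumption, only the value 2 moves the heap away from BSS, which would match the failure appearing with 2 and not with 1 or 0.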
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-27 13:16 ` Thomas Fitzsimmons @ 2018-10-27 20:26 ` Paul Eggert 2018-10-28 8:05 ` Paul Eggert 2018-10-29 1:47 ` Thomas Fitzsimmons 0 siblings, 2 replies; 18+ messages in thread From: Paul Eggert @ 2018-10-27 20:26 UTC (permalink / raw) To: Thomas Fitzsimmons, 33174 Wonderful. Yet another reason we need to get the pdumper branch working. Anyway: 1. Can you use strace and/or GDB to investigate how ./temacs is disabling address randomization? In the emacs-26 branch, if you run this command in src: strace -f -o /tmp/tr ./temacs --batch --load loadup bootstrap the output file /tmp/tr should contain something like this: 18406 personality(0xffffffff) = 0 (PER_LINUX) 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0xc521b0 /* 80 vars */) = 0 and later on there should be one more call to 'personality', like this: 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) If it doesn't look like this, please investigate why, by using GDB. 2. What filesystem are you building in, and does it support xattrs? What is the output of the following in your build directory? touch x setfattr -n user.variable -v value x getfattr -n user.variable x 3. What is the output of "getfattr -d src/temacs"? It should be something like this: # file: src/temacs user.pax.flags="er" If not, we need to figure out why not. 4. The output of 'make V=1' should contain something like the following. If not, please investigate why not. setfattr -n user.pax.flags -v er temacs ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-27 20:26 ` Paul Eggert @ 2018-10-28 8:05 ` Paul Eggert 2018-10-28 12:56 ` Thomas Fitzsimmons 2018-10-29 1:47 ` Thomas Fitzsimmons 1 sibling, 1 reply; 18+ messages in thread From: Paul Eggert @ 2018-10-28 8:05 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174 I built and ran Emacs master on two CentOS 7 ppc64le platforms with no trouble. So the problem is not GNU/Linux ppc64le per se; it's something more specific to your platform. Here are some characteristics of my builds: Emacs master branch commit 3dd16a89bf410d77e9ddc41cbfbbd4b343928d6d ./configure --with-x-toolkit=no --with-xpm=no --with-jpeg=no --with-png=no --with-gif=no --with-tiff=no Both machines were CentOS 7. On both of them, /proc/sys/kernel/randomize_va_space is 2, just as in your Debian machine. uname -rvmpi reported the following (respectively): 3.10.0-693.11.6.el7.ppc64le #1 SMP Thu Jan 4 18:15:24 GMT 2018 ppc64le ppc64le ppc64le 4.14.0-49.13.1.el7a.ppc64le #1 SMP Wed Sep 26 21:58:41 GMT 2018 ppc64le ppc64le ppc64le gcc --version reported the following (respectively): gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16) gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) On the machine running 4.14.0, there was no setfattr or getfattr command; however, ./temacs personality (0xffffffff) returned 0x40000 (i.e., ADDR_NO_RANDOMIZE) and so ./temacs did not reexec itself. I don't know why it returned 0x40000; even little programs that only call personality (0xffffffff) return 0x40000 on this platform. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-28 8:05 ` Paul Eggert @ 2018-10-28 12:56 ` Thomas Fitzsimmons 2018-10-28 19:22 ` Paul Eggert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-28 12:56 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174 [-- Attachment #1: Type: text/plain, Size: 608 bytes --] On 2018-10-28 4:05 AM, "Paul Eggert" <eggert@cs.ucla.edu> wrote: > > I built and ran Emacs master on two CentOS 7 ppc64le platforms with no trouble. So the problem is not GNU/Linux ppc64le per se; it's something more specific to your platform. OK, thanks for checking. I'm running the Linux-libre 4.18 kernel built from source with powernv_defconfig -- I should have mentioned that in the initial report, sorry. I'll try the debugging steps you've suggested when I get back to my system, and narrow down what the relevant options are. Everything else on my system is stock Debian stretch packages. Thomas [-- Attachment #2: Type: text/html, Size: 755 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-28 12:56 ` Thomas Fitzsimmons @ 2018-10-28 19:22 ` Paul Eggert 2018-10-30 1:28 ` Thomas Fitzsimmons 0 siblings, 1 reply; 18+ messages in thread From: Paul Eggert @ 2018-10-28 19:22 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174 Thomas Fitzsimmons wrote: > I'm running the Linux-libre 4.18 kernel built from > source with powernv_defconfig In that case, please also verify that your kernel configuration has enabled XATTR_PAX. See, for example, <https://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart>. If you have a superhardened kernel that refuses to disable ASLR under any circumstances, then you'll need to build Emacs with './configure CANNOT_DUMP=yes', something I can't recommend for production until we get the pdumper branch merged in. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-28 19:22 ` Paul Eggert @ 2018-10-30 1:28 ` Thomas Fitzsimmons 2018-10-30 2:16 ` Paul Eggert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-30 1:28 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174 Paul Eggert <eggert@cs.ucla.edu> writes: > Thomas Fitzsimmons wrote: >> I'm running the Linux-libre 4.18 kernel built from >> source with powernv_defconfig > > In that case, please also verify that your kernel configuration has > enabled XATTR_PAX. Hmm, the XATTR_PAX kernel configuration option does not exist in the kernel sources I'm using; I haven't applied any hardening or Grsecurity patches, it's just the upstream Linux-libre 4.18 kernel, built with: make powernv_defconfig make > See, for example, > <https://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart>. If you have a > superhardened kernel that refuses to disable ASLR under any > circumstances, then you'll need to build Emacs with './configure > CANNOT_DUMP=yes', something I can't recommend for production until we > get the pdumper branch merged in. I guess I could experiment with adding the PaX patches to my kernel so that XATTR_PAX would take effect, but I'm surprised Emacs can't complete the bootstrap step when the kernel is in a default configuration. Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-30 1:28 ` Thomas Fitzsimmons
@ 2018-10-30 2:16 ` Paul Eggert
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Eggert @ 2018-10-30 2:16 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

On 10/29/18 6:28 PM, Thomas Fitzsimmons wrote:
> I guess I could experiment with adding the PaX patches to my kernel so
> that XATTR_PAX would take effect, but I'm surprised Emacs can't complete
> the bootstrap step when the kernel is in a default configuration.

You're right, the XATTR_PAX stuff is likely a red herring. Emacs should be able to bootstrap without it by using the reexec trick, and bootstrapping does work on Fedora 28 (kernel 4.18.16-200.fc28.x86_64) without XATTR_PAX being used.

The main issue now, as I see it, is why Emacs is crashing on re-exec. That is, on my system after I run 'make SETFATTR=: bootstrap-emacs' to simulate a system without XATTR_PAX, the command

  strace -o /tmp/tr ./temacs --batch --load loadup bootstrap

generates the syscalls:

  ...
  personality(0xffffffff) = 0 (PER_LINUX)
  personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
  personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
  execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0x852ee10 /* 73 vars */) = 0
  ...

whereas on your system Emacs is crashing instead of issuing the execve. I suspect that something like the following may help you debug:

  $ cd src
  $ make clean
  $ make CFLAGS='-g3 -O0' SETFATTR=: temacs
  $ gdb temacs
  (gdb) set disable-randomization off
  (gdb) b personality
  (gdb) r --batch --load loadup bootstrap

and then poke around to see what 'personality' is returning and whether 'disable_address_randomization' succeeds and how 'execve' is being called. I guess on your host, personality (0xffffffff) returns 0, then personality (0x40000) returns 0, then personality (0xffffffff) returns 0x40000, then disable_address_randomization () returns true, but it would be helpful to check this.
The next thing should be that getenv ("EMACS_HEAP_EXEC") returns NULL; is that what actually happens? And then xputenv ("EMACS_HEAP_EXEC=true") should work without crashing; is that right? And is execvp being called and with what arguments? That sort of thing. ^ permalink raw reply [flat|nested] 18+ messages in thread
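The query, set, query sequence of 'personality' calls discussed in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Emacs source: the names `aslr_newly_disabled` and `try_disable_aslr` are hypothetical, and the only real interfaces used are `personality` and `ADDR_NO_RANDOMIZE` from `<sys/personality.h>` on GNU/Linux.

```c
#include <stdbool.h>
#include <sys/personality.h>

/* Hypothetical helper: decide from the three 'personality' results
   whether randomization was enabled before and is disabled now.
   BEFORE is the persona reported by the first query, SET_RESULT is
   what the call that sets the flag returned, and AFTER is the
   persona reported by the second query.  */
bool
aslr_newly_disabled (int before, int set_result, int after)
{
  if (before < 0 || set_result < 0)
    return false;
  int desired = before | ADDR_NO_RANDOMIZE;
  /* If the flag was already set, nothing changed, so there is no
     reason to re-exec.  */
  return before != desired && after == desired;
}

/* Hypothetical wrapper that issues the three calls seen in the
   strace output: query, set, query again.  */
bool
try_disable_aslr (void)
{
  int before = personality (0xffffffff);
  if (before < 0)
    return false;
  int set_result = personality (before | ADDR_NO_RANDOMIZE);
  int after = personality (0xffffffff);
  return aslr_newly_disabled (before, set_result, after);
}
```

With the values from the strace output (0, then 0, then 0x40000) the helper reports that randomization was newly disabled. When the first query already returns 0x40000, as on the CentOS 4.14 machine mentioned earlier in the thread, it reports false, which is consistent with ./temacs not re-executing itself there.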
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-27 20:26 ` Paul Eggert 2018-10-28 8:05 ` Paul Eggert @ 2018-10-29 1:47 ` Thomas Fitzsimmons 2018-10-29 7:22 ` Paul Eggert 1 sibling, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-29 1:47 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174 Paul Eggert <eggert@cs.ucla.edu> writes: > Wonderful. Yet another reason we need to get the pdumper branch working. Anyway: > > 1. Can you use strace and/or GDB to investigate how ./temacs is > disabling address randomization? In the emacs-26 branch, if you run > this command in src: > > strace -f -o /tmp/tr ./temacs --batch --load loadup bootstrap > > the output file /tmp/tr should contain something like this: > > 18406 personality(0xffffffff) = 0 (PER_LINUX) > 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) > 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) I see the above personality calls exactly as you've shown them. strace never gets to the next execve; the crash happens before the next execve is run, see below. > 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", > "bootstrap"], 0xc521b0 /* 80 vars */) = 0 > > and later on there should be one more call to 'personality', like this: > > 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) > > If it doesn't look like this, please investigate why, by using GDB. When I run the command under gdb, it succeeds, so I had to enable core dumps to get the backtrace: $ gdb ./temacs core GNU gdb (Debian 7.12-6) 7.12.0.20161007-git Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-linux-gnu". Type "show configuration" for configuration details. 
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./temacs...done. [New LWP 26831] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1". Core was generated by `./temacs --batch --load loadup bootstrap '. Program terminated with signal SIGSEGV, Segmentation fault. #0 __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111 111 ../sysdeps/powerpc/powerpc64/power7/memcpy.S: No such file or directory. SIGINT is used by the debugger. Are you sure you want to change it? (y or n) [answered Y; input not from terminal] DISPLAY = :0 TERM = dumb Breakpoint 1 at 0x10118108: file emacs.c, line 363. Temporary breakpoint 2 at 0x10137aa8: init_sys_modes. (3 locations) (gdb) thread apply all bt Thread 1 (Thread 0x7fff940a7be0 (LWP 26831)): #0 __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111 #1 0x00000000101ec188 in memcpy (__len=383855080, __src=0x105cd618, __dest=<optimized out>) at /usr/include/powerpc64le-linux-gnu/bits/string3.h:53 #2 unexec (new_name=0x11b934f8 <bss_sbrk_buffer+22229000> "/build/emacs/src/emacs", old_name=0x11b93528 <bss_sbrk_buffer+22229048> "/build/emacs/src/temacs") at unexelf.c:411 #3 0x0000000010110bec in Fdump_emacs (filename=XIL(0x11b945a4), symfile=XIL(0x11b94584)) at emacs.c:2224 #4 0x00000000101a7cb8 in eval_sub (form=...) at eval.c:2244 #5 0x00000000101a8194 in Fprogn (body=XIL(0x107360d3)) at eval.c:459 #6 0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193 #7 0x00000000101ab8bc in Fif (args=...) at eval.c:414 #8 0x00000000101a7d6c in eval_sub (form=...) 
at eval.c:2193 #9 0x00000000101d8a98 in readevalloop (readcharfun=XIL(0x68d0), infile0=0x7fffc3cef950, sourcename=XIL(0x107216a4), printflag=false, unibyte=..., readfun=XIL(0), start=XIL(0), end=XIL(0)) at lread.c:2048 #10 0x00000000101d90bc in Fload (file=XIL(0x107215a4), noerror=..., nomessage=XIL(0), nosuffix=..., must_suffix=...) at lread.c:1435 #11 0x00000000101a7c34 in eval_sub (form=...) at eval.c:2255 #12 0x00000000101acd18 in Feval (form=XIL(0x10727f03), lexical=...) at eval.c:2061 #13 0x00000000101188f8 in top_level_2 () at keyboard.c:1119 #14 0x00000000101a6410 in internal_condition_case (bfun=0x101188d0 <top_level_2>, handlers=..., hfun=0x1011f030 <cmd_error>) at eval.c:1336 #15 0x00000000101188a8 in top_level_1 (ignore=...) at keyboard.c:1127 #16 0x00000000101a6364 in internal_catch (tag=..., func=0x10118800 <top_level_1>, arg=XIL(0)) at eval.c:1101 #17 0x000000001011873c in command_loop () at keyboard.c:1088 #18 0x000000001011e89c in recursive_edit_1 () at keyboard.c:695 #19 0x000000001011eeb4 in Frecursive_edit () at keyboard.c:766 #20 0x0000000010017804 in main (argc=<optimized out>, argv=0x7fffc3cf0308) at emacs.c:1717 You can't do that without a process to debug. (gdb) > 2. What filesystem are you building in, ext4 > and does it support xattrs? Yes. > What is the output of the following in your build directory? > > touch x > setfattr -n user.variable -v value x > getfattr -n user.variable x I bootstrapped Debian from a minimal root file system, so I didn't have the attr package installed (i.e., the setfattr and getfattr commands were not found). After I installed it, the output of the above commands is: $ touch x $ setfattr -n user.variable -v value x $ getfattr -n user.variable x # file: x user.variable="value" > 3. What is the output of "getfattr -d src/temacs"? It should be something like this: > > # file: src/temacs > user.pax.flags="er" > > If not, we need to figure out why not. 
Yes, after I re-ran configure and make after having installed the attr package: $ getfattr -d src/temacs # file: src/temacs user.pax.flags="er" > 4. The output of 'make V=1' should contain something like the > following. If not, please investigate why not. > > setfattr -n user.pax.flags -v er temacs When setfattr is not installed, configure shows: [...] checking for setfattr... no [...] and that command is not run. After I installed the attr package, configure shows: checking for setfattr... yes and the above command shows up in make V=1 output. Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-29 1:47 ` Thomas Fitzsimmons @ 2018-10-29 7:22 ` Paul Eggert 2018-10-30 2:44 ` Thomas Fitzsimmons 0 siblings, 1 reply; 18+ messages in thread From: Paul Eggert @ 2018-10-29 7:22 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174 Thomas Fitzsimmons wrote: > Paul Eggert <eggert@cs.ucla.edu> writes: > >> Wonderful. Yet another reason we need to get the pdumper branch working. Anyway: >> >> 1. Can you use strace and/or GDB to investigate how ./temacs is >> disabling address randomization? In the emacs-26 branch, if you run >> this command in src: >> >> strace -f -o /tmp/tr ./temacs --batch --load loadup bootstrap >> >> the output file /tmp/tr should contain something like this: >> >> 18406 personality(0xffffffff) = 0 (PER_LINUX) >> 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) >> 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) > > I see the above personality calls exactly as you've shown them. > > strace never gets to the next execve; the crash happens before the next > execve is run, see below. > >> 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", >> "bootstrap"], 0xc521b0 /* 80 vars */) = 0 So there are no more system calls after personality(0xffffffff)? That is, the crash happens immediately before any other system calls? What does 'strace' say about the crash? For me, the execve is the first syscall after the 'personality(0xffffffff) = 0x40000'. If you're seeing some other syscall there (or are seeing a crash), please investigate why, presumably with GDB. > When I run the command under gdb, it succeeds, so I had to enable core > dumps to get the backtrace: Core dumps won't help us much I'm afraid. Instead, when debugging ./temacs, please use the GDB command "set disable-randomization off" before issuing the GDB command "run --batch --load loadup bootstrap". ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-29 7:22 ` Paul Eggert @ 2018-10-30 2:44 ` Thomas Fitzsimmons 2018-10-30 5:58 ` Paul Eggert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-30 2:44 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174 Paul Eggert <eggert@cs.ucla.edu> writes: > Thomas Fitzsimmons wrote: >> Paul Eggert <eggert@cs.ucla.edu> writes: >> >>> Wonderful. Yet another reason we need to get the pdumper branch working. Anyway: >>> >>> 1. Can you use strace and/or GDB to investigate how ./temacs is >>> disabling address randomization? In the emacs-26 branch, if you run >>> this command in src: >>> >>> strace -f -o /tmp/tr ./temacs --batch --load loadup bootstrap >>> >>> the output file /tmp/tr should contain something like this: >>> >>> 18406 personality(0xffffffff) = 0 (PER_LINUX) >>> 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) >>> 18406 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) >> >> I see the above personality calls exactly as you've shown them. >> >> strace never gets to the next execve; the crash happens before the next >> execve is run, see below. >> >>> 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", >>> "bootstrap"], 0xc521b0 /* 80 vars */) = 0 > > So there are no more system calls after personality(0xffffffff)? That > is, the crash happens immediately before any other system calls? There are many more syscalls before the crash. 
58215 personality(0xffffffff) = 0 (PER_LINUX) 58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) 58215 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) 58215 brk(NULL) = 0x27070000 58215 dup2(0, 0) = 0 58215 dup2(1, 1) = 1 58215 dup2(2, 2) = 2 58215 ugetrlimit(RLIMIT_STACK, {rlim_cur=9792*1024, rlim_max=RLIM64_INFINITY}) = 0 58215 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 58215 fstat(3, {st_mode=S_IFREG|0644, st_size=1679776, ...}) = 0 58215 mmap(NULL, 1679776, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fffb6ed0000 58215 close(3) = 0 58215 open("/usr/lib/powerpc64le-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3 58215 fstat(3, {st_mode=S_IFREG|0644, st_size=26264, ...}) = 0 58215 mmap(NULL, 26264, PROT_READ, MAP_SHARED, 3, 0) = 0x7fffbbd00000 58215 close(3) = 0 58215 futex(0x7fffba001ab8, FUTEX_WAKE_PRIVATE, 2147483647) = 0 58215 timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC|TFD_NONBLOCK) = 3 58215 rt_sigaction(SIGALRM, {sa_handler=0x10209280, sa_mask=[ALRM CHLD PROF WINCH], sa_flags=SA_RESTART}, NULL, 8) = 0 58215 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4 58215 read(4, "\315\276O\225", 4) = 4 58215 close(4) = 0 [...] > What does 'strace' say about the crash? [...] 58215 write(2, "Dumping under the name emacs", 28) = 28 58215 write(2, "\n", 1) = 1 58215 stat("/a/b/c/emacs/src/emacs", {st_mode=S_IFREG|0755, st_size=57908416, ...}) = 0 58215 unlink("/a/b/c/emacs/src/emacs") = 0 58215 write(2, "********************************"..., 51) = 51 58215 write(2, "Warning: Your system has a gap b"..., 51) = 51 58215 write(2, "heap (346093672 bytes). This us"..., 61) = 61 58215 write(2, "or something similar is in effec"..., 49) = 49 58215 write(2, "fail because of this. 
See the s"..., 45) = 45 58215 write(2, "exec-shield in etc/PROBLEMS for "..., 50) = 50 58215 write(2, "********************************"..., 51) = 51 58215 write(2, "22440720 of 33554432 static heap"..., 43) = 43 58215 write(2, "\n", 1) = 1 58215 open("/a/b/c/emacs/src/temacs", O_RDONLY|O_CLOEXEC) = 5 58215 fstat(5, {st_mode=S_IFREG|0755, st_size=23687896, ...}) = 0 58215 mmap(NULL, 23687896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffb3980000 58215 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0\25\0\1\0\0\0,\207\1\20\0\0\0\0"..., 23687896) = 23687896 58215 open("/a/b/c/emacs/src/emacs", O_RDWR|O_CREAT|O_CLOEXEC, 0777) = 6 58215 ftruncate(6, 403938496) = 0 58215 mmap(NULL, 403938496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fff9b840000 58215 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12670000} --- 58215 +++ killed by SIGSEGV +++ I could attach the whole strace output but it's pretty big. > For me, the execve is the first syscall after the > 'personality(0xffffffff) = 0x40000'. If you're seeing some other > syscall there (or are seeing a crash), please investigate why, > presumably with GDB. It seems like it's crashing when trying to memcpy over the BSS area, on this line in unexelf.c (see below): /* Copy over what we have in memory now for the bss area. */ memcpy (new_base + new_data2_offset, (caddr_t) old_bss_addr, bss_size_growth); >> When I run the command under gdb, it succeeds, so I had to enable core >> dumps to get the backtrace: > > Core dumps won't help us much I'm afraid. Instead, when debugging > ./temacs, please use the GDB command "set disable-randomization off" > before issuing the GDB command "run --batch --load loadup bootstrap". (memcpy.S below might not be quite the right version; the Debian libc6-dbg 2.24 package doesn't seem to have that file, so I pointed gdb to a copy within a glibc 2.24 git checkout instead.) [...] 
Dumping under the name emacs ************************************************** Warning: Your system has a gap between BSS and the heap (652277864 bytes). This usually means that exec-shield or something similar is in effect. The dump may fail because of this. See the section about exec-shield in etc/PROBLEMS for more information. ************************************************** 22440720 of 33554432 static heap bytes used Program received signal SIGSEGV, Segmentation fault. __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111 warning: Source file is more recent than executable. 111 lxvd2x 8,src,7 (gdb) thread apply all bt Thread 1 (Thread 0x7fffb0f47be0 (LWP 23958)): #0 __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111 #1 0x00000000101ec178 in memcpy (__len=686434792, __src=<optimized out>, __dest=<optimized out>) at /usr/include/powerpc64le-linux-gnu/bits/string3.h:53 #2 unexec (new_name=0x11b934f8 <bss_sbrk_buffer+22229000> "/a/b/c/emacs/src/emacs", old_name=0x11b93528 <bss_sbrk_buffer+22229048> "/a/b/c/emacs/src/temacs") at unexelf.c:410 #3 0x0000000010110bec in Fdump_emacs (filename=XIL(0x11b945a4), symfile=XIL(0x11b94584)) at emacs.c:2224 #4 0x00000000101a7cb8 in eval_sub (form=...) at eval.c:2244 #5 0x00000000101a8194 in Fprogn (body=XIL(0x107656d3)) at eval.c:459 #6 0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193 #7 0x00000000101ab8bc in Fif (args=...) at eval.c:414 #8 0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193 #9 0x00000000101d8a98 in readevalloop (readcharfun=XIL(0x68d0), infile0=0x7fffffa503a0, sourcename=XIL(0x10721684), printflag=false, unibyte=..., readfun=XIL(0), start=XIL(0), end=XIL(0)) at lread.c:2048 #10 0x00000000101d90bc in Fload (file=XIL(0x10721584), noerror=..., nomessage=XIL(0), nosuffix=..., must_suffix=...) at lread.c:1435 #11 0x00000000101a7c34 in eval_sub (form=...) at eval.c:2255 #12 0x00000000101acd18 in Feval (form=XIL(0x10727f03), lexical=...) 
at eval.c:2061 #13 0x00000000101188f8 in top_level_2 () at keyboard.c:1119 #14 0x00000000101a6410 in internal_condition_case (bfun=0x101188d0 <top_level_2>, handlers=..., hfun=0x1011f030 <cmd_error>) at eval.c:1336 #15 0x00000000101188a8 in top_level_1 (ignore=...) at keyboard.c:1127 #16 0x00000000101a6364 in internal_catch (tag=..., func=0x10118800 <top_level_1>, arg=XIL(0)) at eval.c:1101 #17 0x000000001011873c in command_loop () at keyboard.c:1088 #18 0x000000001011e89c in recursive_edit_1 () at keyboard.c:695 #19 0x000000001011eeb4 in Frecursive_edit () at keyboard.c:766 #20 0x0000000010017804 in main (argc=<optimized out>, argv=0x7fffffa50d58) at emacs.c:1717 Lisp Backtrace: "dump-emacs" (0xffa4fee0) "progn" (0xffa50060) "if" (0xffa501a0) "load" (0xffa505a0) BTW, let me know if you don't think it's useful to debug this further. I'm OK just disabling randomization when I build Emacs for the time being and waiting until the portable dumper work lands, but I'm happy to continue if you think it will lead to a general fix. Thanks, Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-30 2:44 ` Thomas Fitzsimmons @ 2018-10-30 5:58 ` Paul Eggert 2018-10-30 9:30 ` Thomas Fitzsimmons 0 siblings, 1 reply; 18+ messages in thread From: Paul Eggert @ 2018-10-30 5:58 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174 Thomas Fitzsimmons wrote: > BTW, let me know if you don't think it's useful to debug this further. > I'm OK just disabling randomization when I build Emacs for the time > being and waiting until the portable dumper work lands, but I'm happy to > continue if you think it will lead to a general fix. It's not clear when the portable dumper will land; it might not ever land, unfortunately. So I would like to work on bug#33174 a bit longer, if only so that we can put something intelligible into the PROBLEMS file. > It seems like it's crashing when trying to memcpy over the BSS area, on > this line in unexelf.c (see below): By the time the memcpy is run the damage has already been done: the memory layout is messed up and we can't fix that simply by passing different arguments to memcpy. We have to prevent the memory layout from being messed up in the first place by disabling undesirable address space layout randomization and doing this very early in execution. The key question for me is in this set of system calls: > 58215 personality(0xffffffff) = 0 (PER_LINUX) > 58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) > 58215 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) > 58215 brk(NULL) = 0x27070000 > 58215 dup2(0, 0) = 0 > 58215 dup2(1, 1) = 1 > 58215 dup2(2, 2) = 2 Surely the call to disable_address_randomization () must have returned true, but can you verify that, either via GDB or (shudder) by inserting print statements? Also, the call from 'main' to getenv ("EMACS_HEAP_EXEC") must have returned NULL. Can you also verify this? And it appears that 'main' must have called xputenv ("EMACS_HEAP_EXEC=true") and execvp (argv[0], argv). 
But how can this be, since there's no execve syscall? This is the heart of the mystery, and we can find out more about it by using GDB to put breakpoints on 'personality', 'getenv', 'xputenv' and/or 'execvp' and seeing what's going on. Something like this, perhaps:

  $ gdb temacs
  (gdb) set disable-randomization off
  (gdb) b personality
  (gdb) b getenv
  (gdb) b xputenv
  (gdb) b execvp
  (gdb) r --batch --load loadup bootstrap

and seeing which of these functions get executed in what order, and what they return.

^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-30 5:58 ` Paul Eggert @ 2018-10-30 9:30 ` Thomas Fitzsimmons 2018-11-12 6:39 ` Paul Eggert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-10-30 9:30 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174 Paul Eggert <eggert@cs.ucla.edu> writes: > Thomas Fitzsimmons wrote: >> BTW, let me know if you don't think it's useful to debug this further. >> I'm OK just disabling randomization when I build Emacs for the time >> being and waiting until the portable dumper work lands, but I'm happy to >> continue if you think it will lead to a general fix. > > It's not clear when the portable dumper will land; it might not ever > land, unfortunately. So I would like to work on bug#33174 a bit > longer, if only so that we can put something intelligible into the > PROBLEMS file. OK. >> It seems like it's crashing when trying to memcpy over the BSS area, on >> this line in unexelf.c (see below): > > By the time the memcpy is run the damage has already been done: the > memory layout is messed up and we can't fix that simply by passing > different arguments to memcpy. We have to prevent the memory layout > from being messed up in the first place by disabling undesirable > address space layout randomization and doing this very early in > execution. Ah, OK, so the goal is to programmatically do something similar to echo'ing to randomize_va_space, but just for the temacs process. > The key question for me is in this set of system calls: > >> 58215 personality(0xffffffff) = 0 (PER_LINUX) >> 58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX) >> 58215 personality(0xffffffff) = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE) >> 58215 brk(NULL) = 0x27070000 >> 58215 dup2(0, 0) = 0 >> 58215 dup2(1, 1) = 1 >> 58215 dup2(2, 2) = 2 > > Surely the call to disable_address_randomization () must have returned > true, but can you verify that, either via GDB or (shudder) by > inserting print statements? 
(I sorted out glibc source code and debug symbols so they'll be accurate now). Yes, disable_address_randomization returns true: [...] (gdb) finish Run till exit from #0 0x0000000010136d9c in disable_address_randomization () at sysdep.c:165 0x0000000010016c94 in main (argc=<optimized out>, argv=0x7fffd4430178) at emacs.c:710 710 if (disable_aslr && disable_address_randomization () Value returned is $1 = true [...] > Also, the call from 'main' to getenv ("EMACS_HEAP_EXEC") must have > returned NULL. Can you also verify this? (gdb) c Continuing. Breakpoint 4, 0x00007fff9dc1ef98 in __GI_getenv (name=0x10274ce8 "EMACS_HEAP_EXEC") at getenv.c:34 34 { (gdb) finish Run till exit from #0 0x00007fff9dc1ef98 in __GI_getenv (name=0x10274ce8 "EMACS_HEAP_EXEC") at getenv.c:34 0x0000000010017870 in main (argc=<optimized out>, argv=0x7ffff4883248) at emacs.c:711 711 && !getenv ("EMACS_HEAP_EXEC")) Value returned is $2 = 0x7ffff488fe49 "true" Actually, EMACS_HEAP_EXEC is true! If I unset it, then the bootstrap works with and without "Fix bootstrap infloop in GNU/Linux alpha" applied. I'm building Emacs inside Emacs via M-x shell. "EMACS_HEAP_EXEC=true" is in process-environment. Given that I'm also running EXWM, no matter what build shell I start up, even an xterm, EMACS_HEAP_EXEC is set to "true" in the environment. Ah, by running the "outer" Emacs via a serial console (i.e., not from within Emacs, and starting with EMACS_HEAP_EXEC unset in the environment), I think I see what happened. Because of the ifdef just above the randomization disablement code: # ifdef __PPC64__ bool disable_aslr = true; # else bool disable_aslr = dumping; # endif randomization is unconditionally disabled on PPC64, and so EMACS_HEAP_EXEC is unconditionally set to true in the outer build Emacs's initial-environment. With "Fix bootstrap infloop in GNU/Linux alpha" applied, building Emacs within Emacs on PPC64 will no longer work because the re-exec will be skipped during bootstrap. 
Could you try building Emacs within Emacs on one of those CentOS machines to confirm? Thanks, Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
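To make the failure mode concrete, here is a condensed sketch of the guard in 'main' as described in this message. The function name should_reexec is hypothetical; only the EMACS_HEAP_EXEC variable comes from the thread:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical condensation of the guard: re-exec only if ASLR was
   just disabled and no ancestor process has left EMACS_HEAP_EXEC
   in the environment.  */
bool
should_reexec (bool aslr_just_disabled)
{
  return aslr_just_disabled && getenv ("EMACS_HEAP_EXEC") == NULL;
}
```

When temacs is started from a shell that is itself a child of a running Emacs on PPC64, the variable is inherited, so this condition is false and the re-exec is skipped even though the current process image was laid out with randomization on.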
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-10-30 9:30 ` Thomas Fitzsimmons @ 2018-11-12 6:39 ` Paul Eggert 2018-11-12 17:24 ` Andy Moreton 2018-11-13 2:29 ` Thomas Fitzsimmons 0 siblings, 2 replies; 18+ messages in thread From: Paul Eggert @ 2018-11-12 6:39 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174-done [-- Attachment #1: Type: text/plain, Size: 472 bytes --] Thomas Fitzsimmons wrote: > Actually, EMACS_HEAP_EXEC is true! OK, thanks, that explains things. I installed the first attached patch to the emacs-26 branch as a quick hack to work around the problem, and the second attached patch to the master branch to come up with a better way to address the underlying issue that doesn't involve fiddling with environment variables. Please let us know if this doesn't work for your platform, as I've tested this only on x86-64. [-- Attachment #2: 0001-Work-around-dumping-bug-on-GNU-Linux-ppc64le.patch --] [-- Type: text/x-patch, Size: 1128 bytes --] From 1d79c2ebd9bd9aa36586e57463502373c0296d11 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sun, 11 Nov 2018 22:34:46 -0800 Subject: [PATCH] Work around dumping bug on GNU/Linux ppc64le Problem reported by Thomas Fitzsimmons (Bug#33174). Do not merge to master, as we have a better fix there. * src/Makefile.in (emacs$(EXEEXT)): (bootstrap-emacs$(EXEEXT)): Unset EMACS_HEAP_EXEC before invoking temacs. 
--- src/Makefile.in | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/Makefile.in b/src/Makefile.in index 6ed8f3cc91..53c18e7ac0 100644 --- a/src/Makefile.in +++ b/src/Makefile.in @@ -534,6 +534,7 @@ emacs$(EXEEXT): ifeq ($(CANNOT_DUMP),yes) ln -f temacs$(EXEEXT) $@ else + unset EMACS_HEAP_EXEC; \ LC_ALL=C $(RUN_TEMACS) -batch $(BUILD_DETAILS) -l loadup dump ifneq ($(PAXCTL_dumped),) $(PAXCTL_dumped) $@ @@ -739,6 +740,7 @@ bootstrap-emacs$(EXEEXT): ifeq ($(CANNOT_DUMP),yes) ln -f temacs$(EXEEXT) $@ else + unset EMACS_HEAP_EXEC; \ $(RUN_TEMACS) --batch $(BUILD_DETAILS) --load loadup bootstrap ifneq ($(PAXCTL_dumped),) $(PAXCTL_dumped) emacs$(EXEEXT) -- 2.17.1 [-- Attachment #3: 0001-Fix-dumping-on-GNU-Linux-ppc64le.patch --] [-- Type: text/x-patch, Size: 5574 bytes --] From c14eab222c5208ec0650292c3771a3ee632fdb0d Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sun, 11 Nov 2018 22:18:47 -0800 Subject: [PATCH] Fix dumping on GNU/Linux ppc64le Problem reported by Thomas Fitzsimmons (Bug#33174). * src/emacs.c (main): Adjust to sysdep.c changes. * src/sysdep.c (exec_personality): New static var. (disable_address_randomization): Remove, replacing with ... (maybe_disable_address_randomization): ... this new function. Do not set or use an environment variable; use a command-line argument instead, and set the new static var. Migrate the emacs.c personality-change code to here, where it belongs. (emacs_exec_file): Simplify by using new static var. --- src/emacs.c | 23 +---------------- src/lisp.h | 7 +++-- src/sysdep.c | 73 +++++++++++++++++++++++++++++++++------------------- 3 files changed, 53 insertions(+), 50 deletions(-) diff --git a/src/emacs.c b/src/emacs.c index 07df191035..512174d562 100644 --- a/src/emacs.c +++ b/src/emacs.c @@ -706,28 +706,7 @@ main (int argc, char **argv) dumping = false; #endif - /* True if address randomization interferes with memory allocation. 
*/ -# ifdef __PPC64__ - bool disable_aslr = true; -# else - bool disable_aslr = dumping; -# endif - - if (disable_aslr && disable_address_randomization () - && !getenv ("EMACS_HEAP_EXEC")) - { - /* Set this so the personality will be reverted before execs - after this one, and to work around an re-exec loop on buggy - kernels (Bug#32083). */ - xputenv ("EMACS_HEAP_EXEC=true"); - - /* Address randomization was enabled, but is now disabled. - Re-execute Emacs to get a clean slate. */ - execvp (argv[0], argv); - - /* If the exec fails, warn and then try anyway. */ - perror (argv[0]); - } + argc = maybe_disable_address_randomization (dumping, argc, argv); #ifndef CANNOT_DUMP might_dump = !initialized; diff --git a/src/lisp.h b/src/lisp.h index eb6762678c..383d61274c 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -4325,9 +4325,12 @@ struct tty_display_info; /* Defined in sysdep.c. */ #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE -extern bool disable_address_randomization (void); +extern int maybe_disable_address_randomization (bool, int, char **); #else -INLINE bool disable_address_randomization (void) { return false; } +INLINE void +maybe_disable_address_randomization (bool dumping, int argc, char **argv) +{ +} #endif extern int emacs_exec_file (char const *, char *const *, char *const *); extern void init_standard_fds (void); diff --git a/src/sysdep.c b/src/sysdep.c index 7a0c8a8ab8..ddcb594f66 100644 --- a/src/sysdep.c +++ b/src/sysdep.c @@ -150,22 +150,52 @@ static const int baud_convert[] = #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE # include <sys/personality.h> -/* Disable address randomization in the current process. Return true - if addresses were randomized but this has been disabled, false - otherwise. */ -bool -disable_address_randomization (void) +/* If not -1, the personality that should be restored before exec. */ +static int exec_personality; + +/* Try to disable randomization if the current process needs it and + does not appear to have it already. 
*/ +int +maybe_disable_address_randomization (bool dumping, int argc, char **argv) { - int pers = personality (0xffffffff); - if (pers < 0) - return false; - int desired_pers = pers | ADDR_NO_RANDOMIZE; + /* Undocumented Emacs option used only by this function. */ + static char const aslr_disabled_option[] = "--__aslr-disabled"; - /* Call 'personality' twice, to detect buggy platforms like WSL - where 'personality' always returns 0. */ - return (pers != desired_pers - && personality (desired_pers) == pers - && personality (0xffffffff) == desired_pers); + if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0) + { + bool disable_aslr = dumping; +# ifdef __PPC64__ + disable_aslr = true; +# endif + exec_personality = disable_aslr ? personality (0xffffffff) : -1; + if (exec_personality & ADDR_NO_RANDOMIZE) + exec_personality = -1; + if (exec_personality != -1 + && personality (exec_personality | ADDR_NO_RANDOMIZE) != -1) + { + char **newargv = malloc ((argc + 2) * sizeof *newargv); + if (newargv) + { + /* Invoke self with undocumented option. */ + newargv[0] = argv[0]; + newargv[1] = (char *) aslr_disabled_option; + memcpy (&newargv[2], &argv[1], argc * sizeof *newargv); + execvp (newargv[0], newargv); + } + + /* If malloc or execvp fails, warn and then try anyway. */ + perror (argv[0]); + free (newargv); + } + } + else + { + /* Our earlier incarnation already disabled ASLR. */ + argc--; + memmove (&argv[1], &argv[2], argc * sizeof *argv); + } + + return argc; } #endif @@ -177,21 +207,12 @@ int emacs_exec_file (char const *file, char *const *argv, char *const *envp) { #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE - int pers = getenv ("EMACS_HEAP_EXEC") ? 
personality (0xffffffff) : -1; - bool change_personality = 0 <= pers && pers & ADDR_NO_RANDOMIZE; - if (change_personality) - personality (pers & ~ADDR_NO_RANDOMIZE); + if (exec_personality != -1) + personality (exec_personality); #endif execve (file, argv, envp); - int err = errno; - -#ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE - if (change_personality) - personality (pers); -#endif - - return err; + return errno; } /* If FD is not already open, arrange for it to be open with FLAGS. */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
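The argument-vector handling in the second patch can be tried in isolation. The sketch below is a standalone illustration, not the installed code; the function name insert_marker_option is hypothetical, and it assumes argc is at least 1 and that argv[argc] is a null pointer:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated argument vector equal
   to ARGV with MARKER inserted as element 1, or NULL if allocation
   fails.  ARGV must hold ARGC >= 1 arguments followed by a null
   pointer.  The caller frees the result.  */
char **
insert_marker_option (int argc, char **argv, char const *marker)
{
  /* Room for ARGC arguments, the marker, and the null terminator.  */
  char **newargv = malloc ((argc + 2) * sizeof *newargv);
  if (!newargv)
    return NULL;
  newargv[0] = argv[0];
  newargv[1] = (char *) marker;
  /* Copy argv[1] through argv[argc]; the last of these is the null
     terminator, so that is ARGC pointers in all.  */
  memcpy (&newargv[2], &argv[1], argc * sizeof *newargv);
  return newargv;
}
```

Passing the state on the command line rather than in the environment means it reaches only the re-executed process and is not inherited by later subprocesses.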
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-11-12 6:39 ` Paul Eggert @ 2018-11-12 17:24 ` Andy Moreton 2018-11-13 2:29 ` Thomas Fitzsimmons 1 sibling, 0 replies; 18+ messages in thread From: Andy Moreton @ 2018-11-12 17:24 UTC (permalink / raw) To: 33174 On Sun 11 Nov 2018, Paul Eggert wrote: > Thomas Fitzsimmons wrote: >> Actually, EMACS_HEAP_EXEC is true! > > OK, thanks, that explains things. I installed the first attached patch to the > emacs-26 branch as a quick hack to work around the problem, and the second > attached patch to the master branch to come up with a better way to address > the underlying issue that doesn't involve fiddling with environment variables. > Please let us know if this doesn't work for your platform, as I've tested this > only on x86-64. The second patch (applied to master) breaks the build on Windows. > diff --git a/src/lisp.h b/src/lisp.h > index eb6762678c..383d61274c 100644 > --- a/src/lisp.h > +++ b/src/lisp.h > @@ -4325,9 +4325,12 @@ struct tty_display_info; > > /* Defined in sysdep.c. 
*/ > #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE > -extern bool disable_address_randomization (void); > +extern int maybe_disable_address_randomization (bool, int, char **); > #else > -INLINE bool disable_address_randomization (void) { return false; } > +INLINE void > +maybe_disable_address_randomization (bool dumping, int argc, char **argv) > +{ > +} The return type must be consistent as it is used by its caller: C:/emacs/git/emacs/master/src/emacs.c: In function 'main': C:/emacs/git/emacs/master/src/emacs.c:709:8: error: void value not ignored as it ought to be argc = maybe_disable_address_randomization (dumping, argc, argv); ^ The obvious fix seems to work: #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE extern int maybe_disable_address_randomization (bool, int, char **); #else INLINE int maybe_disable_address_randomization (bool dumping, int argc, char **argv) { return argc; } #endif > +int > +maybe_disable_address_randomization (bool dumping, int argc, char **argv) > { [snipped] > + if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0) > + { [snipped] > + } > + else > + { > + /* Our earlier incarnation already disabled ASLR. */ > + argc--; > + memmove (&argv[1], &argv[2], argc * sizeof *argv); > + } This looks wrong: the memmove size is one element too long, and if argc was 2 then there is an out of bounds access. AndyM ^ permalink raw reply [flat|nested] 18+ messages in thread
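For checking the bounds question, the removal step can be exercised on its own. This is a standalone sketch with a hypothetical function name, and it relies on the C guarantee that argv[argc] is a null pointer:

```c
#include <string.h>

/* Hypothetical helper: if ARGV[1] equals MARKER, remove it and return
   the new argument count; otherwise return ARGC unchanged.  ARGV must
   hold ARGC arguments followed by a null pointer.  */
int
strip_marker_option (int argc, char **argv, char const *marker)
{
  if (argc < 2 || strcmp (argv[1], marker) != 0)
    return argc;
  /* Shift argv[2] through argv[argc] down one slot.  Counting the
     null terminator at argv[argc], that is ARGC - 1 pointers.  */
  memmove (&argv[1], &argv[2], (argc - 1) * sizeof *argv);
  return argc - 1;
}
```

With argc equal to 2 the single pointer moved is the null terminator, so the highest element read is argv[2]. The count here is written in terms of the argc value before it is decremented.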
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-11-12 6:39 ` Paul Eggert 2018-11-12 17:24 ` Andy Moreton @ 2018-11-13 2:29 ` Thomas Fitzsimmons 2018-11-13 14:42 ` Paul Eggert 1 sibling, 1 reply; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-11-13 2:29 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174-done Paul Eggert <eggert@cs.ucla.edu> writes: > Thomas Fitzsimmons wrote: >> Actually, EMACS_HEAP_EXEC is true! > > OK, thanks, that explains things. I installed the first attached patch > to the emacs-26 branch as a quick hack to work around the problem, I confirmed this works around the problem on the emacs-26 branch with EMACS_HEAP_EXEC set in the initial build environment, and confirmed that the build still works without it set. > and the second attached patch to the master branch to come up with a > better way to address the underlying issue that doesn't involve > fiddling with environment variables. Please let us know if this > doesn't work for your platform, as I've tested this only on x86-64. Likewise this fixes the failure on master on my system with EMACS_HEAP_EXEC set, and the build still works without it set. It looks like the second patch still disables ASLR unconditionally on PPC64. If you want to suggest experiments I can try on my system, maybe I can attempt a fix so that ASLR can be disabled only during dumping on PPC64 like on other architectures. Or do you know that's an impossibility for some reason? Thanks, Thomas ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-11-13 2:29 ` Thomas Fitzsimmons @ 2018-11-13 14:42 ` Paul Eggert 2018-11-19 3:14 ` Thomas Fitzsimmons 0 siblings, 1 reply; 18+ messages in thread From: Paul Eggert @ 2018-11-13 14:42 UTC (permalink / raw) To: Thomas Fitzsimmons; +Cc: 33174-done [-- Attachment #1: Type: text/plain, Size: 595 bytes --] Thomas Fitzsimmons wrote: > It looks like the second patch still disables ASLR unconditionally on > PPC64. If you want to suggest experiments I can try on my system, maybe > I can attempt a fix so that ASLR can be disabled only during dumping on > PPC64 like on other architectures. Or do you know that's an > impossibility for some reason? I don't recall the reason that ppc64 is different, but you can experiment by applying the attached patch. To help with the detective work you might look at the thread containing this email: https://sourceware.org/ml/libc-alpha/2016-07/msg00538.html [-- Attachment #2: ppc.diff --] [-- Type: text/x-patch, Size: 518 bytes --] diff --git a/src/sysdep.c b/src/sysdep.c index ddcb594f66..71aa63a1c0 100644 --- a/src/sysdep.c +++ b/src/sysdep.c @@ -164,9 +164,6 @@ maybe_disable_address_randomization (bool dumping, int argc, char **argv) if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0) { bool disable_aslr = dumping; -# ifdef __PPC64__ - disable_aslr = true; -# endif exec_personality = disable_aslr ? personality (0xffffffff) : -1; if (exec_personality & ADDR_NO_RANDOMIZE) exec_personality = -1; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le 2018-11-13 14:42 ` Paul Eggert @ 2018-11-19 3:14 ` Thomas Fitzsimmons 0 siblings, 0 replies; 18+ messages in thread From: Thomas Fitzsimmons @ 2018-11-19 3:14 UTC (permalink / raw) To: Paul Eggert; +Cc: 33174-done [-- Attachment #1: Type: text/plain, Size: 1916 bytes --] Paul Eggert <eggert@cs.ucla.edu> writes: > Thomas Fitzsimmons wrote: >> It looks like the second patch still disables ASLR unconditionally on >> PPC64. If you want to suggest experiments I can try on my system, maybe >> I can attempt a fix so that ASLR can be disabled only during dumping on >> PPC64 like on other architectures. Or do you know that's an >> impossibility for some reason? > > I don't recall the reason that ppc64 is different, but you can > experiment by applying the attached patch. To help with the detective > work you might look at the thread containing this email: > > https://sourceware.org/ml/libc-alpha/2016-07/msg00538.html I read through that thread, then tried the patch. I was surprised that everything seems to work: - Bootstrap works within a reasonable amount of RAM (limited with ulimit -d to make sure it didn't spike up to 12 GB). - The Emacs process (run as ./src/emacs from the build directory): - does not use 12 GB of RAM (i.e., RAM usage is reasonable after opening a C file, src/xdisp.c). - has a /proc/<pid>/personality value of 00000000. - runs subprocesses with a /proc/self/personality value of 00000000. My system has glibc 2.24. src/config.h has "#define HYBRID_MALLOC 1". Hybrid malloc was proposed as a proper fix during that discussion, but wasn't enabled by default at the time. Given that I configured with just "./configure --with-x-toolkit=lucid", it seems that hybrid malloc is enabled by default now. I'm not sure if this would now work on older glibc versions as well, but in case it doesn't, what about something like the attached for master? Are there other tests I should try, to reveal problems? 
If the patch looks OK, I should probably also test it on Fedora as part of a local Rawhide Emacs package build, since lots of the discussion was about getting that working reliably. Thanks, Thomas [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Enable-ASLR-on-PPC64-glibc-versions-2.24.patch --] [-- Type: text/x-diff, Size: 977 bytes --] From d69509b909bc35b0908e40a9be741e9ef38b3402 Mon Sep 17 00:00:00 2001 From: Thomas Fitzsimmons <fitzsim@fitzsim.org> Date: Sun, 18 Nov 2018 11:07:49 -0500 Subject: [PATCH] Enable ASLR on PPC64 when glibc version is >= 2.24 * src/sysdep.c (maybe_disable_address_randomization): Disable randomization on PPC64 only when glibc version is less than 2.24. --- src/sysdep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/sysdep.c b/src/sysdep.c index ddcb594f66..1a863e65ab 100644 --- a/src/sysdep.c +++ b/src/sysdep.c @@ -164,7 +164,8 @@ maybe_disable_address_randomization (bool dumping, int argc, char **argv) if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0) { bool disable_aslr = dumping; -# ifdef __PPC64__ +# if defined(__PPC64__) \ + && (__GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 24)) disable_aslr = true; # endif exec_personality = disable_aslr ? personality (0xffffffff) : -1; -- 2.11.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
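The version condition in the attached patch can be restated as an ordinary function for checking its edge cases. This is an illustrative sketch only; the function name is hypothetical and it is not part of the proposed change:

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the preprocessor test in the patch:
   true if glibc version MAJOR.MINOR is older than 2.24.  */
bool
glibc_older_than_2_24 (int major, int minor)
{
  return major < 2 || (major == 2 && minor < 24);
}
```

Note that __GLIBC__ and __GLIBC_MINOR__ describe the headers used at build time, so a test like this reflects the build machine's glibc rather than the library present when Emacs later runs.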