unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
@ 2018-10-27 10:35 Thomas Fitzsimmons
  2018-10-27 13:16 ` Thomas Fitzsimmons
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-27 10:35 UTC (permalink / raw)
  To: 33174; +Cc: Paul Eggert

Hi Paul,

On ppc64le Debian GNU/Linux 9, I'm seeing a bootstrap failure on the
master and emacs-26 branches that I didn't see the last time I built
Emacs:

[...]
Dumping under the name emacs
**************************************************
Warning: Your system has a gap between BSS and the
heap (424960392 bytes).  This usually means that exec-shield
or something similar is in effect.  The dump may
fail because of this.  See the section about
exec-shield in etc/PROBLEMS for more information.
**************************************************
22496304 of 33554432 static heap bytes used
Makefile:749: recipe for target 'bootstrap-emacs' failed
make[1]: *** [bootstrap-emacs] Segmentation fault
make[1]: Leaving directory '/home/fitzsim/sources/emacs/src'
Makefile:421: recipe for target 'src' failed
make: *** [src] Error 2

I bisected the failure to:

commit 65889a6d127fcbbbdc1e74d26036e91bd24d1405
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Sun Jul 8 09:04:02 2018 -0700

    Fix bootstrap infloop in GNU/Linux alpha
    
    * src/emacs.c (main): Do not re-exec if EMACS_HEAP_EXEC
    is already set (Bug#32083).

If I revert that commit's patch, the build succeeds.

Thomas

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-27 10:35 bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le Thomas Fitzsimmons
@ 2018-10-27 13:16 ` Thomas Fitzsimmons
  2018-10-27 20:26   ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-27 13:16 UTC (permalink / raw)
  To: 33174; +Cc: Paul Eggert

Thomas Fitzsimmons <fitzsim@fitzsim.org> writes:

> Hi Paul,
>
> On ppc64le Debian GNU/Linux 9, I'm seeing a bootstrap failure on the
> master and emacs-26 branches that I didn't see the last time I built
> Emacs:
>
> [...]
> Dumping under the name emacs
> **************************************************
> Warning: Your system has a gap between BSS and the
> heap (424960392 bytes).  This usually means that exec-shield
> or something similar is in effect.  The dump may
> fail because of this.  See the section about
> exec-shield in etc/PROBLEMS for more information.
> **************************************************
> 22496304 of 33554432 static heap bytes used
> Makefile:749: recipe for target 'bootstrap-emacs' failed
> make[1]: *** [bootstrap-emacs] Segmentation fault
> make[1]: Leaving directory '/home/fitzsim/sources/emacs/src'
> Makefile:421: recipe for target 'src' failed
> make: *** [src] Error 2
>
> I bisected the failure to:
>
> commit 65889a6d127fcbbbdc1e74d26036e91bd24d1405
> Author: Paul Eggert <eggert@cs.ucla.edu>
> Date:   Sun Jul 8 09:04:02 2018 -0700
>
>     Fix bootstrap infloop in GNU/Linux alpha
>     
>     * src/emacs.c (main): Do not re-exec if EMACS_HEAP_EXEC
>     is already set (Bug#32083).
>
> If I revert that commit's patch, the build succeeds.

It looks like this segmentation fault is influenced by the setting in
/proc/sys/kernel/randomize_va_space.

Running by hand in src (with the patch in question applied):

./temacs --batch --load loadup bootstrap

- if randomize_va_space contains 2, bootstrap fails with a segfault:

  Dumping under the name emacs
  **************************************************
  Warning: Your system has a gap between BSS and the
  heap (752378248 bytes).  This usually means that exec-shield
  or something similar is in effect.  The dump may
  fail because of this.  See the section about
  exec-shield in etc/PROBLEMS for more information.
  **************************************************
  22496304 of 33554432 static heap bytes used
  Segmentation fault

- if randomize_va_space contains 1 or 0, bootstrap succeeds:

  Loading /home/fitzsim/sources/emacs/lisp/tooltip.el (source)...
  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name emacs
  22496304 of 33554432 static heap bytes used
  99079 pure bytes used
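These results fit the documented meaning of the kernel.randomize_va_space sysctl: 0 disables randomization, 1 randomizes the stack, mmap base and VDSO, and 2 additionally randomizes the brk-managed heap. That is consistent with the "gap between BSS and the heap" warning appearing only at level 2. A minimal C sketch for reading the setting from a program; read_aslr_level is a hypothetical helper name, not something in Emacs:

```c
#include <assert.h>
#include <stdio.h>

/* Read the integer stored in FILE, normally
   /proc/sys/kernel/randomize_va_space.  Returns the level (0, 1 or 2),
   or -1 if the file cannot be opened or does not start with an integer.  */
static int
read_aslr_level (const char *file)
{
  assert (file != NULL);
  FILE *f = fopen (file, "r");
  if (!f)
    return -1;
  int level;
  int ok = fscanf (f, "%d", &level) == 1;
  fclose (f);
  return ok ? level : -1;
}
```

On the failing configuration above, read_aslr_level ("/proc/sys/kernel/randomize_va_space") would return 2.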

Thomas

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-27 13:16 ` Thomas Fitzsimmons
@ 2018-10-27 20:26   ` Paul Eggert
  2018-10-28  8:05     ` Paul Eggert
  2018-10-29  1:47     ` Thomas Fitzsimmons
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Eggert @ 2018-10-27 20:26 UTC (permalink / raw)
  To: Thomas Fitzsimmons, 33174

Wonderful. Yet another reason we need to get the pdumper branch working. Anyway:

1. Can you use strace and/or GDB to investigate how ./temacs is disabling 
address randomization? In the emacs-26 branch, if you run this command in src:

strace -f -o /tmp/tr ./temacs --batch  --load loadup bootstrap

the output file /tmp/tr should contain something like this:

18406 personality(0xffffffff)           = 0 (PER_LINUX)
18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0xc521b0 /* 80 vars */) = 0

and later on there should be one more call to 'personality', like this:

18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)

If it doesn't look like this, please investigate why, by using GDB.


2. What filesystem are you building in, and does it support xattrs? What is the 
output of the following in your build directory?

     touch x
     setfattr -n user.variable -v value x
     getfattr -n user.variable x


3. What is the output of "getfattr -d src/temacs"? It should be something like this:

# file: src/temacs
user.pax.flags="er"

If not, we need to figure out why not.


4. The output of 'make V=1' should contain something like the following. If not, 
please investigate why not.

setfattr -n user.pax.flags -v er temacs
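Items 2 to 4 can also be checked without the attr tools, directly through the Linux xattr system calls. A minimal sketch, assuming glibc's <sys/xattr.h>; pax_flags_roundtrip is a hypothetical helper that does roughly what the setfattr/getfattr pair above does, but for the user.pax.flags attribute:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Roughly equivalent to
     setfattr -n user.pax.flags -v FLAGS FILE
     getfattr -n user.pax.flags FILE
   Returns 1 if the value reads back unchanged, 0 if the filesystem
   does not support user xattrs, and -1 on any other failure.  */
static int
pax_flags_roundtrip (const char *file, const char *flags)
{
  char buf[64];
  size_t len = strlen (flags);
  assert (len < sizeof buf);

  if (setxattr (file, "user.pax.flags", flags, len, 0) != 0)
    {
      if (errno == ENOTSUP)
        return 0;
      perror (file);
      return -1;
    }

  ssize_t n = getxattr (file, "user.pax.flags", buf, sizeof buf - 1);
  if (n < 0)
    return -1;
  buf[n] = '\0';
  return strcmp (buf, flags) == 0 ? 1 : -1;
}
```

A result of 0 on the build directory would mean the setfattr step in the Makefile cannot have any effect there.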

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-27 20:26   ` Paul Eggert
@ 2018-10-28  8:05     ` Paul Eggert
  2018-10-28 12:56       ` Thomas Fitzsimmons
  2018-10-29  1:47     ` Thomas Fitzsimmons
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-10-28  8:05 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

I built and ran Emacs master on two CentOS 7 ppc64le platforms with no trouble. 
So the problem is not GNU/Linux ppc64le per se; it's something more specific to 
your platform. Here are some characteristics of my builds:

Emacs master branch commit 3dd16a89bf410d77e9ddc41cbfbbd4b343928d6d

./configure --with-x-toolkit=no --with-xpm=no --with-jpeg=no --with-png=no --with-gif=no --with-tiff=no

Both machines were CentOS 7. On both of them, 
/proc/sys/kernel/randomize_va_space is 2, just as in your Debian machine.

uname -rvmpi reported the following (respectively):
3.10.0-693.11.6.el7.ppc64le #1 SMP Thu Jan 4 18:15:24 GMT 2018 ppc64le ppc64le ppc64le
4.14.0-49.13.1.el7a.ppc64le #1 SMP Wed Sep 26 21:58:41 GMT 2018 ppc64le ppc64le ppc64le

gcc --version reported the following (respectively):
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)

On the machine running 4.14.0, there was no setfattr or getfattr command; 
however, ./temacs personality (0xffffffff) returned 0x40000 (i.e., 
ADDR_NO_RANDOMIZE) and so ./temacs did not reexec itself. I don't know why it 
returned 0x40000; even little programs that only call personality (0xffffffff) 
return 0x40000 on this platform.
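The "little program" mentioned above can be sketched in C. Passing 0xffffffff leaves the persona unchanged and just returns the current value (see personality(2)); current_persona and has_no_randomize are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/personality.h>

/* Return the current persona without changing it; 0xffffffff is the
   "query only" value documented in personality(2).  */
static int
current_persona (void)
{
  return personality (0xffffffff);
}

/* Nonzero if ADDR_NO_RANDOMIZE (0x40000) is set in persona PERS.  */
static int
has_no_randomize (int pers)
{
  assert (pers >= 0);
  return (pers & ADDR_NO_RANDOMIZE) != 0;
}
```

A main that prints current_persona () with "%#x" would, per the report above, show 0x40000 on the 4.14.0 machine.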

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-28  8:05     ` Paul Eggert
@ 2018-10-28 12:56       ` Thomas Fitzsimmons
  2018-10-28 19:22         ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-28 12:56 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174

On 2018-10-28 4:05 AM, "Paul Eggert" <eggert@cs.ucla.edu> wrote:
>
> I built and ran Emacs master on two CentOS 7 ppc64le platforms with no
> trouble. So the problem is not GNU/Linux ppc64le per se; it's something
> more specific to your platform.

OK, thanks for checking. I'm running the Linux-libre 4.18 kernel built from
source with powernv_defconfig -- I should have mentioned that in the
initial report, sorry.  I'll try the debugging steps you've suggested when
I get back to my system, and narrow down what the relevant options are.
Everything else on my system is stock Debian stretch packages.

Thomas


* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-28 12:56       ` Thomas Fitzsimmons
@ 2018-10-28 19:22         ` Paul Eggert
  2018-10-30  1:28           ` Thomas Fitzsimmons
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-10-28 19:22 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

Thomas Fitzsimmons wrote:
> I'm running the Linux-libre 4.18 kernel built from
> source with powernv_defconfig

In that case, please also verify that your kernel configuration has enabled 
XATTR_PAX. See, for example, 
<https://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart>. If you have a 
superhardened kernel that refuses to disable ASLR under any circumstances, then 
you'll need to build Emacs with './configure CANNOT_DUMP=yes', something I can't 
recommend for production until we get the pdumper branch merged in.

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-27 20:26   ` Paul Eggert
  2018-10-28  8:05     ` Paul Eggert
@ 2018-10-29  1:47     ` Thomas Fitzsimmons
  2018-10-29  7:22       ` Paul Eggert
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-29  1:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174

Paul Eggert <eggert@cs.ucla.edu> writes:

> Wonderful. Yet another reason we need to get the pdumper branch working. Anyway:
>
> 1. Can you use strace and/or GDB to investigate how ./temacs is
> disabling address randomization? In the emacs-26 branch, if you run
> this command in src:
>
> strace -f -o /tmp/tr ./temacs --batch  --load loadup bootstrap
>
> the output file /tmp/tr should contain something like this:
>
> 18406 personality(0xffffffff)           = 0 (PER_LINUX)
> 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
> 18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)

I see the above personality calls exactly as you've shown them.

strace never gets to the next execve; the crash happens before the next
execve is run, see below.

> 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0xc521b0 /* 80 vars */) = 0
>
> and later on there should be one more call to 'personality', like this:
>
> 18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
>
> If it doesn't look like this, please investigate why, by using GDB.

When I run the command under gdb, it succeeds, so I had to enable core
dumps to get the backtrace:

$ gdb ./temacs core
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./temacs...done.
[New LWP 26831]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `./temacs --batch --load loadup bootstrap '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111
111	../sysdeps/powerpc/powerpc64/power7/memcpy.S: No such file or directory.
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) [answered Y; input not from terminal]
DISPLAY = :0
TERM = dumb
Breakpoint 1 at 0x10118108: file emacs.c, line 363.
Temporary breakpoint 2 at 0x10137aa8: init_sys_modes. (3 locations)
(gdb) thread apply all bt

Thread 1 (Thread 0x7fff940a7be0 (LWP 26831)):
#0  __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111
#1  0x00000000101ec188 in memcpy (__len=383855080, __src=0x105cd618, __dest=<optimized out>) at /usr/include/powerpc64le-linux-gnu/bits/string3.h:53
#2  unexec (new_name=0x11b934f8 <bss_sbrk_buffer+22229000> "/build/emacs/src/emacs", 
    old_name=0x11b93528 <bss_sbrk_buffer+22229048> "/build/emacs/src/temacs") at unexelf.c:411
#3  0x0000000010110bec in Fdump_emacs (filename=XIL(0x11b945a4), symfile=XIL(0x11b94584)) at emacs.c:2224
#4  0x00000000101a7cb8 in eval_sub (form=...) at eval.c:2244
#5  0x00000000101a8194 in Fprogn (body=XIL(0x107360d3)) at eval.c:459
#6  0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193
#7  0x00000000101ab8bc in Fif (args=...) at eval.c:414
#8  0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193
#9  0x00000000101d8a98 in readevalloop (readcharfun=XIL(0x68d0), infile0=0x7fffc3cef950, sourcename=XIL(0x107216a4), printflag=false, unibyte=..., 
    readfun=XIL(0), start=XIL(0), end=XIL(0)) at lread.c:2048
#10 0x00000000101d90bc in Fload (file=XIL(0x107215a4), noerror=..., nomessage=XIL(0), nosuffix=..., must_suffix=...) at lread.c:1435
#11 0x00000000101a7c34 in eval_sub (form=...) at eval.c:2255
#12 0x00000000101acd18 in Feval (form=XIL(0x10727f03), lexical=...) at eval.c:2061
#13 0x00000000101188f8 in top_level_2 () at keyboard.c:1119
#14 0x00000000101a6410 in internal_condition_case (bfun=0x101188d0 <top_level_2>, handlers=..., hfun=0x1011f030 <cmd_error>) at eval.c:1336
#15 0x00000000101188a8 in top_level_1 (ignore=...) at keyboard.c:1127
#16 0x00000000101a6364 in internal_catch (tag=..., func=0x10118800 <top_level_1>, arg=XIL(0)) at eval.c:1101
#17 0x000000001011873c in command_loop () at keyboard.c:1088
#18 0x000000001011e89c in recursive_edit_1 () at keyboard.c:695
#19 0x000000001011eeb4 in Frecursive_edit () at keyboard.c:766
#20 0x0000000010017804 in main (argc=<optimized out>, argv=0x7fffc3cf0308) at emacs.c:1717
You can't do that without a process to debug.
(gdb) 

> 2. What filesystem are you building in,

ext4

> and does it support xattrs?

Yes.

> What is the output of the following in your build directory?
>
>     touch x
>     setfattr -n user.variable -v value x
>     getfattr -n user.variable x

I bootstrapped Debian from a minimal root file system, so I didn't have
the attr package installed (i.e., the setfattr and getfattr commands
were not found).  After I installed it, the output of the above commands
is:

$ touch x
$ setfattr -n user.variable -v value x
$ getfattr -n user.variable x
# file: x
user.variable="value"


> 3. What is the output of "getfattr -d src/temacs"? It should be something like this:
>
> # file: src/temacs
> user.pax.flags="er"
>
> If not, we need to figure out why not.

Yes, once I had installed the attr package and re-run configure and
make:

$ getfattr -d src/temacs
# file: src/temacs
user.pax.flags="er"


> 4. The output of 'make V=1' should contain something like the
> following. If not, please investigate why not.
>
> setfattr -n user.pax.flags -v er temacs

When setfattr is not installed, configure shows:

[...]
checking for setfattr... no
[...]

and that command is not run.  After I installed the attr package,
configure shows:

checking for setfattr... yes

and the above command shows up in make V=1 output.

Thomas

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-29  1:47     ` Thomas Fitzsimmons
@ 2018-10-29  7:22       ` Paul Eggert
  2018-10-30  2:44         ` Thomas Fitzsimmons
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-10-29  7:22 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

Thomas Fitzsimmons wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
> 
>> Wonderful. Yet another reason we need to get the pdumper branch working. Anyway:
>>
>> 1. Can you use strace and/or GDB to investigate how ./temacs is
>> disabling address randomization? In the emacs-26 branch, if you run
>> this command in src:
>>
>> strace -f -o /tmp/tr ./temacs --batch  --load loadup bootstrap
>>
>> the output file /tmp/tr should contain something like this:
>>
>> 18406 personality(0xffffffff)           = 0 (PER_LINUX)
>> 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
>> 18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
> 
> I see the above personality calls exactly as you've shown them.
> 
> strace never gets to the next execve; the crash happens before the next
> execve is run, see below.
> 
>> 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0xc521b0 /* 80 vars */) = 0

So there are no more system calls after personality(0xffffffff)? That is, the 
crash happens immediately before any other system calls? What does 'strace' say 
about the crash?

For me, the execve is the first syscall after the 'personality(0xffffffff) = 
0x40000'. If you're seeing some other syscall there (or are seeing a crash), 
please investigate why, presumably with GDB.

> When I run the command under gdb, it succeeds, so I had to enable core
> dumps to get the backtrace:

Core dumps won't help us much I'm afraid. Instead, when debugging ./temacs, 
please use the GDB command "set disable-randomization off" before issuing the 
GDB command "run --batch  --load loadup bootstrap".

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-28 19:22         ` Paul Eggert
@ 2018-10-30  1:28           ` Thomas Fitzsimmons
  2018-10-30  2:16             ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-30  1:28 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thomas Fitzsimmons wrote:
>> I'm running the Linux-libre 4.18 kernel built from
>> source with powernv_defconfig
>
> In that case, please also verify that your kernel configuration has
> enabled XATTR_PAX.

Hmm, the XATTR_PAX kernel configuration option does not exist in the
kernel sources I'm using; I haven't applied any hardening or Grsecurity
patches, it's just the upstream Linux-libre 4.18 kernel, built with:

make powernv_defconfig
make

> See, for example,
> <https://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart>. If you have a
> superhardened kernel that refuses to disable ASLR under any
> circumstances, then you'll need to build Emacs with './configure
> CANNOT_DUMP=yes', something I can't recommend for production until we
> get the pdumper branch merged in.

I guess I could experiment with adding the PaX patches to my kernel so
that XATTR_PAX would take effect, but I'm surprised Emacs can't complete
the bootstrap step when the kernel is in a default configuration.

Thomas

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-30  1:28           ` Thomas Fitzsimmons
@ 2018-10-30  2:16             ` Paul Eggert
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Eggert @ 2018-10-30  2:16 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

On 10/29/18 6:28 PM, Thomas Fitzsimmons wrote:
> I guess I could experiment with adding the PaX patches to my kernel so
> that XATTR_PAX would take effect, but I'm surprised Emacs can't complete
> the bootstrap step when the kernel is in a default configuration.

You're right, the XATTR_PAX stuff is likely a red herring. Emacs should 
be able to bootstrap without it by using the reexec trick, and 
bootstrapping does work on Fedora 28 (kernel 4.18.16-200.fc28.x86_64) 
without XATTR_PAX being used.

The main issue now, as I see it, is why Emacs is crashing on re-exec. 
That is, on my system after I run 'make SETFATTR=: bootstrap-emacs' to 
simulate a system without XATTR_PAX, the command

strace -o /tmp/tr ./temacs --batch  --load loadup bootstrap

generates the syscalls:

...
personality(0xffffffff)                 = 0 (PER_LINUX)
personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
personality(0xffffffff)                 = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0x852ee10 /* 73 vars */) = 0
...

whereas on your system Emacs is crashing instead of issuing the execve. 
I suspect that something like the following may help you debug:

$ cd src
$ make clean
$ make CFLAGS='-g3 -O0' SETFATTR=: temacs
$ gdb temacs
(gdb) set disable-randomization off
(gdb) b personality
(gdb) r --batch  --load loadup bootstrap

and then poke around to see what 'personality' is returning and whether 
'disable_address_randomization' succeeds and how 'execve' is being 
called. I guess on your host, personality (0xffffffff) returns 0, then 
personality (0x40000) returns 0, then personality (0xffffffff) returns 
0x40000, then disable_address_randomization () returns true, but it 
would be helpful to check this. The next thing should be that getenv 
("EMACS_HEAP_EXEC") returns NULL; is that what actually happens? And 
then xputenv ("EMACS_HEAP_EXEC=true") should work without crashing; is 
that right? And is execvp being called and with what arguments? That 
sort of thing.
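The sequence described above can be sketched in C. This is a hypothetical illustration, not Emacs's code: try_disable_aslr stands in for disable_address_randomization, and plain setenv stands in for Emacs's internal xputenv.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/personality.h>
#include <unistd.h>

/* Ask the kernel to stop randomizing addresses for the next exec of
   this process.  Returns true only if randomization was on and is now
   off, using the same query/set/query sequence seen in the strace
   output.  */
static bool
try_disable_aslr (void)
{
  int pers = personality (0xffffffff);           /* query only */
  if (pers < 0 || (pers & ADDR_NO_RANDOMIZE))
    return false;                                /* error, or already off */
  if (personality (pers | ADDR_NO_RANDOMIZE) < 0)
    return false;                                /* kernel refused */
  pers = personality (0xffffffff);               /* verify it stuck */
  return pers >= 0 && (pers & ADDR_NO_RANDOMIZE) != 0;
}

/* Re-exec ARGV once with randomization disabled.  The environment
   variable is a guard so the new process does not re-exec again
   (the bootstrap infloop that Bug#32083 was about).  */
static void
maybe_reexec (char **argv)
{
  assert (argv != NULL && argv[0] != NULL);
  if (getenv ("EMACS_HEAP_EXEC") == NULL && try_disable_aslr ())
    {
      setenv ("EMACS_HEAP_EXEC", "true", 1);
      execvp (argv[0], argv);
      perror (argv[0]);      /* reached only if execvp failed */
    }
}
```

Checking the guard before calling personality means a re-executed process never touches its persona again. In Thomas's strace the three personality calls succeed but no execve follows, so in terms of this sketch the question is which step between the final query and execvp is not being reached.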

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-29  7:22       ` Paul Eggert
@ 2018-10-30  2:44         ` Thomas Fitzsimmons
  2018-10-30  5:58           ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-30  2:44 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thomas Fitzsimmons wrote:
>> Paul Eggert <eggert@cs.ucla.edu> writes:
>>
>>> Wonderful. Yet another reason we need to get the pdumper branch working. Anyway:
>>>
>>> 1. Can you use strace and/or GDB to investigate how ./temacs is
>>> disabling address randomization? In the emacs-26 branch, if you run
>>> this command in src:
>>>
>>> strace -f -o /tmp/tr ./temacs --batch  --load loadup bootstrap
>>>
>>> the output file /tmp/tr should contain something like this:
>>>
>>> 18406 personality(0xffffffff)           = 0 (PER_LINUX)
>>> 18406 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
>>> 18406 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
>>
>> I see the above personality calls exactly as you've shown them.
>>
>> strace never gets to the next execve; the crash happens before the next
>> execve is run, see below.
>>
>>> 18406 execve("./temacs", ["./temacs", "--batch", "--load", "loadup", "bootstrap"], 0xc521b0 /* 80 vars */) = 0
>
> So there are no more system calls after personality(0xffffffff)? That
> is, the crash happens immediately before any other system calls?

There are many more syscalls before the crash.

58215 personality(0xffffffff)           = 0 (PER_LINUX)
58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
58215 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
58215 brk(NULL)                         = 0x27070000
58215 dup2(0, 0)                        = 0
58215 dup2(1, 1)                        = 1
58215 dup2(2, 2)                        = 2
58215 ugetrlimit(RLIMIT_STACK, {rlim_cur=9792*1024, rlim_max=RLIM64_INFINITY}) = 0
58215 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
58215 fstat(3, {st_mode=S_IFREG|0644, st_size=1679776, ...}) = 0
58215 mmap(NULL, 1679776, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fffb6ed0000
58215 close(3)                          = 0
58215 open("/usr/lib/powerpc64le-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3
58215 fstat(3, {st_mode=S_IFREG|0644, st_size=26264, ...}) = 0
58215 mmap(NULL, 26264, PROT_READ, MAP_SHARED, 3, 0) = 0x7fffbbd00000
58215 close(3)                          = 0
58215 futex(0x7fffba001ab8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
58215 timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC|TFD_NONBLOCK) = 3
58215 rt_sigaction(SIGALRM, {sa_handler=0x10209280, sa_mask=[ALRM CHLD PROF WINCH], sa_flags=SA_RESTART}, NULL, 8) = 0
58215 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
58215 read(4, "\315\276O\225", 4)       = 4
58215 close(4)                          = 0
[...]

> What does 'strace' say about the crash?

[...]
58215 write(2, "Dumping under the name emacs", 28) = 28
58215 write(2, "\n", 1)                 = 1
58215 stat("/a/b/c/emacs/src/emacs", {st_mode=S_IFREG|0755, st_size=57908416, ...}) = 0
58215 unlink("/a/b/c/emacs/src/emacs") = 0
58215 write(2, "********************************"..., 51) = 51
58215 write(2, "Warning: Your system has a gap b"..., 51) = 51
58215 write(2, "heap (346093672 bytes).  This us"..., 61) = 61
58215 write(2, "or something similar is in effec"..., 49) = 49
58215 write(2, "fail because of this.  See the s"..., 45) = 45
58215 write(2, "exec-shield in etc/PROBLEMS for "..., 50) = 50
58215 write(2, "********************************"..., 51) = 51
58215 write(2, "22440720 of 33554432 static heap"..., 43) = 43
58215 write(2, "\n", 1)                 = 1
58215 open("/a/b/c/emacs/src/temacs", O_RDONLY|O_CLOEXEC) = 5
58215 fstat(5, {st_mode=S_IFREG|0755, st_size=23687896, ...}) = 0
58215 mmap(NULL, 23687896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffb3980000
58215 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0\25\0\1\0\0\0,\207\1\20\0\0\0\0"..., 23687896) = 23687896
58215 open("/a/b/c/emacs/src/emacs", O_RDWR|O_CREAT|O_CLOEXEC, 0777) = 6
58215 ftruncate(6, 403938496)           = 0
58215 mmap(NULL, 403938496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fff9b840000
58215 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12670000} ---
58215 +++ killed by SIGSEGV +++

I could attach the whole strace output but it's pretty big.

> For me, the execve is the first syscall after the
> 'personality(0xffffffff) = 0x40000'. If you're seeing some other
> syscall there (or are seeing a crash), please investigate why,
> presumably with GDB.

It seems like it's crashing when trying to memcpy over the BSS area, on
this line in unexelf.c (see below):

  /* Copy over what we have in memory now for the bss area. */
  memcpy (new_base + new_data2_offset, (caddr_t) old_bss_addr,
	  bss_size_growth);
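One way to confirm that the source range of this memcpy runs into unmapped memory is to probe it with mincore(2) just before the copy, since mincore fails with ENOMEM when a range contains unmapped pages. In the strace above the fault address (si_addr=0x12670000) lies between the static data near bss_sbrk_buffer (around 0x11b9xxxx in the backtraces) and the initial break (brk(NULL) = 0x27070000), which is consistent with the copy reading into the gap. This is only a diagnostic sketch, not a fix; range_is_mapped is a hypothetical helper, which one might call as range_is_mapped ((void *) old_bss_addr, bss_size_growth).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return true if mincore succeeds on every page in [ADDR, ADDR + LEN),
   which means the whole range is mapped.  Return false if it fails for
   any reason; ENOMEM specifically means some page is unmapped.  */
static bool
range_is_mapped (const void *addr, size_t len)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  assert (page != 0 && (page & (page - 1)) == 0);
  uintptr_t start = (uintptr_t) addr & ~(page - 1);  /* mincore needs alignment */
  size_t span = (uintptr_t) addr + len - start;
  size_t npages = (span + page - 1) / page;
  unsigned char *vec = malloc (npages ? npages : 1); /* one byte per page */
  if (!vec)
    return false;
  int rc = mincore ((void *) start, span, vec);
  free (vec);
  return rc == 0;
}
```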

>> When I run the command under gdb, it succeeds, so I had to enable core
>> dumps to get the backtrace:
>
> Core dumps won't help us much I'm afraid. Instead, when debugging
> ./temacs, please use the GDB command "set disable-randomization off"
> before issuing the GDB command "run --batch  --load loadup bootstrap".

(memcpy.S below might not be quite the right version; the Debian
libc6-dbg 2.24 package doesn't seem to have that file, so I pointed gdb
to a copy within a glibc 2.24 git checkout instead.)

[...]
Dumping under the name emacs
**************************************************
Warning: Your system has a gap between BSS and the
heap (652277864 bytes).  This usually means that exec-shield
or something similar is in effect.  The dump may
fail because of this.  See the section about
exec-shield in etc/PROBLEMS for more information.
**************************************************
22440720 of 33554432 static heap bytes used

Program received signal SIGSEGV, Segmentation fault.
__memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111
warning: Source file is more recent than executable.
111		lxvd2x	8,src,7
(gdb) thread apply all bt

Thread 1 (Thread 0x7fffb0f47be0 (LWP 23958)):
#0  __memcpy_power7 () at ../sysdeps/powerpc/powerpc64/power7/memcpy.S:111
#1  0x00000000101ec178 in memcpy (__len=686434792, __src=<optimized out>, __dest=<optimized out>) at /usr/include/powerpc64le-linux-gnu/bits/string3.h:53
#2  unexec (new_name=0x11b934f8 <bss_sbrk_buffer+22229000> "/a/b/c/emacs/src/emacs", 
    old_name=0x11b93528 <bss_sbrk_buffer+22229048> "/a/b/c/emacs/src/temacs") at unexelf.c:410
#3  0x0000000010110bec in Fdump_emacs (filename=XIL(0x11b945a4), symfile=XIL(0x11b94584)) at emacs.c:2224
#4  0x00000000101a7cb8 in eval_sub (form=...) at eval.c:2244
#5  0x00000000101a8194 in Fprogn (body=XIL(0x107656d3)) at eval.c:459
#6  0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193
#7  0x00000000101ab8bc in Fif (args=...) at eval.c:414
#8  0x00000000101a7d6c in eval_sub (form=...) at eval.c:2193
#9  0x00000000101d8a98 in readevalloop (readcharfun=XIL(0x68d0), infile0=0x7fffffa503a0, sourcename=XIL(0x10721684), printflag=false, unibyte=..., 
    readfun=XIL(0), start=XIL(0), end=XIL(0)) at lread.c:2048
#10 0x00000000101d90bc in Fload (file=XIL(0x10721584), noerror=..., nomessage=XIL(0), nosuffix=..., must_suffix=...) at lread.c:1435
#11 0x00000000101a7c34 in eval_sub (form=...) at eval.c:2255
#12 0x00000000101acd18 in Feval (form=XIL(0x10727f03), lexical=...) at eval.c:2061
#13 0x00000000101188f8 in top_level_2 () at keyboard.c:1119
#14 0x00000000101a6410 in internal_condition_case (bfun=0x101188d0 <top_level_2>, handlers=..., hfun=0x1011f030 <cmd_error>) at eval.c:1336
#15 0x00000000101188a8 in top_level_1 (ignore=...) at keyboard.c:1127
#16 0x00000000101a6364 in internal_catch (tag=..., func=0x10118800 <top_level_1>, arg=XIL(0)) at eval.c:1101
#17 0x000000001011873c in command_loop () at keyboard.c:1088
#18 0x000000001011e89c in recursive_edit_1 () at keyboard.c:695
#19 0x000000001011eeb4 in Frecursive_edit () at keyboard.c:766
#20 0x0000000010017804 in main (argc=<optimized out>, argv=0x7fffffa50d58) at emacs.c:1717

Lisp Backtrace:
"dump-emacs" (0xffa4fee0)
"progn" (0xffa50060)
"if" (0xffa501a0)
"load" (0xffa505a0)

BTW, let me know if you don't think it's useful to debug this further.
I'm OK just disabling randomization when I build Emacs for the time
being and waiting until the portable dumper work lands, but I'm happy to
continue if you think it will lead to a general fix.

Thanks,
Thomas

* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-30  2:44         ` Thomas Fitzsimmons
@ 2018-10-30  5:58           ` Paul Eggert
  2018-10-30  9:30             ` Thomas Fitzsimmons
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-10-30  5:58 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174

Thomas Fitzsimmons wrote:
> BTW, let me know if you don't think it's useful to debug this further.
> I'm OK just disabling randomization when I build Emacs for the time
> being and waiting until the portable dumper work lands, but I'm happy to
> continue if you think it will lead to a general fix.

It's not clear when the portable dumper will land; it might not ever land, 
unfortunately. So I would like to work on bug#33174 a bit longer, if only so 
that we can put something intelligible into the PROBLEMS file.

> It seems like it's crashing when trying to memcpy over the BSS area, on
> this line in unexelf.c (see below):

By the time the memcpy is run the damage has already been done: the memory 
layout is messed up and we can't fix that simply by passing different arguments 
to memcpy. We have to prevent the memory layout from being messed up in the 
first place by disabling undesirable address space layout randomization and 
doing this very early in execution.

The key question for me is in this set of system calls:

> 58215 personality(0xffffffff)           = 0 (PER_LINUX)
> 58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
> 58215 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
> 58215 brk(NULL)                         = 0x27070000
> 58215 dup2(0, 0)                        = 0
> 58215 dup2(1, 1)                        = 1
> 58215 dup2(2, 2)                        = 2

Surely the call to disable_address_randomization () must have returned true, but 
can you verify that, either via GDB or (shudder) by inserting print statements?
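
The three 'personality' calls in the trace follow a query/set/verify pattern: query with 0xffffffff, request ADDR_NO_RANDOMIZE, then query again. The pre-patch disable_address_randomization makes the same second query to catch platforms such as WSL where 'personality' always returns 0. In the trace, 0 is PER_LINUX and 0x40000 is PER_LINUX|ADDR_NO_RANDOMIZE, so the check passes. Below is a minimal sketch of that check, split into a pure decision function so it can be exercised without changing the process persona. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/personality.h>

/* Decide whether a query/set/verify sequence really turned ASLR off.
   BEFORE is the persona from the first query, SET_RESULT is what the
   setting call returned (the previous persona on success), and AFTER
   is the persona from the second query.  */
static bool
aslr_disable_took_effect (int before, int set_result, int after)
{
  int desired = before | ADDR_NO_RANDOMIZE;
  return (before != -1
          && before != desired      /* randomization was on to begin with */
          && set_result == before   /* the set call reported the old persona */
          && after == desired);     /* the re-query confirms the change */
}

/* Hypothetical wrapper issuing the same three calls seen in the trace.
   The new persona governs the layout chosen at the next execve, not the
   layout of the already-running process.  */
static bool
try_disable_aslr (void)
{
  int before = personality (0xffffffff);
  if (before == -1)
    return false;
  int set_result = personality (before | ADDR_NO_RANDOMIZE);
  int after = personality (0xffffffff);
  return aslr_disable_took_effect (before, set_result, after);
}
```

For the trace above, aslr_disable_took_effect (0, 0, 0x40000) is true. On a WSL-style platform where every call answers 0, it is false.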

Also, the call from 'main' to getenv ("EMACS_HEAP_EXEC") must have returned 
NULL. Can you also verify this?

And it appears that 'main' must have called xputenv ("EMACS_HEAP_EXEC=true") and 
execvp (argv[0], argv). But how can this be, since there's no execve syscall? 
This is the heart of the mystery, and we can find out more about it by using GDB 
to put breakpoints on 'personality', 'getenv', 'xputenv' and/or 'execvp' and 
seeing what's going on.  Something like this, perhaps:

$ gdb temacs
(gdb) set disable-randomization off
(gdb) b personality
(gdb) b getenv
(gdb) b xputenv
(gdb) b execvp
(gdb) r --batch  --load loadup bootstrap

and seeing which of these functions get executed in what order, and what they 
return.






* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-30  5:58           ` Paul Eggert
@ 2018-10-30  9:30             ` Thomas Fitzsimmons
  2018-11-12  6:39               ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-10-30  9:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thomas Fitzsimmons wrote:
>> BTW, let me know if you don't think it's useful to debug this further.
>> I'm OK just disabling randomization when I build Emacs for the time
>> being and waiting until the portable dumper work lands, but I'm happy to
>> continue if you think it will lead to a general fix.
>
> It's not clear when the portable dumper will land; it might not ever
> land, unfortunately. So I would like to work on bug#33174 a bit
> longer, if only so that we can put something intelligible into the
> PROBLEMS file.

OK.

>> It seems like it's crashing when trying to memcpy over the BSS area, on
>> this line in unexelf.c (see below):
>
> By the time the memcpy is run the damage has already been done: the
> memory layout is messed up and we can't fix that simply by passing
> different arguments to memcpy. We have to prevent the memory layout
> from being messed up in the first place by disabling undesirable
> address space layout randomization and doing this very early in
> execution.

Ah, OK, so the goal is to programmatically do something similar to
echoing to randomize_va_space, but just for the temacs process.

> The key question for me is in this set of system calls:
>
>> 58215 personality(0xffffffff)           = 0 (PER_LINUX)
>> 58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
>> 58215 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
>> 58215 brk(NULL)                         = 0x27070000
>> 58215 dup2(0, 0)                        = 0
>> 58215 dup2(1, 1)                        = 1
>> 58215 dup2(2, 2)                        = 2
>
> Surely the call to disable_address_randomization () must have returned
> true, but can you verify that, either via GDB or (shudder) by
> inserting print statements?

(I sorted out glibc source code and debug symbols so they'll be accurate
now).  Yes, disable_address_randomization returns true:

[...]
(gdb) finish
Run till exit from #0  0x0000000010136d9c in disable_address_randomization () at sysdep.c:165
0x0000000010016c94 in main (argc=<optimized out>, argv=0x7fffd4430178) at emacs.c:710
710	  if (disable_aslr && disable_address_randomization ()
Value returned is $1 = true
[...]

> Also, the call from 'main' to getenv ("EMACS_HEAP_EXEC") must have
> returned NULL. Can you also verify this?

(gdb) c
Continuing.

Breakpoint 4, 0x00007fff9dc1ef98 in __GI_getenv (name=0x10274ce8 "EMACS_HEAP_EXEC") at getenv.c:34
34	{
(gdb) finish
Run till exit from #0  0x00007fff9dc1ef98 in __GI_getenv (name=0x10274ce8 "EMACS_HEAP_EXEC") at getenv.c:34
0x0000000010017870 in main (argc=<optimized out>, argv=0x7ffff4883248) at emacs.c:711
711	      && !getenv ("EMACS_HEAP_EXEC"))
Value returned is $2 = 0x7ffff488fe49 "true"

Actually, EMACS_HEAP_EXEC is true!  If I unset it, then the bootstrap
works with and without "Fix bootstrap infloop in GNU/Linux alpha"
applied.

I'm building Emacs inside Emacs via M-x shell.  "EMACS_HEAP_EXEC=true"
is in process-environment.  Given that I'm also running EXWM, no matter
what build shell I start up, even an xterm, EMACS_HEAP_EXEC is set to
"true" in the environment.

Ah, by running the "outer" Emacs via a serial console (i.e., not from
within Emacs, and starting with EMACS_HEAP_EXEC unset in the
environment), I think I see what happened.  Because of the ifdef just
above the randomization disablement code:

# ifdef __PPC64__
  bool disable_aslr = true;
# else
  bool disable_aslr = dumping;
# endif

randomization is unconditionally disabled on PPC64, and so
EMACS_HEAP_EXEC is unconditionally set to true in the outer build
Emacs's initial-environment.  With "Fix bootstrap infloop in GNU/Linux
alpha" applied, building Emacs within Emacs on PPC64 will no longer work
because the re-exec will be skipped during bootstrap.
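
Restating the pre-patch guard in 'main' (disable_aslr && disable_address_randomization () && !getenv ("EMACS_HEAP_EXEC")) as a predicate shows why an inherited variable breaks the build. The personality change succeeds, but the re-exec that would make it take effect is skipped. This is a sketch only, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Pre-patch rule for re-executing temacs: only when ASLR should be off,
   the personality change just succeeded, and EMACS_HEAP_EXEC is unset.
   HEAP_EXEC_ENV is the value of getenv ("EMACS_HEAP_EXEC").  */
static bool
should_reexec (bool disable_aslr, bool aslr_just_disabled,
               const char *heap_exec_env)
{
  return disable_aslr && aslr_just_disabled && heap_exec_env == NULL;
}

/* The same test against the real environment.  A temacs started from a
   shell inside an Emacs that already re-executed itself inherits
   EMACS_HEAP_EXEC=true, so this returns false even though the
   personality was only just changed.  */
static bool
should_reexec_from_env (bool disable_aslr, bool aslr_just_disabled)
{
  return should_reexec (disable_aslr, aslr_just_disabled,
                        getenv ("EMACS_HEAP_EXEC"));
}
```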

Could you try building Emacs within Emacs on one of those CentOS
machines to confirm?

Thanks,
Thomas






* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-10-30  9:30             ` Thomas Fitzsimmons
@ 2018-11-12  6:39               ` Paul Eggert
  2018-11-12 17:24                 ` Andy Moreton
  2018-11-13  2:29                 ` Thomas Fitzsimmons
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Eggert @ 2018-11-12  6:39 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174-done

[-- Attachment #1: Type: text/plain, Size: 472 bytes --]

Thomas Fitzsimmons wrote:
> Actually, EMACS_HEAP_EXEC is true!

OK, thanks, that explains things. I installed the first attached patch to the 
emacs-26 branch as a quick hack to work around the problem, and the second 
attached patch to the master branch to come up with a better way to address the 
underlying issue that doesn't involve fiddling with environment variables. 
Please let us know if this doesn't work for your platform, as I've tested this 
only on x86-64.

[-- Attachment #2: 0001-Work-around-dumping-bug-on-GNU-Linux-ppc64le.patch --]
[-- Type: text/x-patch, Size: 1128 bytes --]

From 1d79c2ebd9bd9aa36586e57463502373c0296d11 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 11 Nov 2018 22:34:46 -0800
Subject: [PATCH] Work around dumping bug on GNU/Linux ppc64le

Problem reported by Thomas Fitzsimmons (Bug#33174).
Do not merge to master, as we have a better fix there.
* src/Makefile.in (emacs$(EXEEXT)):
(bootstrap-emacs$(EXEEXT)):
Unset EMACS_HEAP_EXEC before invoking temacs.
---
 src/Makefile.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/Makefile.in b/src/Makefile.in
index 6ed8f3cc91..53c18e7ac0 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -534,6 +534,7 @@ emacs$(EXEEXT):
 ifeq ($(CANNOT_DUMP),yes)
 	ln -f temacs$(EXEEXT) $@
 else
+	unset EMACS_HEAP_EXEC; \
 	LC_ALL=C $(RUN_TEMACS) -batch $(BUILD_DETAILS) -l loadup dump
   ifneq ($(PAXCTL_dumped),)
 	$(PAXCTL_dumped) $@
@@ -739,6 +740,7 @@ bootstrap-emacs$(EXEEXT):
 ifeq ($(CANNOT_DUMP),yes)
 	ln -f temacs$(EXEEXT) $@
 else
+	unset EMACS_HEAP_EXEC; \
 	$(RUN_TEMACS) --batch $(BUILD_DETAILS) --load loadup bootstrap
   ifneq ($(PAXCTL_dumped),)
 	$(PAXCTL_dumped) emacs$(EXEEXT)
-- 
2.17.1


[-- Attachment #3: 0001-Fix-dumping-on-GNU-Linux-ppc64le.patch --]
[-- Type: text/x-patch, Size: 5574 bytes --]

From c14eab222c5208ec0650292c3771a3ee632fdb0d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 11 Nov 2018 22:18:47 -0800
Subject: [PATCH] Fix dumping on GNU/Linux ppc64le

Problem reported by Thomas Fitzsimmons (Bug#33174).
* src/emacs.c (main): Adjust to sysdep.c changes.
* src/sysdep.c (exec_personality): New static var.
(disable_address_randomization): Remove, replacing with ...
(maybe_disable_address_randomization): ... this new function.
Do not set or use an environment variable; use a command-line
argument instead, and set the new static var.  Migrate the emacs.c
personality-change code to here, where it belongs.
(emacs_exec_file): Simplify by using new static var.
---
 src/emacs.c  | 23 +----------------
 src/lisp.h   |  7 +++--
 src/sysdep.c | 73 +++++++++++++++++++++++++++++++++-------------------
 3 files changed, 53 insertions(+), 50 deletions(-)

diff --git a/src/emacs.c b/src/emacs.c
index 07df191035..512174d562 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -706,28 +706,7 @@ main (int argc, char **argv)
   dumping = false;
 #endif
 
-  /* True if address randomization interferes with memory allocation.  */
-# ifdef __PPC64__
-  bool disable_aslr = true;
-# else
-  bool disable_aslr = dumping;
-# endif
-
-  if (disable_aslr && disable_address_randomization ()
-      && !getenv ("EMACS_HEAP_EXEC"))
-    {
-      /* Set this so the personality will be reverted before execs
-	 after this one, and to work around an re-exec loop on buggy
-	 kernels (Bug#32083).  */
-      xputenv ("EMACS_HEAP_EXEC=true");
-
-      /* Address randomization was enabled, but is now disabled.
-	 Re-execute Emacs to get a clean slate.  */
-      execvp (argv[0], argv);
-
-      /* If the exec fails, warn and then try anyway.  */
-      perror (argv[0]);
-    }
+  argc = maybe_disable_address_randomization (dumping, argc, argv);
 
 #ifndef CANNOT_DUMP
   might_dump = !initialized;
diff --git a/src/lisp.h b/src/lisp.h
index eb6762678c..383d61274c 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4325,9 +4325,12 @@ struct tty_display_info;
 
 /* Defined in sysdep.c.  */
 #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
-extern bool disable_address_randomization (void);
+extern int maybe_disable_address_randomization (bool, int, char **);
 #else
-INLINE bool disable_address_randomization (void) { return false; }
+INLINE void
+maybe_disable_address_randomization (bool dumping, int argc, char **argv)
+{
+}
 #endif
 extern int emacs_exec_file (char const *, char *const *, char *const *);
 extern void init_standard_fds (void);
diff --git a/src/sysdep.c b/src/sysdep.c
index 7a0c8a8ab8..ddcb594f66 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -150,22 +150,52 @@ static const int baud_convert[] =
 #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
 # include <sys/personality.h>
 
-/* Disable address randomization in the current process.  Return true
-   if addresses were randomized but this has been disabled, false
-   otherwise. */
-bool
-disable_address_randomization (void)
+/* If not -1, the personality that should be restored before exec.  */
+static int exec_personality;
+
+/* Try to disable randomization if the current process needs it and
+   does not appear to have it already.  */
+int
+maybe_disable_address_randomization (bool dumping, int argc, char **argv)
 {
-  int pers = personality (0xffffffff);
-  if (pers < 0)
-    return false;
-  int desired_pers = pers | ADDR_NO_RANDOMIZE;
+  /* Undocumented Emacs option used only by this function.  */
+  static char const aslr_disabled_option[] = "--__aslr-disabled";
 
-  /* Call 'personality' twice, to detect buggy platforms like WSL
-     where 'personality' always returns 0.  */
-  return (pers != desired_pers
-	  && personality (desired_pers) == pers
-	  && personality (0xffffffff) == desired_pers);
+  if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0)
+    {
+      bool disable_aslr = dumping;
+# ifdef __PPC64__
+      disable_aslr = true;
+# endif
+      exec_personality = disable_aslr ? personality (0xffffffff) : -1;
+      if (exec_personality & ADDR_NO_RANDOMIZE)
+	exec_personality = -1;
+      if (exec_personality != -1
+	  && personality (exec_personality | ADDR_NO_RANDOMIZE) != -1)
+	{
+	  char **newargv = malloc ((argc + 2) * sizeof *newargv);
+	  if (newargv)
+	    {
+	      /* Invoke self with undocumented option.  */
+	      newargv[0] = argv[0];
+	      newargv[1] = (char *) aslr_disabled_option;
+	      memcpy (&newargv[2], &argv[1], argc * sizeof *newargv);
+	      execvp (newargv[0], newargv);
+	    }
+
+	  /* If malloc or execvp fails, warn and then try anyway.  */
+	  perror (argv[0]);
+	  free (newargv);
+	}
+    }
+  else
+    {
+      /* Our earlier incarnation already disabled ASLR.  */
+      argc--;
+      memmove (&argv[1], &argv[2], argc * sizeof *argv);
+    }
+
+  return argc;
 }
 #endif
 
@@ -177,21 +207,12 @@ int
 emacs_exec_file (char const *file, char *const *argv, char *const *envp)
 {
 #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
-  int pers = getenv ("EMACS_HEAP_EXEC") ? personality (0xffffffff) : -1;
-  bool change_personality = 0 <= pers && pers & ADDR_NO_RANDOMIZE;
-  if (change_personality)
-    personality (pers & ~ADDR_NO_RANDOMIZE);
+  if (exec_personality != -1)
+    personality (exec_personality);
 #endif
 
   execve (file, argv, envp);
-  int err = errno;
-
-#ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
-  if (change_personality)
-    personality (pers);
-#endif
-
-  return err;
+  return errno;
 }
 
 /* If FD is not already open, arrange for it to be open with FLAGS.  */
-- 
2.17.1
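
The master-branch patch passes the "already disabled" state through a leading command-line option instead of the environment. The sketch below covers only the argv bookkeeping, with the personality change and execvp left out so it can be exercised in isolation. argv_with_marker and strip_marker are hypothetical names. The element counts rely on argv[argc] being a null pointer, which the C standard guarantees for the argv passed to main. As a result, copying argc elements in either direction also carries the terminator along.

```c
#include <stdlib.h>
#include <string.h>

/* Same spelling as the option in the patch.  */
static char const marker[] = "--__aslr-disabled";

/* Return a fresh copy of ARGV with MARKER inserted at index 1, or NULL
   if allocation fails.  argv[1] .. argv[argc] is ARGC entries, the last
   being the terminating null pointer, so the result holds argc + 2.  */
static char **
argv_with_marker (int argc, char **argv)
{
  char **newargv = malloc ((argc + 2) * sizeof *newargv);
  if (newargv)
    {
      newargv[0] = argv[0];
      newargv[1] = (char *) marker;
      memcpy (&newargv[2], &argv[1], argc * sizeof *newargv);
    }
  return newargv;
}

/* If argv[1] is MARKER, remove it in place and return the new argc;
   otherwise return ARGC unchanged.  After the decrement, ARGC entries
   starting at argv[2] are argv[2] .. argv[old argc], ending with the
   terminating null pointer.  */
static int
strip_marker (int argc, char **argv)
{
  if (argc >= 2 && strcmp (argv[1], marker) == 0)
    {
      argc--;
      memmove (&argv[1], &argv[2], argc * sizeof *argv);
    }
  return argc;
}
```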



* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-11-12  6:39               ` Paul Eggert
@ 2018-11-12 17:24                 ` Andy Moreton
  2018-11-13  2:29                 ` Thomas Fitzsimmons
  1 sibling, 0 replies; 18+ messages in thread
From: Andy Moreton @ 2018-11-12 17:24 UTC (permalink / raw)
  To: 33174

On Sun 11 Nov 2018, Paul Eggert wrote:

> Thomas Fitzsimmons wrote:
>> Actually, EMACS_HEAP_EXEC is true!
>
> OK, thanks, that explains things. I installed the first attached patch to the
> emacs-26 branch as a quick hack to work around the problem, and the second
> attached patch to the master branch to come up with a better way to address
> the underlying issue that doesn't involve fiddling with environment variables.
> Please let us know if this doesn't work for your platform, as I've tested this
> only on x86-64.

The second patch (applied to master) breaks the build on Windows.

> diff --git a/src/lisp.h b/src/lisp.h
> index eb6762678c..383d61274c 100644
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -4325,9 +4325,12 @@ struct tty_display_info;
>  
>  /* Defined in sysdep.c.  */
>  #ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
> -extern bool disable_address_randomization (void);
> +extern int maybe_disable_address_randomization (bool, int, char **);
>  #else
> -INLINE bool disable_address_randomization (void) { return false; }
> +INLINE void
> +maybe_disable_address_randomization (bool dumping, int argc, char **argv)
> +{
> +}

The return type must be consistent as it is used by its caller:

C:/emacs/git/emacs/master/src/emacs.c: In function 'main':
C:/emacs/git/emacs/master/src/emacs.c:709:8: error: void value not ignored as it ought to be
   argc = maybe_disable_address_randomization (dumping, argc, argv);
        ^

The obvious fix seems to work:

#ifdef HAVE_PERSONALITY_ADDR_NO_RANDOMIZE
extern int maybe_disable_address_randomization (bool, int, char **);
#else
INLINE int
maybe_disable_address_randomization (bool dumping, int argc, char **argv)
{
  return argc;
}
#endif


> +int
> +maybe_disable_address_randomization (bool dumping, int argc, char **argv)
>  {
[snipped]
> +  if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0)
> +    {
[snipped]
> +    }
> +  else
> +    {
> +      /* Our earlier incarnation already disabled ASLR.  */
> +      argc--;
> +      memmove (&argv[1], &argv[2], argc * sizeof *argv);
> +    }

This looks wrong: the memmove size is one element too long, and if argc
was 2 then there is an out of bounds access.

    AndyM







* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-11-12  6:39               ` Paul Eggert
  2018-11-12 17:24                 ` Andy Moreton
@ 2018-11-13  2:29                 ` Thomas Fitzsimmons
  2018-11-13 14:42                   ` Paul Eggert
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-11-13  2:29 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174-done

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thomas Fitzsimmons wrote:
>> Actually, EMACS_HEAP_EXEC is true!
>
> OK, thanks, that explains things. I installed the first attached patch
> to the emacs-26 branch as a quick hack to work around the problem,

I confirmed this works around the problem on the emacs-26 branch with
EMACS_HEAP_EXEC set in the initial build environment, and confirmed that
the build still works without it set.

> and the second attached patch to the master branch to come up with a
> better way to address the underlying issue that doesn't involve
> fiddling with environment variables. Please let us know if this
> doesn't work for your platform, as I've tested this only on x86-64.

Likewise this fixes the failure on master on my system with
EMACS_HEAP_EXEC set, and the build still works without it set.

It looks like the second patch still disables ASLR unconditionally on
PPC64.  If you want to suggest experiments I can try on my system, maybe
I can attempt a fix so that ASLR can be disabled only during dumping on
PPC64 like on other architectures.  Or do you know that's an
impossibility for some reason?

Thanks,
Thomas






* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-11-13  2:29                 ` Thomas Fitzsimmons
@ 2018-11-13 14:42                   ` Paul Eggert
  2018-11-19  3:14                     ` Thomas Fitzsimmons
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-11-13 14:42 UTC (permalink / raw)
  To: Thomas Fitzsimmons; +Cc: 33174-done

[-- Attachment #1: Type: text/plain, Size: 595 bytes --]

Thomas Fitzsimmons wrote:
> It looks like the second patch still disables ASLR unconditionally on
> PPC64.  If you want to suggest experiments I can try on my system, maybe
> I can attempt a fix so that ASLR can be disabled only during dumping on
> PPC64 like on other architectures.  Or do you know that's an
> impossibility for some reason?

I don't recall the reason that ppc64 is different, but you can experiment by 
applying the attached patch. To help with the detective work you might look at 
the thread containing this email:

https://sourceware.org/ml/libc-alpha/2016-07/msg00538.html

[-- Attachment #2: ppc.diff --]
[-- Type: text/x-patch, Size: 518 bytes --]

diff --git a/src/sysdep.c b/src/sysdep.c
index ddcb594f66..71aa63a1c0 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -164,9 +164,6 @@ maybe_disable_address_randomization (bool dumping, int argc, char **argv)
   if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0)
     {
       bool disable_aslr = dumping;
-# ifdef __PPC64__
-      disable_aslr = true;
-# endif
       exec_personality = disable_aslr ? personality (0xffffffff) : -1;
       if (exec_personality & ADDR_NO_RANDOMIZE)
 	exec_personality = -1;


* bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
  2018-11-13 14:42                   ` Paul Eggert
@ 2018-11-19  3:14                     ` Thomas Fitzsimmons
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Fitzsimmons @ 2018-11-19  3:14 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 33174-done

[-- Attachment #1: Type: text/plain, Size: 1916 bytes --]

Paul Eggert <eggert@cs.ucla.edu> writes:

> Thomas Fitzsimmons wrote:
>> It looks like the second patch still disables ASLR unconditionally on
>> PPC64.  If you want to suggest experiments I can try on my system, maybe
>> I can attempt a fix so that ASLR can be disabled only during dumping on
>> PPC64 like on other architectures.  Or do you know that's an
>> impossibility for some reason?
>
> I don't recall the reason that ppc64 is different, but you can
> experiment by applying the attached patch. To help with the detective
> work you might look at the thread containing this email:
>
> https://sourceware.org/ml/libc-alpha/2016-07/msg00538.html

I read through that thread, then tried the patch.  I was surprised that
everything seems to work:

   - Bootstrap works within a reasonable amount of RAM (limited with
     ulimit -d to make sure it didn't spike up to 12 GB).

   - The Emacs process (run as ./src/emacs from the build directory):

      - does not use 12 GB of RAM (i.e., RAM usage is reasonable after
        opening a C file, src/xdisp.c).

      - has a /proc/<pid>/personality value of 00000000.

      - runs subprocesses with a /proc/self/personality value of
        00000000.
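
The personality values above come from /proc/<pid>/personality, which the Linux kernel prints as one hexadecimal persona per line, such as the 00000000 shown. Below is a minimal sketch for making the same check programmatically from inside a process or subprocess, assuming that format. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/personality.h>

/* Parse the contents of /proc/<pid>/personality: one hexadecimal
   persona such as "00000000\n".  Return -1 if TEXT is malformed.  */
static long
parse_personality (const char *text)
{
  char *end;
  unsigned long value = strtoul (text, &end, 16);
  if (end == text || (*end != '\0' && *end != '\n'))
    return -1;
  return (long) value;
}

/* True if PERS is a valid persona with ADDR_NO_RANDOMIZE set.  */
static bool
aslr_disabled (long pers)
{
  return pers >= 0 && (pers & ADDR_NO_RANDOMIZE) != 0;
}

/* Persona of the calling process, or -1 if procfs is unavailable.
   A subprocess normally inherits its parent's persona across fork and
   exec, so calling this in a child shows what the child received.  */
static long
read_self_personality (void)
{
  char buf[32];
  FILE *f = fopen ("/proc/self/personality", "r");
  if (!f)
    return -1;
  char *line = fgets (buf, sizeof buf, f);
  fclose (f);
  return line ? parse_personality (buf) : -1;
}
```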

My system has glibc 2.24.  src/config.h has "#define HYBRID_MALLOC 1".
Hybrid malloc was proposed as a proper fix during that discussion, but
wasn't enabled by default at the time.  Given that I configured with
just "./configure --with-x-toolkit=lucid", it seems that hybrid malloc
is enabled by default now.  I'm not sure if this would now work on older
glibc versions as well, but in case it doesn't, what about something
like the attached for master?
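
The __GLIBC__ and __GLIBC_MINOR__ test in the attached patch is evaluated against the headers Emacs is compiled with. glibc also reports the version loaded at run time through gnu_get_libc_version () in <gnu/libc-version.h>. Below is a sketch of the same "at least 2.24" comparison, usable with either source of version numbers. The helper names are hypothetical, and the sketch illustrates the comparison only. It is not the fix proposed in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if version GOT_MAJOR.GOT_MINOR is at least NEED_MAJOR.NEED_MINOR.
   The patch's preprocessor condition is the negation of this for 2.24.  */
static bool
version_at_least (int got_major, int got_minor, int need_major, int need_minor)
{
  return (got_major > need_major
          || (got_major == need_major && got_minor >= need_minor));
}

/* Parse a "MAJOR.MINOR..." string; return false if it does not match.  */
static bool
parse_version (const char *s, int *got_major, int *got_minor)
{
  return sscanf (s, "%d.%d", got_major, got_minor) == 2;
}

#ifdef __GLIBC__
# include <gnu/libc-version.h>

/* Run-time counterpart of the compile-time check in the patch.  */
static bool
runtime_glibc_at_least (int need_major, int need_minor)
{
  int got_major, got_minor;
  return (parse_version (gnu_get_libc_version (), &got_major, &got_minor)
          && version_at_least (got_major, got_minor, need_major, need_minor));
}
#endif
```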

Are there other tests I should try, to reveal problems?  If the patch
looks OK, I should probably also test it on Fedora as part of a local
Rawhide Emacs package build, since lots of the discussion was about
getting that working reliably.

Thanks,
Thomas


[-- Attachment #2: 0001-Enable-ASLR-on-PPC64-glibc-versions-2.24.patch --]
[-- Type: text/x-diff, Size: 977 bytes --]

From d69509b909bc35b0908e40a9be741e9ef38b3402 Mon Sep 17 00:00:00 2001
From: Thomas Fitzsimmons <fitzsim@fitzsim.org>
Date: Sun, 18 Nov 2018 11:07:49 -0500
Subject: [PATCH] Enable ASLR on PPC64 when glibc version is >= 2.24

* src/sysdep.c (maybe_disable_address_randomization): Disable
randomization on PPC64 only when glibc version is less than 2.24.
---
 src/sysdep.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/sysdep.c b/src/sysdep.c
index ddcb594f66..1a863e65ab 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -164,7 +164,8 @@ maybe_disable_address_randomization (bool dumping, int argc, char **argv)
   if (argc < 2 || strcmp (argv[1], aslr_disabled_option) != 0)
     {
       bool disable_aslr = dumping;
-# ifdef __PPC64__
+# if defined(__PPC64__) \
+  && (__GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 24))
       disable_aslr = true;
 # endif
       exec_personality = disable_aslr ? personality (0xffffffff) : -1;
-- 
2.11.0



end of thread

Thread overview: 18+ messages
2018-10-27 10:35 bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le Thomas Fitzsimmons
2018-10-27 13:16 ` Thomas Fitzsimmons
2018-10-27 20:26   ` Paul Eggert
2018-10-28  8:05     ` Paul Eggert
2018-10-28 12:56       ` Thomas Fitzsimmons
2018-10-28 19:22         ` Paul Eggert
2018-10-30  1:28           ` Thomas Fitzsimmons
2018-10-30  2:16             ` Paul Eggert
2018-10-29  1:47     ` Thomas Fitzsimmons
2018-10-29  7:22       ` Paul Eggert
2018-10-30  2:44         ` Thomas Fitzsimmons
2018-10-30  5:58           ` Paul Eggert
2018-10-30  9:30             ` Thomas Fitzsimmons
2018-11-12  6:39               ` Paul Eggert
2018-11-12 17:24                 ` Andy Moreton
2018-11-13  2:29                 ` Thomas Fitzsimmons
2018-11-13 14:42                   ` Paul Eggert
2018-11-19  3:14                     ` Thomas Fitzsimmons

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
