On Sun, Sep 30, 2018 at 10:14 AM Eli Zaretskii wrote:

> Then please step with a debugger through the code starting from the
> call to getrlimit, and please show the values of related variables,
> such as newlim, all the way until the call to setrlimit and the
> computed value of emacs_re_safe_alloca. Please do that once with the
> current code and then once again with the code before the offending
> commit. I'd like to see the differences, because I meanwhile see
> nothing wrong with using rlim_t here.

One change from my past reports: after compiling Emacs with -g flags, I
have now managed to reproduce the crash under lldb, including attaching
to the forked process which eats CPU after the crash. Backtrace from
that process is attached.

Here are my results from stepping through the code. Note this all runs
at Emacs startup, long before anything forks.

The highlights (as far as I noticed) are:

- emacs_re_max_failures and the older re_max_failures are not
  initialized at this point
- in the working branch, newlim is reset to rlim.rlim_max; in the
  broken branch, it is not
- in the working branch, setrlimit does not get called; in the broken
  branch, it does

I'm guessing the problem is with the uninitialized values for
*_re_max_failures and the resulting values being assigned to lim and
newlim. It seems to only work on the working branch by accident
because, for whatever reason, newlim always gets reset to rlim.rlim_max
and setrlimit doesn't get called.
-----
master branch (commit 3eedabaef37e), use of rlim_t:

- immediately after getrlimit call, lim is assigned, value: 0
- lim is then assigned rlim.rlim_cur, value: 67104768
- min_ratio is initialized, value: 160
- ratio is initialized, value: 213
- try_to_grow_stack ends up assigned, value: true

The code proceeds into the try_to_grow_stack condition:

- newlim is assigned, value: 10020000
- BUT: emacs_re_max_failures defined at that point and used to
  calculate newlim has a very large size_t value: 6500256977556508423
- looks like newlim has overflowed here to fit unsigned long long
- pagesize is assigned, value 4096
- newlim is incremented by pagesize - 1, value: 10024095
- condition checking if rlim.rlim_max < newlim; rlim.rlim_max is
  67104768 so the condition evaluates to false (emacs.c:880)
- condition checking if pagesize <= (newlim - lim) evaluates to true:
  this happens because (newlim < lim), and the unsigned subtraction
  wraps around (newlim - lim returns an unsigned long long with value
  18446744073652469760); consequently, setrlimit is called and succeeds

The try_to_grow_stack condition ends.

- emacs_re_safe_alloca is assigned, value: 4435280473597425792. I'm not
  sure if that's a reasonable value for a value of type ptrdiff_t.
-----

-----
last working revision (commit 6cdd1c333034b), use of long:

Please note that this code predates the introduction of
emacs_re_safe_alloca.

- immediately after getrlimit call, lim is assigned, value: 0
- lim is then assigned rlim.rlim_cur, value: 67104768
- ratio is then initialized: 160
- and subsequently incremented, value: 213
- try_to_grow_stack ends up assigned, value: true

The code proceeds into the try_to_grow_stack condition:

- newlim is assigned, value: 67104578
- BUT: re_max_failures defined at that point and used to calculate
  newlim has a very large size_t value: 16107485546189635934
- newlim has obviously overflowed here to fit a signed long
- pagesize is assigned, value 4096
- newlim is incremented by pagesize - 1, value: 67108673
- condition checking if rlim.rlim_max < newlim; rlim.rlim_max is
  67104768 so the condition evaluates to true and newlim is set to
  rlim.rlim_max (emacs.c:862)
- decrementing newlim by newlim % pagesize is a no-op
- condition checking if pagesize <= (newlim - lim) evaluates to false,
  skipping the setrlimit call
-----

I am attaching lldb session transcripts for both runs in case you want
to look more closely at what's going on.
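
To make the difference between the two traces above easier to see in
isolation, here is a minimal sketch of the rounding and comparison
steps. It is not the actual emacs.c code: the function names
grows_unsigned and grows_signed are made up for illustration, the logic
is simplified to the steps listed in the traces, and I am assuming that
rlim_t behaves like a 64-bit unsigned integer here (which is what the
lldb output suggests).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from emacs.c: mimics the steps with an
   unsigned type, as I assume rlim_t is on this system.  Returns true
   when the "pagesize <= newlim - lim" test would lead to setrlimit.  */
bool
grows_unsigned (uint64_t lim, uint64_t max, uint64_t newlim,
                uint64_t pagesize)
{
  newlim += pagesize - 1;       /* round up towards a page boundary */
  if (max < newlim)             /* do not exceed rlim_max */
    newlim = max;
  newlim -= newlim % pagesize;  /* round down to a page boundary */
  /* Unsigned subtraction: wraps to a huge value when newlim < lim.  */
  return pagesize <= newlim - lim;
}

/* Hypothetical helper: the same steps with a signed type, as in the
   older code that used long (assumed to be 64 bits wide here).  */
bool
grows_signed (int64_t lim, int64_t max, int64_t newlim,
              int64_t pagesize)
{
  newlim += pagesize - 1;
  if (max < newlim)
    newlim = max;
  newlim -= newlim % pagesize;
  /* Signed subtraction: negative when newlim < lim, so the test fails.  */
  return pagesize <= newlim - lim;
}
```

With the values from the master trace, grows_unsigned (67104768,
67104768, 10020000, 4096) returns true, matching the setrlimit call I
observed. With the values from the working revision, grows_signed
(67104768, 67104768, 67104578, 4096) returns false. Feeding master's
newlim of 10020000 to grows_signed also returns false, which suggests
the signedness of the subtraction is what separates the two outcomes
once newlim ends up below lim.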