unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
@ 2022-06-20 14:07 Gerd Möllmann
  2022-06-20 19:09 ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-20 14:07 UTC (permalink / raw)
  To: 56108

FWIW, here is another non-reproducible crash with ASAN.

In short, shrink_regexp_cache realloc'd something leading to a malloc +
free, and something is still holding a pointer the old memory.  Or so it
looks to me.

=22069==ERROR: AddressSanitizer: heap-use-after-free on address 0x000105b493a5 at pc 0x00010057549c bp 0x00016fde0b90 sp 0x00016fde0b88
READ of size 1 at 0x000105b493a5 thread T0
    #0 0x100575498 in re_match_2_internal regex-emacs.c:5021
    #1 0x100568c38 in rpl_re_search_2 regex-emacs.c:3382
    #2 0x1005678c4 in rpl_re_search regex-emacs.c:3176
    #3 0x10054cc68 in fast_string_match_internal search.c:489
    #4 0x1004f20b0 in fast_string_match lisp.h:4747
    #5 0x1004f1b28 in Ffind_file_name_handler fileio.c:324
    #6 0x1004f82d4 in Fexpand_file_name fileio.c:1018
    #7 0x1006ddc50 in openp lread.c:1849
    #8 0x1006dae98 in Fload lread.c:1312
    #9 0x1006e3c64 in save_match_data_load lread.c:1641
    #10 0x1006408d0 in load_with_autoload_queue eval.c:2245
    #11 0x100677534 in Frequire fns.c:3146

0x000105b493a5 is located 293 bytes inside of 558-byte region [0x000105b49280,0x000105b494ae)
freed by thread T0 here:
    #0 0x1031c7ddc in wrap_realloc+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3fddc)
    #1 0x100598388 in lrealloc alloc.c:1376
    #2 0x1005982c4 in xrealloc alloc.c:790
    #3 0x10054a490 in shrink_regexp_cache search.c:150
    #4 0x1005aaeb0 in garbage_collect alloc.c:6172
    #5 0x1005aa6cc in maybe_garbage_collect alloc.c:6088
    #6 0x1006416c0 in maybe_gc lisp.h:5548
    #7 0x10063a99c in Ffuncall eval.c:2948
    #8 0x10064a144 in funcall_nil eval.c:2635
    #9 0x10064a0b4 in run_hook_with_args eval.c:2812
    #10 0x100649b84 in Frun_hook_with_args eval.c:2677
    #11 0x100649ad0 in run_hook eval.c:2825
    #12 0x1004da650 in signal_before_change insdel.c:2155
    #13 0x1004d9c40 in prepare_to_modify_buffer_1 insdel.c:2009
    #14 0x1004c810c in prepare_to_modify_buffer insdel.c:2020
    #15 0x1005081ec in Finsert_file_contents fileio.c:4601
    #16 0x10064b758 in funcall_subr eval.c:2999
    #17 0x10072fa40 in exec_byte_code bytecode.c:809
    #18 0x10065361c in fetch_and_exec_byte_code eval.c:3040
    #19 0x10064c344 in funcall_lambda eval.c:3112
    #20 0x10064ac18 in funcall_general eval.c:2903
    #21 0x10063aa70 in Ffuncall eval.c:2953
    #22 0x100643c0c in Fapply eval.c:2577
    #23 0x10064bde0 in funcall_subr eval.c:3018
    #24 0x10072fa40 in exec_byte_code bytecode.c:809
    #25 0x10065361c in fetch_and_exec_byte_code eval.c:3040
    #26 0x10064c344 in funcall_lambda eval.c:3112
    #27 0x1006437c0 in apply_lambda eval.c:3062
    #28 0x100633734 in eval_sub eval.c:2503
    #29 0x100640ef8 in Feval eval.c:2314

previously allocated by thread T0 here:
    #0 0x1031c7ddc in wrap_realloc+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3fddc)
    #1 0x100598388 in lrealloc alloc.c:1376
    #2 0x1005982c4 in xrealloc alloc.c:790
    #3 0x10054a490 in shrink_regexp_cache search.c:150






^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-20 14:07 bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal Gerd Möllmann
@ 2022-06-20 19:09 ` Eli Zaretskii
  2022-06-22  8:13   ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-20 19:09 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Mon, 20 Jun 2022 16:07:55 +0200
> 
> FWIW, here is another non-reproducible crash with ASAN.
> 
> In short, shrink_regexp_cache realloc'd something leading to a malloc +
> free, and something is still holding a pointer the old memory.  Or so it
> looks to me.

I don't understand why some callers of compile_pattern mark the cache
entry as busy, but some others don't.  If a cache entry that is in use
is not marked as busy, then any GC can decide to shrink the cache by
freeing that entry.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-20 19:09 ` Eli Zaretskii
@ 2022-06-22  8:13   ` Gerd Möllmann
  2022-06-22 13:38     ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-22  8:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On 20. Jun 2022, 21:10 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> > I don't understand why some callers of compile_pattern mark the cache
> > entry as busy, but some others don't. If a cache entry that is in use
> > is not marked as busy, then any GC can decide to shrink the cache by
> > freeing that entry.
	struct re_pattern_buffer *bufp;
	...
	bufp = &compile_pattern (regexp,
	...

The address operator is there to confuse the Russians.

[-- Attachment #2: Type: text/html, Size: 1151 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22  8:13   ` Gerd Möllmann
@ 2022-06-22 13:38     ` Eli Zaretskii
  2022-06-22 14:10       ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-22 13:38 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> Date: Wed, 22 Jun 2022 10:13:08 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> On 20. Jun 2022, 21:10 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> 
>  I don't understand why some callers of compile_pattern mark the cache
>  entry as busy, but some others don't. If a cache entry that is in use
>  is not marked as busy, then any GC can decide to shrink the cache by
>  freeing that entry.
> 
> struct re_pattern_buffer *bufp;
> ...
> bufp = &compile_pattern (regexp,
> ...
> 
> The address operator is there to confuse the Russians.

Hmm... did you mean by that to explain why some callers of
compile_pattern don't mark the new cache entry as "busy"?  Because if
so, I guess I'm one of the "confused Russians", as I don't understand
the explanation.  Please elaborate.

Or maybe _I_ should elaborate.  By "marking an entry busy" I meant the
call to freeze_pattern, not a call to freeze_buffer_relocation (the
latter is mostly a no-op nowadays, as almost all the supported
platforms don't use ralloc.c).  So it isn't the C pointer we keep
around to compile_pattern's result that bothered me, it's the fact
that the pattern cache entry created by compile_pattern is not
protected from being freed by shrink_regexp_cache that is called by
GC.  AFAIU, that entry must be protected for the whole time the
compiled pattern is in use by re_match_2 or any of its callers.

Does the above make sense?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22 13:38     ` Eli Zaretskii
@ 2022-06-22 14:10       ` Gerd Möllmann
  2022-06-22 14:24         ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-22 14:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]

On 22. Jun 2022, 15:39 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> > Date: Wed, 22 Jun 2022 10:13:08 +0200
> > From: Gerd Möllmann <gerd.moellmann@gmail.com>
> > Cc: 56108@debbugs.gnu.org
> >
> > On 20. Jun 2022, 21:10 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> >
> > I don't understand why some callers of compile_pattern mark the cache
> > entry as busy, but some others don't. If a cache entry that is in use
> > is not marked as busy, then any GC can decide to shrink the cache by
> > freeing that entry.
> >
> > struct re_pattern_buffer *bufp;
> > ...
> > bufp = &compile_pattern (regexp,
> > ...
> >
> > The address operator is there to confuse the Russians.
> >
> > Hmm... did you mean by that to explain why some callers of
> > compile_pattern don't mark the new cache entry as "busy"? Because if
> > so, I guess I'm one of the "confused Russians", as I don't understand
> > the explanation. Please elaborate.
> >
Sorry, looking at this again, I'm now also completely confused.

I see, all in search.c:

	static struct regexp_cache *
	compile_pattern (Lisp_Object pattern, struct re_registers *regp,

and then, later

	struct re_pattern_buffer *bufp;
	bufp = &compile_pattern

How the heck does this compile?
> >
> > Or maybe _I_ should elaborate. By "marking an entry busy" I meant the
> > call to freeze_pattern,
Yes, I've seen that.
> > not a call to freeze_buffer_relocation (the
> > latter is mostly a no-op nowadays, as almost all the supported
> > platforms don't use ralloc.c). So it isn't the C pointer we keep
> > around to compile_pattern's result that bothered me, it's the fact
> > that the pattern cache entry created by compile_pattern is not
> > protected from being freed by shrink_regexp_cache that is called by
> > GC. AFAIU, that entry must be protected for the whole time the
> > compiled pattern is in use by re_match_2 or any of its callers.
> >
> > Does the above make sense?

Yes, it's the same I see.

Functions fast_string_match_internal* don't freeze in the sense you explained.  What I don't see so far is what could lead to a GC in these cases, between the compile_pattern and the use of its result...

Did you find other places where there's no freeze?
Can Emacs GC while handling a signal?
Does Emacs use threads nowadays?

[-- Attachment #2: Type: text/html, Size: 3913 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22 14:10       ` Gerd Möllmann
@ 2022-06-22 14:24         ` Eli Zaretskii
  2022-06-22 15:11           ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-22 14:24 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> Date: Wed, 22 Jun 2022 16:10:23 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> Functions fast_string_match_internal* don't freeze in the sense you explained.  What I don't see so far is
> what could lead to a GC in these cases, between the compile_pattern and the use of its result...

I don't know if something inside re_match_2_internal can call
something that would trigger GC.  There's too much stuff going on
there, what with syntax tables and whatnot.

> Did you find other places where there's no freeze?

string_match_1, I think.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22 14:24         ` Eli Zaretskii
@ 2022-06-22 15:11           ` Gerd Möllmann
  2022-06-22 16:19             ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-22 15:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]



Maybe I have something.  Could you please check?

Please read the following list from the bottom up, i.e. re_match... calls maybe_quit etc.

maybe_gc
Ffuncall
call2
signal_or_quit (eval.c:1741)
quit (eval.c:1697)
process_quit_flag (eval.c:1657)
probably_quit (eval.c:1864)
maybe_quit (lisp.h:3681)
re_match_2_internal (regexp-emacs.c:4691)

If this is true a GC can be triggered under very specific circumstances involving edebug, if the comment in signal_or_quit is right.

And I might have used edebug, I'm not 100% sure anymore.

[-- Attachment #2: Type: text/html, Size: 858 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22 15:11           ` Gerd Möllmann
@ 2022-06-22 16:19             ` Eli Zaretskii
  2022-06-23  5:53               ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-22 16:19 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> Date: Wed, 22 Jun 2022 17:11:55 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> Maybe I have something.  Could you please check?
> 
> Please read the following list from the bottom up, i.e. re_match... calls maybe_quit etc.
> 
> maybe_gc
> Ffuncall
> call2 
> signal_or_quit (eval.c:1741)
> quit (eval.c:1697)
> process_quit_flag (eval.c:1657)
> probably_quit (eval.c:1864)
> maybe_quit (lisp.h:3681)
> re_match_2_internal (regexp-emacs.c:4691)
> 
> If this is true a GC can be triggered under very specific circumstances involving edebug, if the comment in
> signal_or_quit is right.  
> 
> And I might have used edebug, I'm not 100% sure anymore.

Sounds plausible.  signal-hook-function should be non-nil to trigger
the call2 call inside signal_or_quit.  In addition to Edebug, Tramp
also sets that.

So yes, it could happen, with some "luck".

I think the next step is to add the missing freeze_pattern calls and
see if that fixes the problem?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-22 16:19             ` Eli Zaretskii
@ 2022-06-23  5:53               ` Gerd Möllmann
  2022-06-23  6:57                 ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-23  5:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On 22. Jun 2022, 18:20 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> >
> > I think the next step is to add the missing freeze_pattern calls and
> > see if that fixes the problem?
I think the missing freezes are 100% a bug, and they should be fixed.

Do you want to do that or should I?

(BTW, I just now noticed the "->buf" at the end of the "bufp = &compile_pattern (regexp,...)" that I complained about.   That explains it.  Nice :-/.)

Another side question, if I may: Have you perhaps heard of someone producing a static call graph for Emacs, or better yet, specific functions in Emacs?  Maybe using objdump -D or something similar?

[-- Attachment #2: Type: text/html, Size: 1245 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  5:53               ` Gerd Möllmann
@ 2022-06-23  6:57                 ` Eli Zaretskii
  2022-06-23  7:17                   ` Eli Zaretskii
  2022-06-23  8:24                   ` Gerd Möllmann
  0 siblings, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-23  6:57 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> Date: Thu, 23 Jun 2022 07:53:29 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> On 22. Jun 2022, 18:20 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> 
>  I think the next step is to add the missing freeze_pattern calls and
>  see if that fixes the problem?
> 
> I think the missing freezes are 100% a bug, and they should be fixed.

I agree.

> Do you want to do that or should I? 

Feel free to do it, I generally prefer that people who see the problem
and could at least potentially test the solution also make the change
to fix it.

> Another side question, if I may: Have you perhaps heard of someone producing a static call graph for
> Emacs, or better yet, specific functions in Emacs?  Maybe using objdump -D or something similar? 

Does this make sense in a dynamic program such as Emacs?  We call into
Lisp quite a lot from C, and from there you can arrive anywhere, no?
And objdump cannot capture Lisp levels.

That is, btw, the main problem with maintaining Emacs internals
nowadays: it is hard, almost impossible, to know, just by looking at C
code, whether GC or any other Lisp-related activity could happen
between two arbitrary lines of C.  We have more and more hooks called
from C that could potentially call any Lisp, and we have more and more
direct calls into Lisp from the most intimate parts of Emacs, like the
display engine and the main loop in keyboard.c.  This basically makes
any analysis of whether or not some code fragment could cause GC
futile: even if today it's impossible, it can easily become possible
tomorrow, with some innocent-looking change.  This is exacerbated by
the fact that GCPROs are long gone, so the caution we used to
exercised 20 years ago to make sure GC doesn't surprise us is no
longer needed nor practiced.

But no, I don't think anyone tried to see what kind of graph could be
obtained.  Maybe it's worthwhile, who knows? we might learn something
useful regardless.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  6:57                 ` Eli Zaretskii
@ 2022-06-23  7:17                   ` Eli Zaretskii
  2022-06-23 21:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-06-23  8:24                   ` Gerd Möllmann
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-23  7:17 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gerd.moellmann, 56108

> Cc: 56108@debbugs.gnu.org
> Date: Thu, 23 Jun 2022 09:57:53 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > Do you want to do that or should I? 
> 
> Feel free to do it, I generally prefer that people who see the problem
> and could at least potentially test the solution also make the change
> to fix it.

Actually, let's first bring Stefan on board of this discussion.

Stefan, do you happen to know why some of the callers of
compile_pattern don't call freeze_pattern to protect the new cache
entry?  Is it just an omission or do we miss something here?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  6:57                 ` Eli Zaretskii
  2022-06-23  7:17                   ` Eli Zaretskii
@ 2022-06-23  8:24                   ` Gerd Möllmann
  2022-06-23  8:37                     ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-23  8:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

On 23. Jun 2022, 08:58 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> > Do you want to do that or should I?
> >
> > Feel free to do it, I generally prefer that people who see the problem
> > and could at least potentially test the solution also make the change
> > to fix it.
> >
Ok
> >
> > Another side question, if I may: Have you perhaps heard of someone producing a static call graph for
> > Emacs, or better yet, specific functions in Emacs? Maybe using objdump -D or something similar?
> >
> > Does this make sense in a dynamic program such as Emacs? We call into
> > Lisp quite a lot from C, and from there you can arrive anywhere, no?
> > And objdump cannot capture Lisp levels.
True, but for GC at least, I think it would make it easier to tell if it can potentially happen. One would see a call to GC in the static call graph. Not for arbitrary lines, of course, you know what I mean...
> >
> > That is, btw, the main problem with maintaining Emacs internals
> > nowadays: it is hard, almost impossible, to know, just by looking at C
> > code, whether GC or any other Lisp-related activity could happen
> > between two arbitrary lines of C. We have more and more hooks called
> > from C that could potentially call any Lisp, and we have more and more
> > direct calls into Lisp from the most intimate parts of Emacs, like the
> > display engine and the main loop in keyboard.c. This basically makes
> > any analysis of whether or not some code fragment could cause GC
> > futile: even if today it's impossible, it can easily become possible
> > tomorrow, with some innocent-looking change. This is exacerbated by
> > the fact that GCPROs are long gone, so the caution we used to
> > exercised 20 years ago to make sure GC doesn't surprise us is no
> > longer needed nor practiced.
> >
All true, I just want to remark that I have no fond memories of GCPRO, and of debugging stuff caused by missing ones.   Glad to hear they're finally completely dead now.
> >
> > But no, I don't think anyone tried to see what kind of graph could be
> > obtained. Maybe it's worthwhile, who knows? we might learn something
> > useful regardless.
Thanks

[-- Attachment #2: Type: text/html, Size: 3788 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  8:24                   ` Gerd Möllmann
@ 2022-06-23  8:37                     ` Eli Zaretskii
  2022-06-23  8:49                       ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-23  8:37 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 56108

> Date: Thu, 23 Jun 2022 10:24:31 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
>  Another side question, if I may: Have you perhaps heard of someone producing a static call graph for
>  Emacs, or better yet, specific functions in Emacs? Maybe using objdump -D or something
>  similar?
> 
>  Does this make sense in a dynamic program such as Emacs? We call into
>  Lisp quite a lot from C, and from there you can arrive anywhere, no?
>  And objdump cannot capture Lisp levels.
> 
> True, but for GC at least, I think it would make it easier to tell if it can potentially happen. One would see a
> call to GC in the static call graph. Not for arbitrary lines, of course, you know what I mean...

Fair enough.  But for that purpose, we need to consider each call into
Lisp, either directly or via a hook, as potentially triggering GC.

Moreover, if some code can signal an error or throw to a higher level,
that could cause GC via the handlers installed by the various
unwind-protect forms.  So signaling/throwing are also GC triggers, at
least in some situations, and I'm not sure how relevant that is to
what you had in mind.

(People also tend to forget that GC doesn't only deletes "garbage"
objects, it also has other potentially "surprising" effects: it can
compact strings, relocate string data and buffer text, shrink regexp
pattern cache and font caches, etc.)





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  8:37                     ` Eli Zaretskii
@ 2022-06-23  8:49                       ` Gerd Möllmann
  0 siblings, 0 replies; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-23  8:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]

On 23. Jun 2022, 10:38 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> > Fair enough. But for that purpose, we need to consider each call into
> > Lisp, either directly or via a hook, as potentially triggering GC.
> >
True.
> >
> > Moreover, if some code can signal an error or throw to a higher level,
> > that could cause GC via the handlers installed by the various
> > unwind-protect forms. So signaling/throwing are also GC triggers, at
> > least in some situations, and I'm not sure how relevant that is to
> > what you had in mind.
> >
Also true.

I don't have something specific in mind, but I might give it a spin, partly because I tend to forget which things can call Lisp (like maybe_quit), partly because it was so boring to follow the calls in this bug, and partly because I can, or could  ;-).
> >
> > (People also tend to forget that GC doesn't only deletes "garbage"
> > objects, it also has other potentially "surprising" effects: it can
> > compact strings, relocate string data and buffer text, shrink regexp
> > pattern cache and font caches, etc.)
Yeah.   ISTR some fun after I changed the Lisp string implementation for conservative GC.

[-- Attachment #2: Type: text/html, Size: 2439 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23  7:17                   ` Eli Zaretskii
@ 2022-06-23 21:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-06-24  5:55                       ` Eli Zaretskii
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-06-23 21:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 56108

> Stefan, do you happen to know why some of the callers of
> compile_pattern don't call freeze_pattern to protect the new cache
> entry?  Is it just an omission or do we miss something here?

Before `freeze_pattern`, the design was that nothing could happen while
running the regexp matcher (no GC, no execution of Lisp code).

Commit 938d252d1c6c5e2027aa250c649deb024154f936 changed that so that
searching inside a *buffer* could end up running ELisp code (and hence
also GC).  AFAIK this still can't happen when searching in strings.
[ IIRC The need to run ELisp is so as to apply `syntax-table` text
  properties on demand via `syntax-propertize`.  ]

So I think freeze_pattern should be used in all cases where
`compile_pattern` is used to search inside a buffer, but it shouldn't be
necessary when searching within a string.

At least, that's my recollection.


        Stefan






^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-23 21:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-06-24  5:55                       ` Eli Zaretskii
  2022-06-24  6:01                         ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-24  5:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: gerd.moellmann, 56108

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: gerd.moellmann@gmail.com,  56108@debbugs.gnu.org
> Date: Thu, 23 Jun 2022 17:29:13 -0400
> 
> Before `freeze_pattern`, the design was that nothing could happen while
> running the regexp matcher (no GC, no execution of Lisp code).
> 
> Commit 938d252d1c6c5e2027aa250c649deb024154f936 changed that so that
> searching inside a *buffer* could end up running ELisp code (and hence
> also GC).  AFAIK this still can't happen when searching in strings.
> [ IIRC The need to run ELisp is so as to apply `syntax-table` text
>   properties on demand via `syntax-propertize`.  ]
> 
> So I think freeze_pattern should be used in all cases where
> `compile_pattern` is used to search inside a buffer, but it shouldn't be
> necessary when searching within a string.

I think at least the scenario uncovered by Gerd, shown in this
backtrace-like form:

> maybe_gc
> Ffuncall
> call2 
> signal_or_quit (eval.c:1741)
> quit (eval.c:1697)
> process_quit_flag (eval.c:1657)
> probably_quit (eval.c:1864)
> maybe_quit (lisp.h:3681)
> re_match_2_internal (regexp-emacs.c:4691)

could happen even when searching within strings.  And in general, as I
tried to explain up-thread, relying on what cannot happen _today_ wrt
GC is not future-proof, the way Emacs development advances.

So I think we should install a change that calls freeze_pattern for
every pattern-cache entry as long as it is in use.

Gerd, would you please show the patch for that?





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-24  5:55                       ` Eli Zaretskii
@ 2022-06-24  6:01                         ` Gerd Möllmann
  2022-06-24  9:35                           ` Gerd Möllmann
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-24  6:01 UTC (permalink / raw)
  To: Stefan Monnier, Eli Zaretskii; +Cc: 56108

[-- Attachment #1: Type: text/plain, Size: 174 bytes --]

On 24. Jun 2022, 07:56 +0200, Eli Zaretskii <eliz@gnu.org>, wrote:
> > Gerd, would you please show the patch for that?

It's not ready yet.  I'll send something later.

[-- Attachment #2: Type: text/html, Size: 695 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-24  6:01                         ` Gerd Möllmann
@ 2022-06-24  9:35                           ` Gerd Möllmann
  2022-06-24 15:40                             ` Eli Zaretskii
  2022-06-27 13:26                             ` Eli Zaretskii
  0 siblings, 2 replies; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-24  9:35 UTC (permalink / raw)
  To: Stefan Monnier, Eli Zaretskii; +Cc: 56108


[-- Attachment #1.1: Type: text/plain, Size: 605 bytes --]

Please find patch attached.

Some notes about the patch:

• TRT, I think, would be to change the whole cacheing to use Lisp objects etc.  I couldn't persuade myself to do that.
• A less right thing, but better than the patch, would be to protect the cache entry in re_match_2_internal.  But that requires interface changes because re_match_2_internal currently doesn't know about cash entries.  I couldn't bring myself to do that either.

Another note:  Should some document mention that trailing whitespace are not allowed in the git repo?  I couldn't find that anywhere.






[-- Attachment #1.2: Type: text/html, Size: 1118 bytes --]

[-- Attachment #2: 0001-Prevent-reexp-cache-entry-GC-in-more-cases.patch --]
[-- Type: application/octet-stream, Size: 3836 bytes --]

From 69a31c9976316b3a0542503f8c34adb1782a954f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org>
Date: Fri, 24 Jun 2022 10:44:17 +0200
Subject: [PATCH] Prevent reexp cache entry GC in more cases

* src/search.c (string_match_1, fast_string_match_internal)
(fast_c_string_match_ignore_case): Use freeze_pattern.
---
 src/search.c | 49 ++++++++++++++++++++++++++-----------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/src/search.c b/src/search.c
index 816a757c18..9d6bd074e1 100644
--- a/src/search.c
+++ b/src/search.c
@@ -370,7 +370,6 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
 		bool posix, bool modify_data)
 {
   ptrdiff_t val;
-  struct re_pattern_buffer *bufp;
   EMACS_INT pos;
   ptrdiff_t pos_byte, i;
   bool modify_match_data = NILP (Vinhibit_changing_match_data) && modify_data;
@@ -401,17 +400,22 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
   set_char_table_extras (BVAR (current_buffer, case_canon_table), 2,
 			 BVAR (current_buffer, case_eqv_table));
 
-  bufp = &compile_pattern (regexp,
-                           (modify_match_data ? &search_regs : NULL),
-                           (!NILP (BVAR (current_buffer, case_fold_search))
-                            ? BVAR (current_buffer, case_canon_table) : Qnil),
-                           posix,
-                           STRING_MULTIBYTE (string))->buf;
+  specpdl_ref count = SPECPDL_INDEX ();
+  struct regexp_cache *cache_entry
+    = compile_pattern (regexp,
+		       modify_match_data ? &search_regs : NULL,
+		       (!NILP (BVAR (current_buffer, case_fold_search))
+			? BVAR (current_buffer, case_canon_table)
+			: Qnil),
+		       posix,
+		       STRING_MULTIBYTE (string));
+  freeze_pattern (cache_entry);
   re_match_object = string;
-  val = re_search (bufp, SSDATA (string),
+  val = re_search (&cache_entry->buf, SSDATA (string),
 		   SBYTES (string), pos_byte,
 		   SBYTES (string) - pos_byte,
 		   (modify_match_data ? &search_regs : NULL));
+  unbind_to (count, Qnil);
 
   /* Set last_thing_searched only when match data is changed.  */
   if (modify_match_data)
@@ -480,15 +484,15 @@ DEFUN ("posix-string-match", Fposix_string_match, Sposix_string_match, 2, 4, 0,
 fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
 			    Lisp_Object table)
 {
-  ptrdiff_t val;
-  struct re_pattern_buffer *bufp;
-
-  bufp = &compile_pattern (regexp, 0, table,
-                           0, STRING_MULTIBYTE (string))->buf;
   re_match_object = string;
-  val = re_search (bufp, SSDATA (string),
-		   SBYTES (string), 0,
-		   SBYTES (string), 0);
+  specpdl_ref count = SPECPDL_INDEX ();
+  struct regexp_cache *cache_entry
+    = compile_pattern (regexp, 0, table, 0, STRING_MULTIBYTE (string));
+  freeze_pattern (cache_entry);
+  ptrdiff_t val = re_search (&cache_entry->buf, SSDATA (string),
+			     SBYTES (string), 0,
+			     SBYTES (string), 0);
+  unbind_to (count, Qnil);
   return val;
 }
 
@@ -501,15 +505,14 @@ fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
 fast_c_string_match_ignore_case (Lisp_Object regexp,
 				 const char *string, ptrdiff_t len)
 {
-  ptrdiff_t val;
-  struct re_pattern_buffer *bufp;
-
   regexp = string_make_unibyte (regexp);
-  bufp = &compile_pattern (regexp, 0,
-                           Vascii_canon_table, 0,
-                           0)->buf;
+  specpdl_ref count = SPECPDL_INDEX ();
+  struct regexp_cache *cache_entry
+    = compile_pattern (regexp, 0, Vascii_canon_table, 0, 0);
+  freeze_pattern (cache_entry);
   re_match_object = Qt;
-  val = re_search (bufp, string, len, 0, len, 0);
+  ptrdiff_t val = re_search (&cache_entry->buf, string, len, 0, len, 0);
+  unbind_to (count, Qnil);
   return val;
 }
 
-- 
2.36.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-24  9:35                           ` Gerd Möllmann
@ 2022-06-24 15:40                             ` Eli Zaretskii
  2022-06-25  9:18                               ` Eli Zaretskii
  2022-06-27 13:26                             ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-24 15:40 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: monnier, 56108

> Date: Fri, 24 Jun 2022 11:35:18 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> Another note:  Should some document mention that trailing whitespace are not allowed in the git repo?  I
> couldn't find that anywhere.

I think it should be in CONTRIBUTE.  But it should describe all the
checks done by our Git hooks in .git/hooks/, not just the
trailing-whitespace check.

I will write that if no one beats me to it.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-24 15:40                             ` Eli Zaretskii
@ 2022-06-25  9:18                               ` Eli Zaretskii
  0 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-25  9:18 UTC (permalink / raw)
  To: gerd.moellmann, monnier; +Cc: 56108

> Cc: monnier@iro.umontreal.ca, 56108@debbugs.gnu.org
> Date: Fri, 24 Jun 2022 18:40:33 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > Date: Fri, 24 Jun 2022 11:35:18 +0200
> > From: Gerd Möllmann <gerd.moellmann@gmail.com>
> > Cc: 56108@debbugs.gnu.org
> > 
> > Another note:  Should some document mention that trailing whitespace are not allowed in the git repo?  I
> > couldn't find that anywhere.
> 
> I think it should be in CONTRIBUTE.  But it should describe all the
> checks done by our Git hooks in .git/hooks/, not just the
> trailing-whitespace check.
> 
> I will write that if no one beats me to it.

Now done.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-24  9:35                           ` Gerd Möllmann
  2022-06-24 15:40                             ` Eli Zaretskii
@ 2022-06-27 13:26                             ` Eli Zaretskii
  2022-06-27 13:29                               ` Gerd Möllmann
  1 sibling, 1 reply; 22+ messages in thread
From: Eli Zaretskii @ 2022-06-27 13:26 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: monnier, 56108-done

> Date: Fri, 24 Jun 2022 11:35:18 +0200
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: 56108@debbugs.gnu.org
> 
> Please find patch attached.
> 
> Some notes about the patch:
> 
> * TRT, I think, would be to change the whole cacheing to use Lisp objects etc.  I couldn't persuade myself
>  to do that.
> * A less right thing, but better than the patch, would be to protect the cache entry in re_match_2_internal. 
>  But that requires interface changes because re_match_2_internal currently doesn't know about cash
>  entries.  I couldn't bring myself to do that either.

Good points.

Since there were no more comments, I've now installed this, and I'm
marking the bug done.

P.S. Gerd, please in the future try to remember to mention the bug
number in the commit log message.  (I added it this time, but I cannot
be trusted to catch that every time ;-)





^ permalink raw reply	[flat|nested] 22+ messages in thread

* bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal
  2022-06-27 13:26                             ` Eli Zaretskii
@ 2022-06-27 13:29                               ` Gerd Möllmann
  0 siblings, 0 replies; 22+ messages in thread
From: Gerd Möllmann @ 2022-06-27 13:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, 56108-done

[-- Attachment #1: Type: text/plain, Size: 255 bytes --]


> P.S. Gerd, please in the future try to remember to mention the bug
> number in the commit log message.  (I added it this time, but I cannot
> be trusted to catch that every time ;-)

I will try to remember, but I cannot be trusted either ;-).

Thanks!

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 874 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-06-27 13:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 14:07 bug#56108: 29.0.50; ASAN use-after-free in re_match_2_internal Gerd Möllmann
2022-06-20 19:09 ` Eli Zaretskii
2022-06-22  8:13   ` Gerd Möllmann
2022-06-22 13:38     ` Eli Zaretskii
2022-06-22 14:10       ` Gerd Möllmann
2022-06-22 14:24         ` Eli Zaretskii
2022-06-22 15:11           ` Gerd Möllmann
2022-06-22 16:19             ` Eli Zaretskii
2022-06-23  5:53               ` Gerd Möllmann
2022-06-23  6:57                 ` Eli Zaretskii
2022-06-23  7:17                   ` Eli Zaretskii
2022-06-23 21:29                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-06-24  5:55                       ` Eli Zaretskii
2022-06-24  6:01                         ` Gerd Möllmann
2022-06-24  9:35                           ` Gerd Möllmann
2022-06-24 15:40                             ` Eli Zaretskii
2022-06-25  9:18                               ` Eli Zaretskii
2022-06-27 13:26                             ` Eli Zaretskii
2022-06-27 13:29                               ` Gerd Möllmann
2022-06-23  8:24                   ` Gerd Möllmann
2022-06-23  8:37                     ` Eli Zaretskii
2022-06-23  8:49                       ` Gerd Möllmann

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).