* re-search-forward/backward causes a segmentation fault
@ 2003-10-08 23:28 Kenichi Handa
  2003-10-11  5:37 ` Richard Stallman
  0 siblings, 1 reply; 4+ messages in thread
From: Kenichi Handa @ 2003-10-08 23:28 UTC (permalink / raw)
  Cc: mule-ja

I got this bug report.

----------------------------------------------------------------------
With these Emacsen:
	NTEmacs 21.3(WindowsXP), NTEmacs 21.3.50(CVS Head, Windows??)
	Emacs 21.2 (Zaurus)
	Emacs 21.3 (RHL7.2, Debian)
	Emacs 21.3.50 (CVS Head, Solaris7)
evaluating the following causes a segmentation fault.

(let* ((re "[X\xd1d8]*")
       (re2 (concat re re))
       (re4 (concat re2 re2))
       (re8 (concat re4 re4))
       (re16 (concat re8 re8)))
  (re-search-backward
   (concat
    re "\\|" "\\(" re2 " \\|" re2 "\\|" re2 "\\)" re
    "\\([X" (make-string 1816 ?\xd1d8) "]\\|" re4 "\\|" re2
    "\\(" re4 "\\|" re "\\|" re4 "\\)\\|"
    re4 "\\|" re4 "\\|" re4 "\\|"
    re2 "\\(" re16 "\\|" re8 "\\|" re8 "\\)\\|"
    re2 "\\(" re "\\|" re2 "\\|" re4 "\\)\\|"
    re2 "\\(" re "\\|" re "\\|" re8 "\\|" re4 "\\(" re2 "\\)\\)\\|"
    re16 re16 "\\|" re "\\(" re8 "\\|" re8 "\\)\\|"
    re2 "\\|" re "\\(" re16 "\\|" re4 "\\|" re4 "\\)\\|"
    re4 "\\(" re4 "\\|" re4 "\\)\\|" re "\\(" re8 "\\|" re8 "\\|"
    re8 "\\(" re4 "\\|" re4 "\\)" "\\)\\|"
    re2 "\\(" re "\\|" re "\\|" re2 "\\|" re2 "\\|" re2 "\\|" re "\\|"
    re "\\(" re4 "\\|" re "\\)" "\\|" re16 "\\|"
    re "\\(" re4 "\\|" re16 re16 "\\)\\|"
    re "\\(" re16 re16 "\\|" re "\\|" re4 "\\(" re8 "\\|" re4 "\\)\\)\\|"
    re "\\|" re "\\|" re "\\)\\|" re "\\(" re8 "\\|"
    re2 "\\)\\|" re4 "\\|" re4 "\\|" re4 "\\|" re4 "\\|" re2 "\\|"
    re2 "\\|" re2 "\\|" re2 "\\|" re2 "\\(" re "\\|" re8 "\\|"
    re16 re16 re16 re8 "\\|" re8 "\\|" re2 "\\|" re2 "\\|"
    re2 "\\|" re2 "\\|" re2 "\\|" re2 "\\|" re2 "\\)\\|"
    (mapconcat 'identity (make-list 39 re4) "\\|") "\\|"
    re "\\|" re "\\)") nil t))

This kind of giant regular expression is generated by migemo
(http://migemo.namazu.org/).
----------------------------------------------------------------------

I also confirmed the same phenomenon on:
	Emacs 21.3.50 (CVS Head, Debian)
Here's the backtrace I got at that time.

Program received signal SIGABRT, Aborted.
0x4030f781 in kill () from /lib/libc.so.6
(gdb) bt 10
#0  0x4030f781 in kill () from /lib/libc.so.6
#1  0x080d964a in abort () at emacs.c:417
#2  0x0811c295 in re_match_2_internal (bufp=0x83b692c, 
    string1=0x8665988 ";; This buffer is for notes you don't want to save, and for Lisp evaluation.\n;; If you want to create a file, visit that file with C-x C-f,\n;; then enter the text in that file's own buffer.\n\n(let* ((r"..., 
    size1=1554, string2=0x8666216 "\n", size2=1, pos=1554, regs=0x83acc44, 
    stop=1554) at regex.c:5866
#3  0x08119044 in re_search_2 (bufp=0x83b692c, 
    str1=0x8665988 ";; This buffer is for notes you don't want to save, and for Lisp evaluation.\n;; If you want to create a file, visit that file with C-x C-f,\n;; then enter the text in that file's own buffer.\n\n(let* ((r"..., 
    size1=1554, str2=0x8666216 "\n", size2=1, startpos=1554, range=-1554, 
    regs=0x83acc44, stop=1554) at regex.c:4260
#4  0x08110643 in search_buffer (string=1751017996, pos=1551, pos_byte=1554, 
    lim=1, lim_byte=1, n=-1, RE=1, trt=-2007727184, inverse_trt=-2007707384, 
    posix=0) at search.c:1069
#5  0x081103dc in search_command (string=1751017996, bound=675020044, 
    noerror=675020092, count=675020044, direction=-1, RE=1, posix=0)
    at search.c:904
#6  0x081123d4 in Fre_search_backward (regexp=1751017996, bound=675020044, 
    noerror=675020092, count=675020044) at search.c:2108
#7  0x08132b91 in Feval (form=-1467761176) at eval.c:2088
#8  0x08130714 in Fprogn (args=-1467763496) at eval.c:408
#9  0x08131197 in FletX (args=-1467761184) at eval.c:878
(More stack frames follow...)
(gdb) up
#1  0x080d964a in abort () at emacs.c:417
(gdb) up
#2  0x0811c295 in re_match_2_internal (bufp=0x83b692c, 
    string1=0x8665988 ";; This buffer is for notes you don't want to save, and for Lisp evaluation.\n;; If you want to create a file, visit that file with C-x C-f,\n;; then enter the text in that file's own buffer.\n\n(let* ((r"..., 
    size1=1554, string2=0x8666216 "\n", size2=1, pos=1554, regs=0x83acc44, 
    stop=1554) at regex.c:5866

(gdb) p p[-1]
$1 = 168 '\250'  <- This is an invalid (re_opcode_t).

---
Ken'ichi HANDA
handa@m17n.org

* Re: re-search-forward/backward causes a segmentation fault
  2003-10-08 23:28 re-search-forward/backward causes a segmentation fault Kenichi Handa
@ 2003-10-11  5:37 ` Richard Stallman
  2003-10-13  2:11   ` Kenichi Handa
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Stallman @ 2003-10-11  5:37 UTC (permalink / raw)
  Cc: emacs-devel

The regexp you showed me is too big to be handled with the current
regexp format.  The bug was that regex.c thought that 2^16 bytes was
the limit.  Since jump offsets are signed, really only 2^15 bytes can
be accommodated.
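
To illustrate the point about signed jump offsets, here is a minimal C
sketch.  It is not the code from regex.c: the helper names store_offset,
extract_offset, and offset_round_trips are hypothetical, and the
low-byte-first layout is an assumption made for this example.  It only
shows that a two-byte field read back as a signed value cannot
represent anything above 2^15 - 1.

```c
#include <assert.h>

/* Hypothetical helper (not from regex.c): store OFFSET in two bytes,
   low byte first.  Converting to unsigned long first keeps the bit
   manipulation well defined for negative offsets.  */
void store_offset(unsigned char *dest, long offset)
{
    unsigned long bits = (unsigned long)offset;
    dest[0] = (unsigned char)(bits & 0xffUL);
    dest[1] = (unsigned char)((bits >> 8) & 0xffUL);
}

/* Hypothetical helper: read the two bytes back as a signed 16-bit
   value, sign-extending by hand.  */
long extract_offset(const unsigned char *src)
{
    unsigned long raw = (unsigned long)src[0] | ((unsigned long)src[1] << 8);
    return raw >= 0x8000UL ? (long)raw - 0x10000L : (long)raw;
}

/* Return 1 if OFFSET survives a store/extract round trip, else 0.  */
int offset_round_trips(long offset)
{
    unsigned char buf[2];
    store_offset(buf, offset);
    assert(extract_offset(buf) >= -32768L && extract_offset(buf) <= 32767L);
    return extract_offset(buf) == offset;
}
```

With these helpers, an offset of 32767 survives the round trip, while
40000 comes back as -25536, so a jump intended to go that far forward
would be taken backward instead.  That is one plausible way for the
matcher to land on a byte that is not a valid opcode.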

* Re: re-search-forward/backward causes a segmentation fault
  2003-10-11  5:37 ` Richard Stallman
@ 2003-10-13  2:11   ` Kenichi Handa
  2003-10-13 18:21     ` Richard Stallman
  0 siblings, 1 reply; 4+ messages in thread
From: Kenichi Handa @ 2003-10-13  2:11 UTC (permalink / raw)
  Cc: emacs-devel

In article <E1A8CRN-0001Ey-FS@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> The regexp you showed me is too big to be handled with the current
> regexp format.  The bug was that regex.c thought that 2^16 bytes was
> the limit.  Since jump offsets are signed, really only 2^15 bytes can
> be accommodated.

I see.  So regex_compile should check the size of each
offset before storing it in the buffer for compiled code.
But if regex_compile does that check, doesn't that mean we
no longer need the 2^16 limit shown below?

/* This is not an arbitrary limit: the arguments which represent offsets
   into the pattern are two bytes long.  So if 2^16 bytes turns out to
   be too small, many things would have to change.  */
/* Any other compiler which, like MSC, has allocation limit below 2^16
   bytes will have to use approach similar to what was done below for
   MSC and drop MAX_BUF_SIZE a bit.  Otherwise you may end up
   reallocating to 0 bytes.  Such thing is not going to work too well.
   You have been warned!!  */
#if defined _MSC_VER  && !defined WIN32
/* Microsoft C 16-bit versions limit malloc to approx 65512 bytes.  */
# define MAX_BUF_SIZE  65500L
#else
# define MAX_BUF_SIZE (1L << 16)
#endif
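
The kind of check suggested above could look roughly like the
following C sketch.  This is only an illustration under stated
assumptions, not the code in regex_compile: the names
checked_store_offset, STORE_OK, STORE_TOO_BIG, SKETCH_OFFSET_MIN, and
SKETCH_OFFSET_MAX are hypothetical, and the two-byte, low-byte-first
layout is assumed.

```c
#include <assert.h>

/* Hypothetical status codes, used by this sketch only.  */
enum store_status { STORE_OK = 0, STORE_TOO_BIG = 1 };

/* Assumption: offsets occupy two bytes and are read back as signed.  */
#define SKETCH_OFFSET_MIN (-32768L)
#define SKETCH_OFFSET_MAX 32767L

/* Store OFFSET at DEST only if it fits in a signed two-byte field.
   On failure DEST is left untouched, so the caller can report that
   the pattern is too big instead of emitting a corrupt jump.  */
enum store_status checked_store_offset(unsigned char *dest, long offset)
{
    unsigned long bits;

    if (offset < SKETCH_OFFSET_MIN || offset > SKETCH_OFFSET_MAX)
        return STORE_TOO_BIG;

    bits = (unsigned long)offset;
    dest[0] = (unsigned char)(bits & 0xffUL);
    dest[1] = (unsigned char)((bits >> 8) & 0xffUL);
    return STORE_OK;
}
```

A caller would treat STORE_TOO_BIG as a compile-time failure for the
whole pattern.  Whether such a per-offset check makes a separate
buffer-size limit unnecessary is exactly the open question here.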

---
Ken'ichi HANDA
handa@m17n.org

* Re: re-search-forward/backward causes a segmentation fault
  2003-10-13  2:11   ` Kenichi Handa
@ 2003-10-13 18:21     ` Richard Stallman
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Stallman @ 2003-10-13 18:21 UTC (permalink / raw)
  Cc: emacs-devel

    I see.  So regex_compile should check the size of each
    offset before storing it in the buffer for compiled code.
    But if regex_compile does that check, doesn't that mean we
    no longer need the 2^16 limit shown below?

regex_compile is the place that checks, and I am going to cut the
value of MAX_BUF_SIZE by 50%.
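
As a rough C sketch of what a halved limit implies for buffer growth:
the names SKETCH_MAX_BUF_SIZE and next_buffer_size are hypothetical,
only the default (1L << 16) branch of the definition is considered,
and the doubling strategy is an assumption for illustration rather
than a description of the actual change.

```c
#include <assert.h>

/* Assumption: half of the old (1L << 16) limit, so that any distance
   inside the compiled pattern fits in a signed two-byte offset.  */
#define SKETCH_MAX_BUF_SIZE (1L << 15)

/* Return the next allocation size for a buffer that currently holds
   ALLOCATED bytes (assumed to be positive), or 0 if the buffer is
   already at the limit and the pattern must be rejected as too big.  */
long next_buffer_size(long allocated)
{
    long doubled;

    if (allocated >= SKETCH_MAX_BUF_SIZE)
        return 0;

    doubled = allocated * 2;
    return doubled > SKETCH_MAX_BUF_SIZE ? SKETCH_MAX_BUF_SIZE : doubled;
}
```

Under these assumptions, a 20000-byte buffer grows to 32768 bytes
rather than 40000, and a further request to grow fails, so compilation
stops before any offset can exceed the signed two-byte range.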
