unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
@ 2012-05-19 16:10 Juanma Barranquero
  2012-05-19 16:27 ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Juanma Barranquero @ 2012-05-19 16:10 UTC (permalink / raw)
  To: 11519

Package: emacs
Version: 24.1.50


With current trunk (revno:108308) bootstrapping on Windows throws an
error while building custom-deps:

  make -w cus-load.el-CMD
  make[2]: Entering directory `C:/emacs/debug/lisp'
  echo ;;; cus-load.el --- automatically extracted custom
dependencies> cus-load.el-CMD
  echo ;;>> cus-load.el-CMD
  echo ;;; Code:>> cus-load.el-CMD
  echo.\f>> cus-load.el-CMD
  echo ;; Local Variables:>> cus-load.el-CMD
  echo ;; version-control: never>> cus-load.el-CMD
  echo ;; no-byte-compile: t>> cus-load.el-CMD
  echo ;; no-update-autoloads: t>> cus-load.el-CMD
  echo ;; End:>> cus-load.el-CMD
  make[2]: Leaving directory `C:/emacs/debug/lisp'
  mv cus-load.el-CMD C:/emacs/debug/lisp/cus-load.el
  Directories: calc calendar emacs-lisp emulation erc eshell gnus
international language mail mh-e net nxml org play progmodes textmodes
url vc cedet ce
  "../src/oo/i386/emacs.exe" -batch --no-site-file --no-site-lisp -l
cus-dep --eval "(setq find-file-hook nil)" \
                    -f custom-make-dependencies
C:/Devel/emacs/repo/debug/lisp calc calendar emacs-lisp emulation erc
eshell gnus international language
  Directory C:/emacs/debug/lisp
  Directory calc
  Directory calendar
  Directory emacs-lisp
  Directory emulation
  Directory erc
  Directory eshell
  Directory gnus
  Directory international
  Directory language
  Wrong type argument: characterp, 4195283
  make[1]: [custom-deps] Error -1 (ignored)
  rm "../src/oo/i386/emacs.exe"

After bootstraping,

  cd lisp
  make custom-deps

works as expected.

The error appeared somewhere in the past ten days or so (I have a
bootstrap log from 2012-05-08 which does not show the problem).

    Juanma





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-19 16:10 bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping Juanma Barranquero
@ 2012-05-19 16:27 ` Eli Zaretskii
  2012-05-19 21:40   ` Juanma Barranquero
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-19 16:27 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 11519

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Sat, 19 May 2012 18:10:07 +0200
> 
>   Directory language
>   Wrong type argument: characterp, 4195283
>   make[1]: [custom-deps] Error -1 (ignored)

Can you try finding out which file in lisp/language causes that, and
on which line?





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-19 16:27 ` Eli Zaretskii
@ 2012-05-19 21:40   ` Juanma Barranquero
  2012-05-20 17:27     ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Juanma Barranquero @ 2012-05-19 21:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 11519

> Can you try finding out which file in lisp/language causes that, and
> on which line?

Yes, but it is slow; it only happens on bootstraps.

    Juanma





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-19 21:40   ` Juanma Barranquero
@ 2012-05-20 17:27     ` Eli Zaretskii
  2012-05-20 19:00       ` Juanma Barranquero
  2012-05-21  1:49       ` Stefan Monnier
  0 siblings, 2 replies; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-20 17:27 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 11519

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Sat, 19 May 2012 23:40:06 +0200
> Cc: 11519@debbugs.gnu.org
> 
> > Can you try finding out which file in lisp/language causes that, and
> > on which line?
> 
> Yes, but it is slow; it only happens on bootstraps.

If you started looking, I think you can stop.  It's a %$#@!
heisenbug.  If I add a single (message %s file) to
custom-make-dependencies, it displays ethio-util.el as the last file
name before the error.  But adding another message makes the problem
disappear, and the bootstrap runs flawlessly to completion.  Likewise,
running just "make custom-deps" with the emacs.exe built by the first
stage of the bootstrap doesn't produce any errors.

I will try a few more tricks; ideas for how to debug this, or where to
look, are welcome.

If you say that this started happening only lately, perhaps bisecting
might give some insights.

Thanks.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-20 17:27     ` Eli Zaretskii
@ 2012-05-20 19:00       ` Juanma Barranquero
  2012-05-21  1:50         ` Stefan Monnier
  2012-05-21  1:49       ` Stefan Monnier
  1 sibling, 1 reply; 39+ messages in thread
From: Juanma Barranquero @ 2012-05-20 19:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 11519

On Sun, May 20, 2012 at 7:27 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> If you started looking, I think you can stop.  It's a %$#@!
> heisenbug.

I didn't start, I was far from home today.

> If you say that this started happening only lately, perhaps bisecting
> might give some insights.

The bug disappears if you revert this change:

revno: 108157
committer: Glenn Morris <rgm@gnu.org>
branch nick: trunk
timestamp: Mon 2012-05-07 21:50:17 -0400
message:
  Remove no-byte-compile setting from some lisp/language files.

  Same comments as per r108075, 2012-04-30, for lisp/term:

    Not that compiling these will bring any noticeable speed benefit, but
    there's really no reason not to compile them.  The extra disk space
    and build time is negligible, and it might reveal use of obsolete
    functions, bugs, etc.


    Juanma





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-20 17:27     ` Eli Zaretskii
  2012-05-20 19:00       ` Juanma Barranquero
@ 2012-05-21  1:49       ` Stefan Monnier
  2012-05-21  2:50         ` Eli Zaretskii
  1 sibling, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-21  1:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Juanma Barranquero, 11519

> If you started looking, I think you can stop.  It's a %$#@!
> heisenbug.  If I add a single (message %s file) to
> custom-make-dependencies, it displays ethio-util.el as the last file
> name before the error.  But adding another message makes the problem
> disappear, and the bootstrap runs flawlessly to completion.  Likewise,
> running just "make custom-deps" with the emacs.exe built by the first
> stage of the bootstrap doesn't produce any errors.

I guess the best options is to change the makefile so it sets
debug-on-error (and/or starts Emacs under a debugger) in the area of
interest and hope that the problem will still show up.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-20 19:00       ` Juanma Barranquero
@ 2012-05-21  1:50         ` Stefan Monnier
  2012-05-21  2:51           ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-21  1:50 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 11519

> The bug disappears if you revert this change:
[...]
>   Remove no-byte-compile setting from some lisp/language files.

My gut tells me this is just a trigger but not the cause of the bug :-(


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21  1:49       ` Stefan Monnier
@ 2012-05-21  2:50         ` Eli Zaretskii
  2012-05-21  3:21           ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-21  2:50 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: lekktu, 11519

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Juanma Barranquero <lekktu@gmail.com>,  11519@debbugs.gnu.org
> Date: Sun, 20 May 2012 21:49:02 -0400
> 
> > If you started looking, I think you can stop.  It's a %$#@!
> > heisenbug.  If I add a single (message %s file) to
> > custom-make-dependencies, it displays ethio-util.el as the last file
> > name before the error.  But adding another message makes the problem
> > disappear, and the bootstrap runs flawlessly to completion.  Likewise,
> > running just "make custom-deps" with the emacs.exe built by the first
> > stage of the bootstrap doesn't produce any errors.
> 
> I guess the best options is to change the makefile so it sets
> debug-on-error (and/or starts Emacs under a debugger) in the area of
> interest and hope that the problem will still show up.

I already tried the former (the problem went away).  I'm guessing the
latter will do the same.  I will try to think about some trick, but
frankly it's very frustrating, because any small change to cus-dep.el
or the setup makes the problem go away.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21  1:50         ` Stefan Monnier
@ 2012-05-21  2:51           ` Eli Zaretskii
  2012-05-21  7:59             ` Andreas Schwab
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-21  2:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: lekktu, 11519

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  11519@debbugs.gnu.org
> Date: Sun, 20 May 2012 21:50:04 -0400
> 
> > The bug disappears if you revert this change:
> [...]
> >   Remove no-byte-compile setting from some lisp/language files.
> 
> My gut tells me this is just a trigger but not the cause of the bug :-(

What could be the cause, given that this commit triggers it?





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21  2:50         ` Eli Zaretskii
@ 2012-05-21  3:21           ` Stefan Monnier
  0 siblings, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2012-05-21  3:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, 11519

> I already tried the former (the problem went away).  I'm guessing the
> latter will do the same.  I will try to think about some trick, but
> frankly it's very frustrating, because any small change to cus-dep.el
> or the setup makes the problem go away.

How 'bout changing the C code so a characterp error aborts?


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21  2:51           ` Eli Zaretskii
@ 2012-05-21  7:59             ` Andreas Schwab
  2012-05-21 17:51               ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Andreas Schwab @ 2012-05-21  7:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, 11519

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  11519@debbugs.gnu.org
>> Date: Sun, 20 May 2012 21:50:04 -0400
>> 
>> > The bug disappears if you revert this change:
>> [...]
>> >   Remove no-byte-compile setting from some lisp/language files.
>> 
>> My gut tells me this is just a trigger but not the cause of the bug :-(
>
> What could be the cause, given that this commit triggers it?

Perhaps some constant sharing that the byte compiler introduces?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21  7:59             ` Andreas Schwab
@ 2012-05-21 17:51               ` Eli Zaretskii
  2012-05-21 20:39                 ` Stefan Monnier
  2012-05-22 14:38                 ` Kenichi Handa
  0 siblings, 2 replies; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-21 17:51 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: lekktu, 11519

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Mon, 21 May 2012 09:59:50 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Stefan Monnier <monnier@iro.umontreal.ca>
> >> Cc: Eli Zaretskii <eliz@gnu.org>,  11519@debbugs.gnu.org
> >> Date: Sun, 20 May 2012 21:50:04 -0400
> >> 
> >> > The bug disappears if you revert this change:
> >> [...]
> >> >   Remove no-byte-compile setting from some lisp/language files.
> >> 
> >> My gut tells me this is just a trigger but not the cause of the bug :-(
> >
> > What could be the cause, given that this commit triggers it?
> 
> Perhaps some constant sharing that the byte compiler introduces?

Maybe.  But based on the evidence below, and also on the fact that the
problem is so elusive and behaves like a classical heisenbug, my first
suspicion is elsewhere.

After trying a few tricks, I succeeded to catch the bug in GDB.  (The
full backtrace is near the end of this message.)  Here's what I found.

First, this really happens while cus-dep.el scans ethio-util.el.  The
Lisp call which throws the error is this:

                   (re-search-forward
		     (concat "(provide[ \t\n]+\\('\\|(quote[ \t\n]\\)[ \t\n]*"
			     (regexp-quote name) "[ \t\n)]")
		     nil t))

What I see on the C level is this:

(gdb) bt
  #0  xsignal2 (error_symbol=56891898, arg1=56963794, arg2=16781132)
      at eval.c:1740
  #1  0x01024a51 in wrong_type_argument (predicate=56891898, value=16781132)
      at data.c:111
  #2  0x0102b4e8 in Faref (array=61843973, idx=16781132) at data.c:2090
  #3  0x0129e66e in char_table_translate (table=61843973, ch=4195283)
      at chartab.c:680
  #4  0x01142f6a in re_search_2 (bufp=0x1933c00,
      str1=0x10757948 "\002╘⌐\002\303¿\002\312\212\002\307╖\002\333¼\002\334ק\002\341\214\217ß\002\341\214\216ß\002\303▒\002╘╗\002\341\214\215ß\002\303╗\002\341\214\214ß\002\341\214ßכ\002\305⌐\002\307ע\002\305┐\002\307ח\002\305¿\002\305\262\002\303צ\002\306\263\002\341\214\212ß\002\341\214ßי\002\303\220",
      size1=51024, str2=0x1077fb17 <Address 0x1077fb17 out of bounds>, size2=0,
      startpos=23749, range=27240, regs=0x19351f0, stop=51024) at regex.c:4422
  #5  0x010fbd70 in search_buffer (string=272417249, pos=1, pos_byte=1,
      lim=48448, lim_byte=51025, n=1, RE=1, trt=61843973, inverse_trt=61841925,
      posix=0) at search.c:1205
  #6  0x010fb578 in search_command (string=272417249, bound=56838170,
      noerror=56838194, count=56838170, direction=1, RE=1, posix=0)
      at search.c:997
  #7  0x010fef92 in Fre_search_forward (regexp=272417249, bound=56838170,
      noerror=56838194, count=56838170) at search.c:2162
  (gdb) fr 7
  #7  0x010fef92 in Fre_search_forward (regexp=272417249, bound=56838170,
      noerror=56838194, count=56838170) at search.c:2162
  2162      return search_command (regexp, bound, noerror, count, 1, 1, 0);
  (gdb) p regexp
  $9 = 272417249
  (gdb) xstring
  $10 = (struct Lisp_String *) 0x103cc1e0
  "(provide[	 \n]+\\('\\|(quote[	 \n]\\)[	 \n]*ethio-util[	 \n)]"
  (gdb) fr 3
  #3  0x0129e66e in char_table_translate (table=61843973, ch=4195283)
      at chartab.c:680
  680       value = Faref (table, make_number (ch));
  (gdb) p table
  $11 = 61843973
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_CHAR_TABLE
  (gdb) xchartable
  $12 = (struct Lisp_Char_Table *) 0x3afaa00
  Purpose: "case-table"  3 extra slots
  (gdb) p current_buffer->text->beg
  $13 = (
      unsigned char *) 0x10744948 ";;; ethio-util.el --- utilities for Ethiopic  -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Software Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20"...
  (gdb) p current_buffer->pt
  $14 = 1

So we are searching from buffer beginning, and the buffer text looks
perfectly OK.  But look what "text" is regex.c trying to search:

#4  0x01142f6a in re_search_2 (bufp=0x1933c00,
    str1=0x10757948 "\002╘⌐\002\303¿\002\312\212\002\307╖\002\333¼\002\334ק\002\341\214\217ß\002\341\214\216ß\002\303▒\002╘╗\002\341\214\215ß\002\303╗\002\341\214\214ß\002\341\214ßכ\002\305⌐\002\307ע\002\305┐\002\307ח\002\305¿\002\305\262\002\303צ\002\306\263\002\341\214\212ß\002\341\214ßי\002\303\220",
    size1=51024, str2=0x1077fb17 <Address 0x1077fb17 out of bounds>, size2=0,
    startpos=23749, range=27240, regs=0x19351f0, stop=51024) at regex.c:4422

Total garbage.

So I think that what happened is that something, probably the
translation through a char-table, caused allocation of a large chunk
of memory, which in turn relocated the text of the current buffer
behind regex.c's back, which still uses C pointers to the old location
of the buffer text.  Here's how search.c calls re_search_2:

      p1 = BEGV_ADDR;
      s1 = GPT_BYTE - BEGV_BYTE;
      p2 = GAP_END_ADDR;
      s2 = ZV_BYTE - GPT_BYTE;
      if (s1 < 0)
	{
	  p2 = p1;
	  s2 = ZV_BYTE - BEGV_BYTE;
	  s1 = 0;
	}
      if (s2 < 0)
	{
	  s1 = ZV_BYTE - BEGV_BYTE;
	  s2 = 0;
	}
      re_match_object = Qnil;

      while (n < 0)
	{
	  EMACS_INT val;
	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
			     (NILP (Vinhibit_changing_match_data)
			      ? &search_regs : &search_regs_1),
			     /* Don't allow match past current point */
			     pos_byte - BEGV_BYTE);

We pass s1 and s2, which are pointers to buffer text, so if buffer
text is relocated, we are screwed.

Does this explanation sound reasonable?  If so, any ideas how to fix
this?

Here's the full backtrace:

  #0  xsignal2 (error_symbol=56891898, arg1=56963794, arg2=16781132)
      at eval.c:1740
  #1  0x01024a51 in wrong_type_argument (predicate=56891898, value=16781132)
      at data.c:111
  #2  0x0102b4e8 in Faref (array=61843973, idx=16781132) at data.c:2090
  #3  0x0129e66e in char_table_translate (table=61843973, ch=4195283)
      at chartab.c:680
  #4  0x01142f6a in re_search_2 (bufp=0x1933c00,
      str1=0x10757948 "\002╘⌐\002\303¿\002\312\212\002\307╖\002\333¼\002\334ק\002 341\214\\217ß\002\341\214\216ß\002\303▒\002╘╗\002\341\214\215ß\002\303╗\002\341\214\214ß\002\341\214ßכ\002\305⌐\002\307ע\002\305┐\002\307ח\002\305¿\002\305\262\002\303צ\002\306\263\002\341\214\212ß\002\341\214ßי\002\303\220",
      size1=51024, str2=0x1077fb17 <Address 0x1077fb17 out of bounds>, size2=0,
      startpos=23749, range=27240, regs=0x19351f0, stop=51024) at regex.c:4422
  #5  0x010fbd70 in search_buffer (string=272417249, pos=1, pos_byte=1,
      lim=48448, lim_byte=51025, n=1, RE=1, trt=61843973, inverse_trt=61841925,
      posix=0) at search.c:1205
  #6  0x010fb578 in search_command (string=272417249, bound=56838170,
      noerror=56838194, count=56838170, direction=1, RE=1, posix=0)
      at search.c:997
  #7  0x010fef92 in Fre_search_forward (regexp=272417249, bound=56838170,
      noerror=56838194, count=56838170) at search.c:2162
  #8  0x01036e84 in Ffuncall (nargs=4, args=0x82df40) at eval.c:2946
  #9  0x011236db in exec_byte_code (bytestr=66621569, vector=62818821,
      maxdepth=32, args_template=56838170, nargs=0, args=0x0) at bytecode.c:785
  #10 0x01037c4e in funcall_lambda (fun=61992805, nargs=0, arg_vector=0x82e1b4)
      at eval.c:3166
  #11 0x0103712c in Ffuncall (nargs=1, args=0x82e1b0) at eval.c:2984
  #12 0x01034c3c in eval_sub (form=59285366) at eval.c:2255
  #13 0x01030431 in Fprogn (args=59285382) at eval.c:364
  #14 0x010302b1 in Fif (args=59285334) at eval.c:315
  #15 0x01034986 in eval_sub (form=59285310) at eval.c:2231
  #16 0x01030431 in Fprogn (args=59285390) at eval.c:364
  #17 0x01030389 in Fcond (args=59285398) at eval.c:342
  #18 0x01034986 in eval_sub (form=59284782) at eval.c:2231
  #19 0x01030431 in Fprogn (args=59309078) at eval.c:364
  #20 0x01031bb3 in FletX (args=59286038) at eval.c:983
  #21 0x01034986 in eval_sub (form=59285950) at eval.c:2231
  #22 0x01030431 in Fprogn (args=59309158) at eval.c:364
  #23 0x010320db in Fwhile (args=59285942) at eval.c:1075
  #24 0x01034986 in eval_sub (form=59285934) at eval.c:2231
  #25 0x01030431 in Fprogn (args=59309166) at eval.c:364
  #26 0x01032044 in Flet (args=59285606) at eval.c:1053
  #27 0x01034986 in eval_sub (form=59245286) at eval.c:2231
  #28 0x01030431 in Fprogn (args=62798582) at eval.c:364
  #29 0x01034986 in eval_sub (form=62798590) at eval.c:2231
  #30 0x01030291 in Fif (args=62798606) at eval.c:314
  #31 0x01034986 in eval_sub (form=62798614) at eval.c:2231
  #32 0x0103530e in eval_sub (form=59245270) at eval.c:2344
  #33 0x01030431 in Fprogn (args=59309182) at eval.c:364
  #34 0x01032044 in Flet (args=59245262) at eval.c:1053
  #35 0x01034986 in eval_sub (form=59245198) at eval.c:2231
  #36 0x01030431 in Fprogn (args=59308166) at eval.c:364
  #37 0x01037b1a in funcall_lambda (fun=59308190, nargs=1, arg_vector=0x82f1c0)
      at eval.c:3159
  #38 0x01037461 in apply_lambda (fun=59308198, args=59276566) at eval.c:3043
  #39 0x0103533d in eval_sub (form=59276590) at eval.c:2347
  #40 0x01030431 in Fprogn (args=59276558) at eval.c:364
  #41 0x01037b1a in funcall_lambda (fun=59275998, nargs=0, arg_vector=0x82f420)
      at eval.c:3159
  #42 0x01037461 in apply_lambda (fun=59275990, args=56838170) at eval.c:3043
  #43 0x0103533d in eval_sub (form=59187782) at eval.c:2347
  #44 0x0103263c in Funwind_protect (args=59187766) at eval.c:1304
  #45 0x01034986 in eval_sub (form=59187790) at eval.c:2231
  #46 0x01030431 in Fprogn (args=59181374) at eval.c:364
  #47 0x01032044 in Flet (args=59187798) at eval.c:1053
  #48 0x01034986 in eval_sub (form=59187830) at eval.c:2231
  #49 0x01030431 in Fprogn (args=59181366) at eval.c:364
  #50 0x010302b1 in Fif (args=59066374) at eval.c:315
  #51 0x01034986 in eval_sub (form=59066382) at eval.c:2231
  #52 0x01030431 in Fprogn (args=59210614) at eval.c:364
  #53 0x01037b1a in funcall_lambda (fun=59210590, nargs=0, arg_vector=0x82fb30)
      at eval.c:3159
  #54 0x01037461 in apply_lambda (fun=59210582, args=56838170) at eval.c:3043
  #55 0x0103533d in eval_sub (form=58576574) at eval.c:2347
  #56 0x0103450f in Feval (form=58576574, lexical=56838170) at eval.c:2137
  #57 0x01005123 in top_level_2 () at keyboard.c:1169
  #58 0x01032ab9 in internal_condition_case (bfun=0x1005107 <top_level_2>,
      handlers=56891826, hfun=0x1004cba <cmd_error>) at eval.c:1448
  #59 0x01005155 in top_level_1 (ignore=56838170) at keyboard.c:1177
  #60 0x0103247c in internal_catch (tag=56889874, func=0x1005125 <top_level_1>,
      arg=56838170) at eval.c:1205
  #61 0x01005090 in command_loop () at keyboard.c:1132
  #62 0x01004678 in recursive_edit_1 () at keyboard.c:759
  #63 0x0100499a in Frecursive_edit () at keyboard.c:823
  #64 0x010027c9 in main (argc=39, argv=0xa34058) at emacs.c:1711

  Lisp Backtrace:
  "re-search-forward" (0x82df44)
  "custom-make-dependencies" (0x82e1b4)
  "funcall" (0x82e1b0)
  "if" (0x82e430)
  "cond" (0x82e5c0)
  "let*" (0x82e790)
  "while" (0x82e930)
  "let" (0x82eb40)
  "progn" (0x82ec90)
  "if" (0x82ede0)
  "when" (0x82eef0)
  "let" (0x82f0f0)
  "command-line-1" (0x82f1c0)
  "command-line" (0x82f420)
  "unwind-protect" (0x82f6d0)
  "let" (0x82f8d0)
  "if" (0x82fa60)
  "normal-top-level" (0x82fb30)






^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21 17:51               ` Eli Zaretskii
@ 2012-05-21 20:39                 ` Stefan Monnier
  2012-05-22 19:00                   ` Eli Zaretskii
  2012-05-22 14:38                 ` Kenichi Handa
  1 sibling, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-21 20:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, Andreas Schwab, 11519

> So I think that what happened is that something, probably the
> translation through a char-table, caused allocation of a large chunk
> of memory, which in turn relocated the text of the current buffer
> behind regex.c's back, which still uses C pointers to the old location
> of the buffer text.  Here's how search.c calls re_search_2:

>       p1 = BEGV_ADDR;
>       s1 = GPT_BYTE - BEGV_BYTE;
>       p2 = GAP_END_ADDR;
>       s2 = ZV_BYTE - GPT_BYTE;
>       if (s1 < 0)
> 	{
> 	  p2 = p1;
> 	  s2 = ZV_BYTE - BEGV_BYTE;
> 	  s1 = 0;
> 	}
>       if (s2 < 0)
> 	{
> 	  s1 = ZV_BYTE - BEGV_BYTE;
> 	  s2 = 0;
> 	}
>       re_match_object = Qnil;

>       while (n < 0)
> 	{
> 	  EMACS_INT val;
> 	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
> 			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
> 			     (NILP (Vinhibit_changing_match_data)
> 			      ? &search_regs : &search_regs_1),
> 			     /* Don't allow match past current point */
> 			     pos_byte - BEGV_BYTE);

> We pass s1 and s2, which are pointers to buffer text, so if buffer
> text is relocated, we are screwed.

> Does this explanation sound reasonable?  If so, any ideas how to fix
> this?

I suggest you let-bind some witness variable is re_search_2 and then in
the buffer-relocation code, you test this var and abort if it's non-nil.
That should let us catch the offender red-handed, after which we will
know better how to fix the problem.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21 17:51               ` Eli Zaretskii
  2012-05-21 20:39                 ` Stefan Monnier
@ 2012-05-22 14:38                 ` Kenichi Handa
  2012-05-22 19:02                   ` Eli Zaretskii
  1 sibling, 1 reply; 39+ messages in thread
From: Kenichi Handa @ 2012-05-22 14:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lekktu, schwab, 11519

In article <83vcjpxw18.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> So I think that what happened is that something, probably the
> translation through a char-table, caused allocation of a large chunk
> of memory, which in turn relocated the text of the current buffer
> behind regex.c's back, which still uses C pointers to the old location
> of the buffer text.

The function char_table_translate may allocate some amount
of memory.  But that's only when the char-table is one of a
"unicode character property table" loaded via the function
uniprop_table.  When char_table_translate is called from
re_search_2 (via the macro RE_TRANSLATE), the char-table to
lookup is one of a case-related table (equiv? canon?) which
is not a unicode property table.

The function that acutally allocates memory via.
char_table_translate is uniprop_table_uncompress.

---
Kenichi Handa
handa@m17n.org





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-21 20:39                 ` Stefan Monnier
@ 2012-05-22 19:00                   ` Eli Zaretskii
  2012-05-22 19:19                     ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-22 19:00 UTC (permalink / raw)
  To: Stefan Monnier, Kenichi Handa; +Cc: lekktu, schwab, 11519

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Mon, 21 May 2012 16:39:56 -0400
> 
> I suggest you let-bind some witness variable is re_search_2 and then in
> the buffer-relocation code, you test this var and abort if it's non-nil.
> That should let us catch the offender red-handed, after which we will
> know better how to fix the problem.

I did the equivalent of the above, but without changing the source (to
minimize the chances that the bug will disappear).

Is the evidence below conclusive enough?

  Breakpoint 3, search_buffer (string=272417249, pos=1, pos_byte=1, lim=48448,
      lim_byte=51025, n=1, RE=1, trt=61843973, inverse_trt=61841925, posix=0)
      at search.c:1206
  1206              val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
  $1771 = 272417249
  $1772 = (struct Lisp_String *) 0x103cc1e0
  "(provide[ 	\n]+\\('\\|(quote[ 	\n]\\)[ 	\n]*ethio-util[ 	\n)]"
  (gdb) p current_buffer->text->beg
  $1773 = (
      unsigned char *) 0x10757948 ";;; ethio-util.el --- utilities for Ethiopic -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Software Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20"...
  (gdb) p *p1@20
  $1774 = ";;; ethio-util.el --"
  (gdb) p p1
  $1775 = (
      unsigned char *) 0x10757948 ";;; ethio-util.el --- utilities for Ethiopic -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Software Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20"...

So at this point, before we call re_search_2, p1 and
current_buffer->text->beg point to the same memory.  Now:

  (gdb) watch current_buffer->text->beg
  Hardware watchpoint 4: current_buffer->text->beg
  (gdb) c
  Continuing.
  Hardware watchpoint 4: current_buffer->text->beg

  Old value =
      (unsigned char *) 0x10757948 ";;; ethio-util.el --- utilities for Ethiopic
  -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Soft
  ware Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20".
  ..
  New value =
      (unsigned char *) 0x10826948 ";;; ethio-util.el --- utilities for Ethiopic
  -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Soft
  ware Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20"...
  r_alloc_sbrk (size=847872) at ralloc.c:808
  808               for (b = last_bloc; b != NIL_BLOC; b = b->prev)

Note that the address of buffer text has changed from 0x10757948 to
0x10826948.  And the culprit is ...

  (gdb) bt
  #0  r_alloc_sbrk (size=847872) at ralloc.c:808
  #1  0x012e9dc0 in get_contiguous_space (size=847872, position=0x10748000)
      at gmalloc.c:447
  #2  0x012ea662 in _malloc_internal_nolock (size=786436) at gmalloc.c:821
  #3  0x012eaa4c in _malloc_internal (size=786436) at gmalloc.c:904
  #4  0x012eaa99 in e_malloc (size=786436) at gmalloc.c:927
  #5  0x0103a3b2 in emacs_blocked_malloc (size=786436, ptr=0x0) at alloc.c:1308
  #6  0x012eaa99 in e_malloc (size=786436) at gmalloc.c:927
  #7  0x010397fb in xmalloc (size=786436) at alloc.c:727
  #8  0x0120da2d in load_charset_map_from_file (charset=0x1944970,
      mapfile=57455953, control_flag=1) at charset.c:501
  #9  0x0120e480 in load_charset (charset=0x1944970, control_flag=1)
      at charset.c:646
  #10 0x01214cc9 in maybe_unify_char (c=1704385, val=57027682) at charset.c:1644
  #11 0x0128b4a0 in string_char (
      p=0x1075d630 "  2.÷áחג  3.÷áחד  4.÷áחה  5.÷áח\200\")\n  (cond\n   ((= arg ?1)\n    (insert \"÷áח\201\"))\n   ((= arg ?2)\n    (insert \"÷áחג\"))\n   ((= arg ?3)\n    (insert \"÷áחד\"))\n   ((= arg ?4)\n    (insert \"÷áחה\"))\n   ((= arg ?5"..., advanced=0x0, len=0x82dcec) at character.c:200
  #12 0x01142f65 in re_search_2 (bufp=0x1933c08,
      str1=0x10757948 ";;; ethio-util.el --- utilities for Ethiopic       -*- coding: utf-8-emacs; -*-\n\n;; Copyright (C) 1997-1998, 2002-2012  Free Software Foundation, Inc.\n;; Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 20"...,
      size1=51024, str2=0x1077fb17 "", size2=0, startpos=23749, range=27244,
      regs=0x19351f8, stop=51024) at regex.c:4421
  #13 0x010fbd78 in search_buffer (string=272417249, pos=1, pos_byte=1,
      lim=48448, lim_byte=51025, n=1, RE=1, trt=61843973, inverse_trt=61841925,
      posix=0) at search.c:1207
  #14 0x010fb578 in search_command (string=272417249, bound=56838170,
      noerror=56838194, count=56838170, direction=1, RE=1, posix=0)
      at search.c:997
  #15 0x010fefa4 in Fre_search_forward (regexp=272417249, bound=56838170,
      noerror=56838194, count=56838170) at search.c:2165

The fragment of regex.c that triggers this is as follows:

	      if (RE_TRANSLATE_P (translate))
		{
		  if (multibyte)
		    while (range > lim)
		      {
			int buf_charlen;

    >>>>>>>>>>>>>>>>>	buf_ch = STRING_CHAR_AND_LENGTH (d, buf_charlen);
			buf_ch = RE_TRANSLATE (translate, buf_ch);
			if (fastmap[CHAR_LEADING_CODE (buf_ch)])
			  break;

			range -= buf_charlen;
			d += buf_charlen;
		      }

The marked line calls string_char, which calls maybe_unify_char, which
calls load_charset, which causes memory allocation and relocation of
buffer text.  The very next call to RE_TRANSLATE, which calls
char_table_translate, throws an error because buf_ch is garbage.

If you agree with the diagnosis, then how about the change below?  It
fixes the problem for me.  (Or is there a better way?)  If accepted, I
will add the necessary commentary to this code and a prototype for the
new function.  In any case, I suggest to install the fix on the
emacs-24 branch, because this issue is a disaster waiting to happen.


=== modified file 'src/ralloc.c'
--- src/ralloc.c	2012-04-16 01:18:13 +0000
+++ src/ralloc.c	2012-05-22 18:39:25 +0000
@@ -1143,6 +1143,12 @@ r_alloc_reset_variable (POINTER *old, PO
   bloc->variable = new;
 }
 
+void
+r_alloc_inhibit_buffer_relocation (int inhibit)
+{
+  use_relocatable_buffers = (inhibit ? 0 : 1);
+}
+
 \f
 /***********************************************************************
 			    Initialization

=== modified file 'src/search.c'
--- src/search.c	2012-05-17 00:03:49 +0000
+++ src/search.c	2012-05-22 18:41:23 +0000
@@ -1158,12 +1158,19 @@ search_buffer (Lisp_Object string, EMACS
       while (n < 0)
 	{
 	  EMACS_INT val;
+
+#ifdef REL_ALLOC
+	  r_alloc_inhibit_buffer_relocation (1);
+#endif
 	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
 			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
 			     (NILP (Vinhibit_changing_match_data)
 			      ? &search_regs : &search_regs_1),
 			     /* Don't allow match past current point */
 			     pos_byte - BEGV_BYTE);
+#ifdef REL_ALLOC
+	  r_alloc_inhibit_buffer_relocation (0);
+#endif
 	  if (val == -2)
 	    {
 	      matcher_overflow ();
@@ -1202,11 +1209,18 @@ search_buffer (Lisp_Object string, EMACS
       while (n > 0)
 	{
 	  EMACS_INT val;
+
+#ifdef REL_ALLOC
+	  r_alloc_inhibit_buffer_relocation (1);
+#endif
 	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
 			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
 			     (NILP (Vinhibit_changing_match_data)
 			      ? &search_regs : &search_regs_1),
 			     lim_byte - BEGV_BYTE);
+#ifdef REL_ALLOC
+	  r_alloc_inhibit_buffer_relocation (0);
+#endif
 	  if (val == -2)
 	    {
 	      matcher_overflow ();







^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-22 14:38                 ` Kenichi Handa
@ 2012-05-22 19:02                   ` Eli Zaretskii
  0 siblings, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-22 19:02 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, schwab, 11519

> From: Kenichi Handa <handa@gnu.org>
> Cc: schwab@linux-m68k.org, lekktu@gmail.com, 11519@debbugs.gnu.org
> Date: Tue, 22 May 2012 23:38:34 +0900
> 
> In article <83vcjpxw18.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So I think that what happened is that something, probably the
> > translation through a char-table, caused allocation of a large chunk
> > of memory, which in turn relocated the text of the current buffer
> > behind regex.c's back, which still uses C pointers to the old location
> > of the buffer text.
> 
> The function char_table_translate may allocate some amount
> of memory.  But that's only when the char-table is one of a
> "unicode character property table" loaded via the function
> uniprop_table.  When char_table_translate is called from
> re_search_2 (via the macro RE_TRANSLATE), the char-table to
> lookup is one of a case-related table (equiv? canon?) which
> is not a unicode property table.

You are right: it was not char_table_translate, it was
load_charset_map_from_file, invoked through STRING_CHAR_AND_LENGTH.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-22 19:00                   ` Eli Zaretskii
@ 2012-05-22 19:19                     ` Stefan Monnier
  2012-05-22 19:47                       ` Eli Zaretskii
  2012-05-23 14:10                       ` Kenichi Handa
  0 siblings, 2 replies; 39+ messages in thread
From: Stefan Monnier @ 2012-05-22 19:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, 11519, lekktu

> Note that the address of buffer text has changed from 0x10757948 to
> 0x10826948.  And the culprit is ...

>   #7  0x010397fb in xmalloc (size=786436) at alloc.c:727
>   #8  0x0120da2d in load_charset_map_from_file (charset=0x1944970,
>       mapfile=57455953, control_flag=1) at charset.c:501

Huh!  Indeed, I always assumed that relocation would be something that
can only happen during GC, not in a mere xmalloc.

> The marked line calls string_char, which calls maybe_unify_char, which
> calls load_charset, which causes memory allocation and relocation of
> buffer text.

This brings up back to the issue of calling maybe_unify_char in
STRING_CHAR_AND_LENGTH.  One more strike against it.  Handa, could you
prepare a patch that removes this?

> If you agree with the diagnosis, then how about the change below?

Might be an acceptable workaround for the emacs-24 branch, yes (tho I'd
replace "inhibit ? 0 : 1" with "!inhibit").  But is it really new in
Emacs-24?  It seems the same problem is already present in Emacs-23, so
it's probably not so urgent to fix it for 24.1.

> It fixes the problem for me.  (Or is there a better way?)  If accepted, I
> will add the necessary commentary to this code and a prototype for the
> new function.  In any case, I suggest to install the fix on the
> emacs-24 branch, because this issue is a disaster waiting to happen.

I wonder: why do we use REL_ALLOC?


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-22 19:19                     ` Stefan Monnier
@ 2012-05-22 19:47                       ` Eli Zaretskii
  2012-05-23  0:47                         ` Stefan Monnier
  2012-05-23 14:10                       ` Kenichi Handa
  1 sibling, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-22 19:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, 11519, lekktu

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Kenichi Handa <handa@gnu.org>, schwab@linux-m68k.org, lekktu@gmail.com,
>         11519@debbugs.gnu.org
> Date: Tue, 22 May 2012 15:19:12 -0400
> 
> > Note that the address of buffer text has changed from 0x10757948 to
> > 0x10826948.  And the culprit is ...
> 
> >   #7  0x010397fb in xmalloc (size=786436) at alloc.c:727
> >   #8  0x0120da2d in load_charset_map_from_file (charset=0x1944970,
> >       mapfile=57455953, control_flag=1) at charset.c:501
> 
> Huh!  Indeed, I always assumed that relocation would be something that
> can only happen during GC, not in a mere xmalloc.

It happens in xmalloc/xrealloc when ralloc.c is used.

> > If you agree with the diagnosis, then how about the change below?
> 
> Might be an acceptable workaround for the emacs-24 branch, yes (tho I'd
> replace "inhibit ? 0 : 1" with "!inhibit").  But is it really new in
> Emacs-24?  It seems the same problem is already present in Emacs-23, so
> it's probably not so urgent to fix it for 24.1.

The problem is indeed not new, but so what?  It is real, and it just
happened to us in real life, albeit on the trunk.  Who knows how many
other problems which we dismiss as not reproducible could have been
caused by this (especially when exotic character sets were involved)?

> I wonder: why do we use REL_ALLOC?

AFAIK, we do that only on platforms that don't support mmap for
allocating buffer text.

The relocatable allocator makes a more efficient use of memory,
especially when it is almost full: it can potentially produce a large
block from several small ones by moving buffer text around to
"compact" their storage.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-22 19:47                       ` Eli Zaretskii
@ 2012-05-23  0:47                         ` Stefan Monnier
  2012-05-23  2:59                           ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-23  0:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, 11519, lekktu

>> > If you agree with the diagnosis, then how about the change below?
>> Might be an acceptable workaround for the emacs-24 branch, yes (tho I'd
>> replace "inhibit ? 0 : 1" with "!inhibit").  But is it really new in
>> Emacs-24?  It seems the same problem is already present in Emacs-23, so
>> it's probably not so urgent to fix it for 24.1.
> The problem is indeed not new, but so what?  It is real, and it just
> happened to us in real life, albeit on the trunk.  Who knows how many
> other problems which we dismiss as not reproducible could have been
> caused by this (especially when exotic character sets were involved)?

I assume that the problem can show up in many other places than
re_search, so yes, it's a real problem that we need to fix for real, but
your workaround will only fix the problem we happened to bump into, so
adding this fix to the emacs-24 branch may be completely useless if this
bug never shows up at that place there.

>> I wonder: why do we use REL_ALLOC?
> AFAIK, we do that only on platforms that don't support mmap for
> allocating buffer text.

So, IIUC the only reason to use it is so that we can more often return
memory to the OS even for the non-mmap case?  Is that because returning
memory can only be done via sbrk style memory management?

I wonder how effective it is in practice.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23  0:47                         ` Stefan Monnier
@ 2012-05-23  2:59                           ` Eli Zaretskii
  2012-05-23 14:16                             ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-23  2:59 UTC (permalink / raw)
  To: Stefan Monnier, Richard Stallman; +Cc: schwab, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Tue, 22 May 2012 20:47:54 -0400
> 
> I assume that the problem can show up in many other places than
> re_search, so yes, it's a real problem that we need to fix for real, but
> your workaround will only fix the problem we happened to bump into

Which other places use C pointers to buffer text and call functions
that can allocate memory?  If you know about such places, please point
them out.  I don't think we have them, but that's me.

> adding this fix to the emacs-24 branch may be completely useless if this
> bug never shows up at that place there.

That "if" is profoundly false, as I can now easily craft a use case
where it _will_ show up.

Anyway, are you against committing this to the release branch?  I'd be
very sad if you were, having invested so much time in hunting this
bug, but I guess I'll survive.

> >> I wonder: why do we use REL_ALLOC?
> > AFAIK, we do that only on platforms that don't support mmap for
> > allocating buffer text.
> 
> So, IIUC the only reason to use it is so that we can more often return
> memory to the OS even for the non-mmap case?  Is that because returning
> memory can only be done via sbrk style memory management?

I don't think this is only about _returning_ memory.  It is first and
foremost about not _asking_ for more memory when we can come up with
it by reshuffling buffer text.

> I wonder how effective it is in practice.

I have no idea.  ralloc.c was in place (and used by all platforms)
long before I became involved with Emacs.  Perhaps Richard can tell.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-22 19:19                     ` Stefan Monnier
  2012-05-22 19:47                       ` Eli Zaretskii
@ 2012-05-23 14:10                       ` Kenichi Handa
  2012-05-23 15:27                         ` Stefan Monnier
  1 sibling, 1 reply; 39+ messages in thread
From: Kenichi Handa @ 2012-05-23 14:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: lekktu, schwab, 11519

In article <jwv8vgkt4lc.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> This brings up back to the issue of calling maybe_unify_char in
> STRING_CHAR_AND_LENGTH.  One more strike against it.  Handa, could you
> prepare a patch that removes this?

Ok, I'll work on it.  But, it's not for 24.1, right?

> > If you agree with the diagnosis, then how about the change below?

> Might be an acceptable workaround for the emacs-24 branch, yes (tho I'd
> replace "inhibit ? 0 : 1" with "!inhibit").

I think it is not just a workaround.  If we can suppress
buffer/string relocation temporarily, we can utilize that
functionality in several other places.

---
Kenichi Handa
handa@m17n.org





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23  2:59                           ` Eli Zaretskii
@ 2012-05-23 14:16                             ` Stefan Monnier
  2012-05-23 15:23                               ` Ken Brown
                                                 ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Stefan Monnier @ 2012-05-23 14:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, Richard Stallman, 11519, lekktu

> Which other places use C pointers to buffer text and call functions
> that can allocate memory?

IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
vulnerable to the problem.

> Anyway, are you against committing this to the release branch?  I'd be
> very sad if you were, having invested so much time in hunting this
> bug, but I guess I'll survive.

I'm not dead set against it, and I'm glad we found the culprit so we can
fix it: fixing it on the release branch is not that important, since this
bug has been with us since Emacs-23.1, AFAICT.

If you really want to install your workaround on the emacs-24 branch, go
for it but let's try to find a real fix for the trunk.

>> >> I wonder: why do we use REL_ALLOC?
>> > AFAIK, we do that only on platforms that don't support mmap for
>> > allocating buffer text.
>> So, IIUC the only reason to use it is so that we can more often return
>> memory to the OS even for the non-mmap case?  Is that because returning
>> memory can only be done via sbrk style memory management?
> I don't think this is only about _returning_ memory.  It is first and
> foremost about not _asking_ for more memory when we can come up with
> it by reshuffling buffer text.

So you're saying it's use for fragmentation reasons?
But on other platforms where we use mmap, we do suffer from this
fragmentation, and yet it doesn't seem to be a real source of problem.
That's why I think the only real reason is because memory can only be
returned via sbrk-style memory management (i.e. only free memory at the
end of the heap can be returned).  Is that right?

I guess my question turns into "why do we use gmalloc.c instead of
a malloc library that uses mmap (or some other mechanism that lets it
return large free chunks to the OS)"?

AFAIK, Windows is pretty much the only system where we use gmalloc.c and
ralloc.c nowadays.  Does anyone remember why we don't use the system
malloc under Windows (and Cygwin)?


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 14:16                             ` Stefan Monnier
@ 2012-05-23 15:23                               ` Ken Brown
  2012-05-23 16:52                               ` Eli Zaretskii
  2012-05-23 17:34                               ` Eli Zaretskii
  2 siblings, 0 replies; 39+ messages in thread
From: Ken Brown @ 2012-05-23 15:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: lekktu, schwab, Richard Stallman, 11519

On 5/23/2012 10:16 AM, Stefan Monnier wrote:
>> Which other places use C pointers to buffer text and call functions
>> that can allocate memory?
>
> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
> vulnerable to the problem.
>
>> Anyway, are you against committing this to the release branch?  I'd be
>> very sad if you were, having invested so much time in hunting this
>> bug, but I guess I'll survive.
>
> I'm not dead set against it, and I'm glad we found the culprit so we can
> fix it: fixing it on the release branch is not that important, since this
> bug has been with us since Emacs-23.1, AFAICT.
>
> If you really want to install your workaround on the emacs-24 branch, go
> for it but let's try to find a real fix for the trunk.
>
>>>>> I wonder: why do we use REL_ALLOC?
>>>> AFAIK, we do that only on platforms that don't support mmap for
>>>> allocating buffer text.
>>> So, IIUC the only reason to use it is so that we can more often return
>>> memory to the OS even for the non-mmap case?  Is that because returning
>>> memory can only be done via sbrk style memory management?
>> I don't think this is only about _returning_ memory.  It is first and
>> foremost about not _asking_ for more memory when we can come up with
>> it by reshuffling buffer text.
>
> So you're saying it's use for fragmentation reasons?
> But on other platforms where we use mmap, we do suffer from this
> fragmentation, and yet it doesn't seem to be a real source of problem.
> That's why I think the only real reason is because memory can only be
> returned via sbrk-style memory management (i.e. only free memory at the
> end of the heap can be returned).  Is that right?
>
> I guess my question turns into "why do we use gmalloc.c instead of
> a malloc library that uses mmap (or some other mechanism that lets it
> return large free chunks to the OS)"?
>
> AFAIK, Windows is pretty much the only system where we use gmalloc.c and
> ralloc.c nowadays.  Does anyone remember why we don't use the system
> malloc under Windows (and Cygwin)?

Cygwin uses gmalloc.c but not ralloc.c; it uses mmap for buffers.  There 
are two reasons for using gmalloc.c on Cygwin.  The first, which may or 
may not be important, is that Cygwin's malloc doesn't support 
__after_morecore_hook, malloc_get_state, or malloc_set_state.  The 
second has to do with the way emacs is dumped on Cygwin.  See the 
comment starting at line 302 in gmalloc.c.

I would love to find a better way to deal with this and be able to use 
the system malloc on Cygwin.

Ken






^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 14:10                       ` Kenichi Handa
@ 2012-05-23 15:27                         ` Stefan Monnier
  2012-05-23 17:02                           ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-23 15:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: lekktu, schwab, 11519

>> This brings up back to the issue of calling maybe_unify_char in
>> STRING_CHAR_AND_LENGTH.  One more strike against it.  Handa, could you
>> prepare a patch that removes this?
> Ok, I'll work on it.

Thanks.

> But, it's not for 24.1, right?

Right, it's for 24.2, so it's not super urgent.

>> > If you agree with the diagnosis, then how about the change below?
>> Might be an acceptable workaround for the emacs-24 branch, yes (tho I'd
>> replace "inhibit ? 0 : 1" with "!inhibit").
> I think it is not just a workaround.  If we can suppress buffer/string
> relocation temporarily, we can utilize that functionality in several
> other places.

We could suppress relocation all the time.  After all, we don't use
relalloc on GNU systems and under Mac OS X.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 14:16                             ` Stefan Monnier
  2012-05-23 15:23                               ` Ken Brown
@ 2012-05-23 16:52                               ` Eli Zaretskii
  2012-05-23 20:07                                 ` Stefan Monnier
  2012-05-23 17:34                               ` Eli Zaretskii
  2 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-23 16:52 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Richard Stallman <rms@gnu.org>,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 10:16:17 -0400
> 
> > Which other places use C pointers to buffer text and call functions
> > that can allocate memory?
> 
> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
> vulnerable to the problem.

That's not true.  As long as you access buffer text through character
position, you are safe.  The only situation where you are vulnerable
is if you store a C pointer to buffer text, e.g., like this:

   char *text = BEGV_ADDR;
or
   char *text = BYTE_POS_ADDR (current_buffer->pt);

then invoke some function that can allocate or reallocate memory, and
_then_ access buffer text through that pointer.

If you find such a code anywhere, then that's a bug similar to this
one.

> If you really want to install your workaround on the emacs-24 branch, go
> for it but let's try to find a real fix for the trunk.

What kind of real fix are you looking for?  I agree with Handa-san:
being able to suppress relocation in select places is a good feature.
Why shouldn't it be the fix in this case, and what better fix can we
invent when we use an essentially externally maintained code (AFAIR,
regex will at some point be re-sync'ed with gnulib) that cannot be
expected to change its code radically so as not to access buffer text
through C pointers?

> >> >> I wonder: why do we use REL_ALLOC?
> >> > AFAIK, we do that only on platforms that don't support mmap for
> >> > allocating buffer text.
> >> So, IIUC the only reason to use it is so that we can more often return
> >> memory to the OS even for the non-mmap case?  Is that because returning
> >> memory can only be done via sbrk style memory management?
> > I don't think this is only about _returning_ memory.  It is first and
> > foremost about not _asking_ for more memory when we can come up with
> > it by reshuffling buffer text.
> 
> So you're saying it's use for fragmentation reasons?

Yes.

> But on other platforms where we use mmap, we do suffer from this
> fragmentation, and yet it doesn't seem to be a real source of problem.

I don't know enough about mmap to answer that.  I vaguely recollect
that mmap avoids such fragmentation as an inherent feature, but I may
be wrong.

> That's why I think the only real reason is because memory can only be
> returned via sbrk-style memory management (i.e. only free memory at the
> end of the heap can be returned).  Is that right?

Yes, AFAIK.

> I guess my question turns into "why do we use gmalloc.c instead of
> a malloc library that uses mmap (or some other mechanism that lets it
> return large free chunks to the OS)"?

Use of gmalloc is a different issue.  We were talking about ralloc.c.
You could use one, but not the other.

> AFAIK, Windows is pretty much the only system where we use gmalloc.c and
> ralloc.c nowadays.

My reading of configure is that we use it on more than just Windows
(and MS-DOS).  Basically, any platform that uses gmalloc.c (which is
the default, turned off only for GNU/Linux and Darwin) also uses
ralloc.c.

> Does anyone remember why we don't use the system malloc under
> Windows (and Cygwin)?

I find it hard to believe that going through system malloc on
MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit
machine).  To achieve this today, we reserve a 2GB contiguous chunk of
address space at startup, and then commit and uncommit parts of it as
needed (see w32heap.c).  ralloc.c has an important part in this
arrangement.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 15:27                         ` Stefan Monnier
@ 2012-05-23 17:02                           ` Eli Zaretskii
  0 siblings, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-23 17:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: eliz@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 11:27:16 -0400
> 
> We could suppress relocation all the time.

I suspect that will increase the probability of out-of-memory
conditions due to fragmentation, and system-wide memory pressure in
general, on those platforms that do use ralloc.c today.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 14:16                             ` Stefan Monnier
  2012-05-23 15:23                               ` Ken Brown
  2012-05-23 16:52                               ` Eli Zaretskii
@ 2012-05-23 17:34                               ` Eli Zaretskii
  2 siblings, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-23 17:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Richard Stallman <rms@gnu.org>,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 10:16:17 -0400
> 
> If you really want to install your workaround on the emacs-24 branch, go
> for it

Done (revision 108012 on the emacs-24 branch).





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 16:52                               ` Eli Zaretskii
@ 2012-05-23 20:07                                 ` Stefan Monnier
  2012-05-24 16:22                                   ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-23 20:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, rms, 11519, lekktu

>> > Which other places use C pointers to buffer text and call functions
>> > that can allocate memory?
>> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
>> vulnerable to the problem.
> That's not true.  As long as you access buffer text through character
> position, you are safe.

Right, some of those uses might be safe, indeed.  Of course it's not
only STRING_CHAR_AND_LENGTH but STRING_CHAR_ADVANCE as well, together
with FETCH_* macros which use those, etc...

Grepping for those macros shows they're used at *many* places, and I'd
be amazed if re_search is the only place where we don't go through the
BYTE_POS_ADDR rigmarole.

Let's see ...hmmm... yup, set-buffer-multibyte is another example,
find_charsets_in_text yet another, and I'm not even trying hard.
Just grep for "STRING_CHAR_" and see for yourself.

>> If you really want to install your workaround on the emacs-24 branch, go
>> for it but let's try to find a real fix for the trunk.
> What kind of real fix are you looking for?

One that lets us write code without having to worry about such
corner cases.  E.g. changing STRING_CHAR_ADVANCE so it can't cause
relocation.  Not using ralloc.c any more would be another good option.
Otherwise, changing our macros so they do the BYTE_POS_ADDR
internally, discouraging the use of direct pointers into the
buffer's content.

> Why shouldn't it be the fix in this case, and what better fix can we
> invent when we use an essentially externally maintained code (AFAIR,
> regex will at some point be re-sync'ed with gnulib) that cannot be
> expected to change its code radically so as not to access buffer text
> through C pointers?

To me, it's clearly a workaround.  It's an OK workaround if/when we use
a "blackbox" (i.e. externally maintained) regexp engine and keep using
ralloc.c.  But better would be to eliminate the problem altogether.

>> But on other platforms where we use mmap, we do suffer from this
>> fragmentation, and yet it doesn't seem to be a real source of problem.
> I don't know enough about mmap to answer that.  I vaguely recollect
> that mmap avoids such fragmentation as an inherent feature, but I may
> be wrong.

No, fragmentation is a property of the address space, so without
relocation you can't avoid it.

>> I guess my question turns into "why do we use gmalloc.c instead of
>> a malloc library that uses mmap (or some other mechanism that lets it
>> return large free chunks to the OS)"?
> Use of gmalloc is a different issue.  We were talking about ralloc.c.
> You could use one, but not the other.

Well, still we use ralloc because we don't use mmap, so the question to
me is: why don't we use mmap (either via a malloc that does, or
directly via USE_MMAP_FOR_BUFFERS) and get rid of ralloc.c?

>> AFAIK, Windows is pretty much the only system where we use gmalloc.c and
>> ralloc.c nowadays.
> My reading of configure is that we use it on more than just Windows
> (and MS-DOS).  Basically, any platform that uses gmalloc.c (which is
> the default, turned off only for GNU/Linux and Darwin) also uses
> ralloc.c.

To me "all minus GNU/Linux, Mac OS X, and Cygwin (which apparently uses
gmalloc but not ralloc)" is pretty close to "just Windows" nowadays.

>> Does anyone remember why we don't use the system malloc under
>> Windows (and Cygwin)?
> I find it hard to believe that going through system malloc on
> MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit
> machine).  To achieve this today, we reserve a 2GB contiguous chunk of
> address space at startup, and then commit and uncommit parts of it as
> needed (see w32heap.c).  ralloc.c has an important part in this
> arrangement.

You mean that Windows's system malloc library has a memory that's too
fragmented to be able to allocate a single 1.5G chunk?  Why?
[ I know next to nothing about the w32 API and plead guilty of
  POSIX-only thinking, so please bear with me.  ]


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-23 20:07                                 ` Stefan Monnier
@ 2012-05-24 16:22                                   ` Eli Zaretskii
  2012-05-28  2:15                                     ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-24 16:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rms@gnu.org,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 16:07:05 -0400
> 
> >> > Which other places use C pointers to buffer text and call functions
> >> > that can allocate memory?
> >> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
> >> vulnerable to the problem.
> > That's not true.  As long as you access buffer text through character
> > position, you are safe.
> 
> Right, some of those uses might be safe, indeed.  Of course it's not
> only STRING_CHAR_AND_LENGTH but STRING_CHAR_ADVANCE as well, together
> with FETCH_* macros which use those, etc...

No, FETCH_* macros are safe, because they accept buffer positions,
fetch only one character at a time, and call STRING_CHAR_* _after_
they access the buffer.

> Grepping for those macros shows they're used at *many* places, and I'd
> be amazed if re_search is the only place where we don't go through the
> BYTE_POS_ADDR rigmarole.
> 
> Let's see ...hmmm... yup, set-buffer-multibyte is another example,
> find_charsets_in_text yet another, and I'm not even trying hard.
> Just grep for "STRING_CHAR_" and see for yourself.

I didn't mean STRING_CHAR_*.  I agree that they should be fixed not to
have such unexpected side effect.  They should be read-only operations.
As a temporary band-aid for Emacs 24.1, I suggest the change below.

What I meant is the code besides STRING_CHAR_* macros.  I don't think
you will find code that manipulates C pointers to buffer text and
calls functions that can allocate memory.

> >> But on other platforms where we use mmap, we do suffer from this
> >> fragmentation, and yet it doesn't seem to be a real source of problem.
> > I don't know enough about mmap to answer that.  I vaguely recollect
> > that mmap avoids such fragmentation as an inherent feature, but I may
> > be wrong.
> 
> No, fragmentation is a property of the address space, so without
> relocation you can't avoid it.

I asked Gerd Möllmann, who wrote the mmap-based buffer allocation
code, about this.  That code originally resided on ralloc.c and was
meant to replace the sbrk-based implementation.  So I would expect
that the issue of relocation was considered back then, and I hope Gerd
will remember why the mmap-based code was considered good enough to
replace ralloc.c.

> > I find it hard to believe that going through system malloc on
> > MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit
> > machine).  To achieve this today, we reserve a 2GB contiguous chunk of
> > address space at startup, and then commit and uncommit parts of it as
> > needed (see w32heap.c).  ralloc.c has an important part in this
> > arrangement.
> 
> You mean that Windows's system malloc library has a memory that's too
> fragmented to be able to allocate a single 1.5G chunk?  Why?

This has nothing to do with Windows APIs, so you are well equipped to
reason about this ;-)

You said "malloc", so I took an issue with the MS C runtime
implementation of malloc.  Since all the other implementations suffer
from fragmentation, there's no reason to believe that the MS
implementation avoids that danger.  A general-purpose function that
cannot move buffers behind the scenes cannot possibly avoid that.
Doing better was the original reason for writing ralloc.c.

If you meant to bypass malloc and use the Windows memory-allocation
APIs, such as VirtualAlloc, directly, then that's what we already do
in w32heap.c, which implements an emulation of sbrk that is good
enough for Emacs.  The fact that gmalloc and ralloc are used on top of
that are simply to avoid reinventing the wheel.

We could easily turn off buffer relocation in ralloc.c for good, by
fixing the value of use_relocatable_buffers at zero.  But I'm worried
that this would cause Emacs on Windows run out of memory (or act as if
it were) faster.  For example, in an Emacs session that runs for 2
weeks and has a 200MB working set, I just visited a 1.3GB file, went
to its middle and typed "C-u 30000 d" to insert 30K characters.  Emacs
had no problems enlarging the buffer, although it has only 1.9GB of
reserved memory space on that machine, so if it needed to allocate
another 1.3GB+30KB buffer (due to fragmentation, which is a certainty
after such a long time), it would have failed, I presume.

Yet another alternative is to emulate mmap on Windows using the
equivalent Windows API.  But that requires a research comparing mmap
features we need and use on Posix platforms with the features offered
by Windows, to make sure this is at all feasible.  Such a research
would need to be done by someone who knows enough about mmap and is
willing to do the job.  Do we have such a person on board?  And then
there's the implementation and testing.  Doesn't sound like an
efficient use of our time and energy.

Are there other alternatives?

Here's the band-aid I propose for emacs-24, to make the STRING_CHAR_*
macros safe:

=== modified file 'src/charset.c'
--- src/charset.c	2012-01-19 07:21:25 +0000
+++ src/charset.c	2012-05-24 16:14:05 +0000
@@ -1641,6 +1641,12 @@ maybe_unify_char (int c, Lisp_Object val
     return c;
 
   CHECK_CHARSET_GET_CHARSET (val, charset);
+#ifdef REL_ALLOC
+  /* The call to load_charset below can allocate memory, whcih screws
+     callers of this function through STRING_CHAR_* macros that hold C
+     pointers to buffer text, if REL_ALLOC is used.  */
+  r_alloc_inhibit_buffer_relocation (1);
+#endif
   load_charset (charset, 1);
   if (! inhibit_load_charset_map)
     {
@@ -1656,6 +1662,9 @@ maybe_unify_char (int c, Lisp_Object val
       if (unified > 0)
 	c = unified;
     }
+#ifdef REL_ALLOC
+  r_alloc_inhibit_buffer_relocation (0);
+#endif
   return c;
 }
 

=== modified file 'src/ralloc.c'
--- src/ralloc.c	2012-05-23 17:32:28 +0000
+++ src/ralloc.c	2012-05-24 16:16:14 +0000
@@ -1204,7 +1204,12 @@ r_alloc_reset_variable (POINTER *old, PO
 void
 r_alloc_inhibit_buffer_relocation (int inhibit)
 {
-  use_relocatable_buffers = !inhibit;
+  if (use_relocatable_buffers < 0)
+    use_relocatable_buffers = 0;
+  if (inhibit)
+    use_relocatable_buffers++;
+  else if (use_relocatable_buffers > 0)
+    use_relocatable_buffers--;
 }
 
 \f







^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-24 16:22                                   ` Eli Zaretskii
@ 2012-05-28  2:15                                     ` Stefan Monnier
  2012-05-28 16:53                                       ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-28  2:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, rms, 11519, lekktu

> I didn't mean STRING_CHAR_*.  I agree that they should be fixed not to
> have such unexpected side effect.  They should be read-only operations.
> As a temporary band-aid for Emacs 24.1, I suggest the change below.

Looks fine (should make the regex.c patch unnecessary).

> You said "malloc", so I took an issue with the MS C runtime
> implementation of malloc.  Since all the other implementations suffer
> from fragmentation, there's no reason to believe that the MS
> implementation avoids that danger.

The problem is inherent to the malloc API, pretty much, yes.

> We could easily turn off buffer relocation in ralloc.c for good, by
> fixing the value of use_relocatable_buffers at zero.  But I'm worried
> that this would cause Emacs on Windows run out of memory (or act as if
> it were) faster.

AFAIK, Emacs under GNU/Linux and Mac OS X uses non-relocatable buffers,
and they don't seem to suffer more from fragmentation problems than
Emacs under Windows.  But yes, unless we use an mmap-style allocation,
it would use more actual memory.

> For example, in an Emacs session that runs for 2
> weeks and has a 200MB working set, I just visited a 1.3GB file, went
> to its middle and typed "C-u 30000 d" to insert 30K characters.  Emacs

For sure editing such large file in a 32bit address space might prove
problematic without relocation, (and even with buffer relocation, some
non-buffer allocation might end up fragmenting the address space too
much) but luckily few people do that (you need to compile
with --wide-int to be able to do that).

> Yet another alternative is to emulate mmap on Windows using the
> equivalent Windows API.  But that requires a research comparing mmap
> features we need and use on Posix platforms with the features offered
> by Windows, to make sure this is at all feasible.

That would be nice.

> Such a research
> would need to be done by someone who knows enough about mmap and is
> willing to do the job.  Do we have such a person on board?

I don't volunteer.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-28  2:15                                     ` Stefan Monnier
@ 2012-05-28 16:53                                       ` Eli Zaretskii
  2012-05-28 19:44                                         ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-28 16:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rms@gnu.org,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Sun, 27 May 2012 22:15:00 -0400
> 
> > I didn't mean STRING_CHAR_*.  I agree that they should be fixed not to
> > have such unexpected side effect.  They should be read-only operations.
> > As a temporary band-aid for Emacs 24.1, I suggest the change below.
> 
> Looks fine

Installed as revision 108020 on the emacs-24 branch.

> (should make the regex.c patch unnecessary).

I wouldn't recommend removing the calls to
r_alloc_inhibit_buffer_relocation around calls to regex, as that is an
external function we don't control.  But if you feel strongly about
that, I will remove them.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-28 16:53                                       ` Eli Zaretskii
@ 2012-05-28 19:44                                         ` Stefan Monnier
  2012-05-28 20:47                                           ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-28 19:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, rms, 11519, lekktu

>> (should make the regex.c patch unnecessary).

> I wouldn't recommend removing the calls to
> r_alloc_inhibit_buffer_relocation around calls to regex, as that is an
> external function we don't control.  But if you feel strongly about
> that, I will remove them.

I don't feel strongly about that.  But this code is all ours, it's not
an external function (it used to be half-external, and then I hacked on
it half-extensively, after which the new version was sync'd back from
Emacs to gnulib, after which the gnulib side dropped it, and AFAIK Emacs
is the last remaining development line of this code, except maybe for
XEmacs's version).


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-28 19:44                                         ` Stefan Monnier
@ 2012-05-28 20:47                                           ` Eli Zaretskii
  2012-05-29  1:23                                             ` Stefan Monnier
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-28 20:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: rms@gnu.org, handa@gnu.org, schwab@linux-m68k.org, lekktu@gmail.com,
>         11519@debbugs.gnu.org
> Date: Mon, 28 May 2012 15:44:18 -0400
> 
> > I wouldn't recommend removing the calls to
> > r_alloc_inhibit_buffer_relocation around calls to regex, as that is an
> > external function we don't control.  But if you feel strongly about
> > that, I will remove them.
> 
> I don't feel strongly about that.  But this code is all ours, it's not
> an external function (it used to be half-external, and then I hacked on
> it half-extensively, after which the new version was sync'd back from
> Emacs to gnulib, after which the gnulib side dropped it, and AFAIK Emacs
> is the last remaining development line of this code, except maybe for
> XEmacs's version).

If you are confident that regex.c doesn't call malloc in any other
place, and that chances for it to do so in the future are low enough,
then I'm okay with removing the calls to
r_alloc_inhibit_buffer_relocation.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-28 20:47                                           ` Eli Zaretskii
@ 2012-05-29  1:23                                             ` Stefan Monnier
  2012-05-29 16:02                                               ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Monnier @ 2012-05-29  1:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, rms, 11519, lekktu

> If you are confident that regex.c doesn't call malloc in any other
> place, and that chances for it to do so in the future are low enough,

I'm pretty confident, yes.


        Stefan





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-29  1:23                                             ` Stefan Monnier
@ 2012-05-29 16:02                                               ` Eli Zaretskii
  2012-06-02 20:44                                                 ` Juanma Barranquero
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-05-29 16:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, rms, 11519, lekktu

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rms@gnu.org,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  11519@debbugs.gnu.org
> Date: Mon, 28 May 2012 21:23:13 -0400
> 
> > If you are confident that regex.c doesn't call malloc in any other
> > place, and that chances for it to do so in the future are low enough,
> 
> I'm pretty confident, yes.

Removed in revision 108021 on the emacs-24 branch.





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-05-29 16:02                                               ` Eli Zaretskii
@ 2012-06-02 20:44                                                 ` Juanma Barranquero
  2012-06-03  4:18                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Juanma Barranquero @ 2012-06-02 20:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 11519

> Removed in revision 108021 on the emacs-24 branch.

This bug fix hasn't yet been ported back to the trunk, has it?

    Juanma





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-06-02 20:44                                                 ` Juanma Barranquero
@ 2012-06-03  4:18                                                   ` Eli Zaretskii
  2013-12-28  8:41                                                     ` Glenn Morris
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2012-06-03  4:18 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: 11519

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Sat, 2 Jun 2012 22:44:39 +0200
> Cc: 11519@debbugs.gnu.org
> 
> > Removed in revision 108021 on the emacs-24 branch.
> 
> This bug fix hasn't yet been ported back to the trunk, has it?

?? Of course, it was.  The function maybe_unify_char inhibits
relocation of buffer text.

Why, did you have the problem again?





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2012-06-03  4:18                                                   ` Eli Zaretskii
@ 2013-12-28  8:41                                                     ` Glenn Morris
  2013-12-28  9:48                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 39+ messages in thread
From: Glenn Morris @ 2013-12-28  8:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Juanma Barranquero, 11519


Is this report still relevant?





^ permalink raw reply	[flat|nested] 39+ messages in thread

* bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
  2013-12-28  8:41                                                     ` Glenn Morris
@ 2013-12-28  9:48                                                       ` Eli Zaretskii
  0 siblings, 0 replies; 39+ messages in thread
From: Eli Zaretskii @ 2013-12-28  9:48 UTC (permalink / raw)
  To: Glenn Morris; +Cc: lekktu, 11519-done

> From: Glenn Morris <rgm@gnu.org>
> Cc: Juanma Barranquero <lekktu@gmail.com>,  11519@debbugs.gnu.org
> Date: Sat, 28 Dec 2013 03:41:32 -0500
> 
> 
> Is this report still relevant?

No.  Closing.





^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-12-28  9:48 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-19 16:10 bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping Juanma Barranquero
2012-05-19 16:27 ` Eli Zaretskii
2012-05-19 21:40   ` Juanma Barranquero
2012-05-20 17:27     ` Eli Zaretskii
2012-05-20 19:00       ` Juanma Barranquero
2012-05-21  1:50         ` Stefan Monnier
2012-05-21  2:51           ` Eli Zaretskii
2012-05-21  7:59             ` Andreas Schwab
2012-05-21 17:51               ` Eli Zaretskii
2012-05-21 20:39                 ` Stefan Monnier
2012-05-22 19:00                   ` Eli Zaretskii
2012-05-22 19:19                     ` Stefan Monnier
2012-05-22 19:47                       ` Eli Zaretskii
2012-05-23  0:47                         ` Stefan Monnier
2012-05-23  2:59                           ` Eli Zaretskii
2012-05-23 14:16                             ` Stefan Monnier
2012-05-23 15:23                               ` Ken Brown
2012-05-23 16:52                               ` Eli Zaretskii
2012-05-23 20:07                                 ` Stefan Monnier
2012-05-24 16:22                                   ` Eli Zaretskii
2012-05-28  2:15                                     ` Stefan Monnier
2012-05-28 16:53                                       ` Eli Zaretskii
2012-05-28 19:44                                         ` Stefan Monnier
2012-05-28 20:47                                           ` Eli Zaretskii
2012-05-29  1:23                                             ` Stefan Monnier
2012-05-29 16:02                                               ` Eli Zaretskii
2012-06-02 20:44                                                 ` Juanma Barranquero
2012-06-03  4:18                                                   ` Eli Zaretskii
2013-12-28  8:41                                                     ` Glenn Morris
2013-12-28  9:48                                                       ` Eli Zaretskii
2012-05-23 17:34                               ` Eli Zaretskii
2012-05-23 14:10                       ` Kenichi Handa
2012-05-23 15:27                         ` Stefan Monnier
2012-05-23 17:02                           ` Eli Zaretskii
2012-05-22 14:38                 ` Kenichi Handa
2012-05-22 19:02                   ` Eli Zaretskii
2012-05-21  1:49       ` Stefan Monnier
2012-05-21  2:50         ` Eli Zaretskii
2012-05-21  3:21           ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).