From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
Date: Thu, 24 May 2012 19:22:46 +0300
Message-ID: <83wr41wnu1.fsf@gnu.org>
To: Stefan Monnier
Cc: schwab@linux-m68k.org, rms@gnu.org, 11519@debbugs.gnu.org, lekktu@gmail.com

> From: Stefan Monnier
> Cc: rms@gnu.org, handa@gnu.org, schwab@linux-m68k.org, lekktu@gmail.com, 11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 16:07:05 -0400
>
> >> > Which other places use C pointers to buffer text and call functions
> >> > that can allocate memory?
> >> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
> >> vulnerable to the problem.
> > That's not true.  As long as you access buffer text through character
> > position, you are safe.
>
> Right, some of those uses might be safe, indeed.  Of course it's not
> only STRING_CHAR_AND_LENGTH but STRING_CHAR_ADVANCE as well, together
> with FETCH_* macros which use those, etc...

No, FETCH_* macros are safe, because they accept buffer positions,
fetch only one character at a time, and call STRING_CHAR_* _after_
they access the buffer.

> Grepping for those macros shows they're used at *many* places, and I'd
> be amazed if re_search is the only place where we don't go through the
> BYTE_POS_ADDR rigmarole.
>
> Let's see ...hmmm... yup, set-buffer-multibyte is another example,
> find_charsets_in_text yet another, and I'm not even trying hard.
> Just grep for "STRING_CHAR_" and see for yourself.

I didn't mean STRING_CHAR_*.  I agree that they should be fixed not to
have such an unexpected side effect.  They should be read-only
operations.  As a temporary band-aid for Emacs 24.1, I suggest the
change below.

What I meant is code other than the STRING_CHAR_* macros.  I don't
think you will find code that manipulates C pointers to buffer text
and calls functions that can allocate memory.
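
To make the distinction between holding a C pointer and holding a
buffer position concrete, here is a minimal, self-contained sketch.
It is not Emacs code: the two-slot arena and every toy_* name are
hypothetical stand-ins, with toy_pos_addr playing the role that
BYTE_POS_ADDR plays for real buffer text and toy_allocating_call
standing in for any function that may allocate and thereby relocate
the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_SIZE = 16 };

/* Hypothetical stand-in for relocatable buffer text: the text lives
   in one of two slots of a static arena and may move between them.  */
static unsigned char toy_arena[2][TOY_SIZE];
static int toy_slot;		/* slot that currently holds the text */

static void
toy_init (const char *s)
{
  toy_slot = 0;
  memset (toy_arena, 0, sizeof toy_arena);
  strncpy ((char *) toy_arena[0], s, TOY_SIZE - 1);
}

/* Map a position to the current address of that byte.  */
static unsigned char *
toy_pos_addr (size_t pos)
{
  return &toy_arena[toy_slot][pos];
}

/* Stand-in for a call that allocates memory: the text is moved to the
   other slot and the old slot is overwritten with '?'.  */
static void
toy_allocating_call (void)
{
  int from = toy_slot, to = !toy_slot;
  memcpy (toy_arena[to], toy_arena[from], TOY_SIZE);
  memset (toy_arena[from], '?', TOY_SIZE);
  toy_slot = to;
}

/* Unsafe pattern: a C pointer is kept across the allocating call, so
   the read goes to the old location.  */
static int
toy_read_via_stale_pointer (size_t pos)
{
  unsigned char *p = toy_pos_addr (pos);
  toy_allocating_call ();
  return *p;
}

/* Safe pattern: only the position is kept, and the address is derived
   again after the call.  */
static int
toy_read_via_position (size_t pos)
{
  toy_allocating_call ();
  return *toy_pos_addr (pos);
}
```

After toy_init ("abc"), toy_read_via_stale_pointer (1) yields '?'
(the overwritten old slot), while toy_read_via_position (1) yields
'b'.  The arena is static so that the stale read stays well-defined
in the sketch; with a real allocator the old location could be
unmapped or reused.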

> >> But on other platforms where we use mmap, we do suffer from this
> >> fragmentation, and yet it doesn't seem to be a real source of problem.
> > I don't know enough about mmap to answer that.  I vaguely recollect
> > that mmap avoids such fragmentation as an inherent feature, but I may
> > be wrong.
>
> No, fragmentation is a property of the address space, so without
> relocation you can't avoid it.

I asked Gerd Möllmann, who wrote the mmap-based buffer allocation
code, about this.  That code originally resided in ralloc.c and was
meant to replace the sbrk-based implementation.  So I would expect
that the issue of relocation was considered back then, and I hope Gerd
will remember why the mmap-based code was considered good enough to
replace ralloc.c.

> > I find it hard to believe that going through system malloc on
> > MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit
> > machine).  To achieve this today, we reserve a 2GB contiguous chunk of
> > address space at startup, and then commit and uncommit parts of it as
> > needed (see w32heap.c).  ralloc.c has an important part in this
> > arrangement.
>
> You mean that Windows's system malloc library has a memory that's too
> fragmented to be able to allocate a single 1.5G chunk?  Why?

This has nothing to do with Windows APIs, so you are well equipped to
reason about this ;-)

You said "malloc", so I took issue with the MS C runtime
implementation of malloc.  Since all the other implementations suffer
from fragmentation, there's no reason to believe that the MS
implementation avoids that danger.  A general-purpose function that
cannot move buffers behind the scenes cannot possibly avoid that.
Doing better was the original reason for writing ralloc.c.

If you meant to bypass malloc and use the Windows memory-allocation
APIs, such as VirtualAlloc, directly, then that's what we already do
in w32heap.c, which implements an emulation of sbrk that is good
enough for Emacs.
gmalloc and ralloc are used on top of that simply to avoid
reinventing the wheel.

We could easily turn off buffer relocation in ralloc.c for good, by
fixing the value of use_relocatable_buffers at zero.  But I'm worried
that this would cause Emacs on Windows to run out of memory (or act as
if it had) sooner.  For example, in an Emacs session that has been
running for 2 weeks and has a 200MB working set, I just visited a
1.3GB file, went to its middle and typed "C-u 30000 d" to insert 30K
characters.  Emacs had no problems enlarging the buffer, although it
has only 1.9GB of reserved memory space on that machine, so if it
needed to allocate another 1.3GB+30KB buffer (due to fragmentation,
which is a certainty after such a long time), it would have failed, I
presume.

Yet another alternative is to emulate mmap on Windows using the
equivalent Windows API.  But that requires research comparing the mmap
features we need and use on POSIX platforms with the features offered
by Windows, to make sure this is at all feasible.  Such research
would need to be done by someone who knows enough about mmap and is
willing to do the job.  Do we have such a person on board?  And then
there's the implementation and testing.  Doesn't sound like an
efficient use of our time and energy.

Are there other alternatives?

Here's the band-aid I propose for emacs-24, to make the STRING_CHAR_*
macros safe:

=== modified file 'src/charset.c'
--- src/charset.c	2012-01-19 07:21:25 +0000
+++ src/charset.c	2012-05-24 16:14:05 +0000
@@ -1641,6 +1641,12 @@ maybe_unify_char (int c, Lisp_Object val
     return c;
 
   CHECK_CHARSET_GET_CHARSET (val, charset);
+#ifdef REL_ALLOC
+  /* The call to load_charset below can allocate memory, which screws
+     callers of this function through STRING_CHAR_* macros that hold C
+     pointers to buffer text, if REL_ALLOC is used.  */
+  r_alloc_inhibit_buffer_relocation (1);
+#endif
   load_charset (charset, 1);
   if (! inhibit_load_charset_map)
     {
@@ -1656,6 +1662,9 @@ maybe_unify_char (int c, Lisp_Object val
       if (unified > 0)
 	c = unified;
     }
+#ifdef REL_ALLOC
+  r_alloc_inhibit_buffer_relocation (0);
+#endif
   return c;
 }
 
=== modified file 'src/ralloc.c'
--- src/ralloc.c	2012-05-23 17:32:28 +0000
+++ src/ralloc.c	2012-05-24 16:16:14 +0000
@@ -1204,7 +1204,12 @@ r_alloc_reset_variable (POINTER *old, PO
 void
 r_alloc_inhibit_buffer_relocation (int inhibit)
 {
-  use_relocatable_buffers = !inhibit;
+  if (use_relocatable_buffers < 0)
+    use_relocatable_buffers = 0;
+  if (inhibit)
+    use_relocatable_buffers++;
+  else if (use_relocatable_buffers > 0)
+    use_relocatable_buffers--;
 }
 
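
The reason for moving from a plain flag to a counter is nesting: with
a boolean, an inner caller that releases its inhibit request would
re-enable relocation while an outer caller may still hold a pointer
to buffer text.  The following is a minimal sketch of that idea only,
not the ralloc.c code.  All toy_* names are hypothetical, and the
sketch assumes a separate counter that records the inhibit depth
instead of reusing use_relocatable_buffers.

```c
#include <assert.h>

/* Hypothetical counter: number of inhibit requests still outstanding.  */
static int toy_inhibit_depth;

/* Nestable request.  A nonzero INHIBIT adds one level; zero removes
   one level, and is ignored when nothing is outstanding.  */
static void
toy_inhibit_relocation (int inhibit)
{
  if (inhibit)
    toy_inhibit_depth++;
  else if (toy_inhibit_depth > 0)
    toy_inhibit_depth--;
}

/* Relocation is allowed only when no caller is inhibiting it.  */
static int
toy_relocation_allowed (void)
{
  return toy_inhibit_depth == 0;
}
```

With two nested toy_inhibit_relocation (1) calls, a single
toy_inhibit_relocation (0) leaves relocation disabled; it becomes
allowed again only after the second release, and an unmatched extra
release has no effect.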