From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: regex.c bug? - Re: HTML Mode and Turkish Locale - Segfault
Date: Tue, 28 Nov 2006 10:17:29 +0900
References: <871wnqncwa.fsf@medic.epidio.net>
In-reply-to: (message from Kenichi Handa on Mon, 27 Nov 2006 17:19:19 +0900)
Cc: eliz@gnu.org, cfb@cafer.org, emacs-devel@gnu.org
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.91 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
Xref: news.gmane.org gmane.emacs.devel:62886

It seems that I have found the cause of the attached crash.

Currently we have this code in regex.c:

    if (multibyte)
      SET_RANGE_TABLE_WORK_AREA_BIT (range_table_work,
                                     re_wctype_to_bit (cc));

    for (ch = 0; ch < 1 << BYTEWIDTH; ++ch)
      {
        int translated = TRANSLATE (ch);
        if (re_iswctype (btowc (ch), cc))
          SET_LIST_BIT (translated);
      }

In tr_TR.UTF-8, 'I' is translated to #x51051 (U+0131).  But it
seems that SET_LIST_BIT assumes that its argument is less than
256 (or 128).  So, I've just installed the following change.

@@ -2939,7 +2939,8 @@
     for (ch = 0; ch < 1 << BYTEWIDTH; ++ch)
       {
         int translated = TRANSLATE (ch);
-        if (re_iswctype (btowc (ch), cc))
+        if (translated < (1 << BYTEWIDTH)
+            && re_iswctype (btowc (ch), cc))
           SET_LIST_BIT (translated);
       }

If translated is a multibyte character, I think the
SET_RANGE_TABLE_WORK_AREA_BIT call above handles that case.

Stefan, could you please confirm that my guess above is
correct?
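To illustrate the failure mode and the guard, here is a minimal
sketch.  It is not the regex.c code: charset_bitmap, bitmap_set,
bitmap_test, bitmap_byte_index, toy_translate, is_ascii_upper and
fill_bitmap_guarded are hypothetical names, and the 256-bit bitmap
layout is an assumption made for the example.  The toy translation
maps only 'I' to #x51051, mimicking the Turkish case described
above.

```c
#include <assert.h>
#include <stdbool.h>

#define BYTEWIDTH 8
/* One bit per possible byte value: 256 bits = 32 bytes (assumed layout). */
#define BITMAP_BYTES ((1 << BYTEWIDTH) / BYTEWIDTH)

/* Hypothetical stand-in for the charset bitmap a SET_LIST_BIT-like macro fills. */
typedef struct {
    unsigned char bits[BITMAP_BYTES];
} charset_bitmap;

/* Byte offset that a character code selects inside the bitmap. */
static int bitmap_byte_index(int c)
{
    return c / BYTEWIDTH;
}

/* Set the bit for c.  Only valid for 0 <= c < 256; anything larger
   would index past the end of bits[]. */
static void bitmap_set(charset_bitmap *b, int c)
{
    assert(c >= 0 && c < (1 << BYTEWIDTH));
    b->bits[bitmap_byte_index(c)] |= (unsigned char)(1 << (c % BYTEWIDTH));
}

/* Report whether the bit for c (0 <= c < 256) is set. */
static bool bitmap_test(const charset_bitmap *b, int c)
{
    return ((b->bits[bitmap_byte_index(c)] >> (c % BYTEWIDTH)) & 1) != 0;
}

/* Hypothetical translation table: only 'I' maps outside the byte range. */
static int toy_translate(int ch)
{
    return ch == 'I' ? 0x51051 : ch;
}

/* Hypothetical character class: ASCII uppercase letters. */
static bool is_ascii_upper(int ch)
{
    return ch >= 'A' && ch <= 'Z';
}

/* Fill the bitmap for every byte value in the class, skipping any
   translation that does not fit in the bitmap.  Returns the number of
   skipped characters, which a real implementation would have to cover
   by some other means (for example a range table). */
static int fill_bitmap_guarded(charset_bitmap *b,
                               int (*translate)(int),
                               bool (*in_class)(int))
{
    int skipped = 0;
    for (int ch = 0; ch < (1 << BYTEWIDTH); ++ch) {
        if (!in_class(ch))
            continue;
        int translated = translate(ch);
        if (translated < (1 << BYTEWIDTH))
            bitmap_set(b, translated);
        else
            ++skipped;
    }
    return skipped;
}
```

With the guard, calling fill_bitmap_guarded (&b, toy_translate,
is_ascii_upper) on a zeroed bitmap sets the bits for the other 25
uppercase letters and reports one skipped character.  Without it,
the write for #x51051 would target byte 41482 of a 32-byte array.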
---
Kenichi Handa
handa@m17n.org

In article , Kenichi Handa writes:

> [1 ]
> In article , Eli Zaretskii writes:

> > > From: cfb@cafer.org (Cafer Şimşek)
> > > Date: Sun, 26 Nov 2006 22:58:29 +0200
> > >
> > > It crashes randomly (seg fault) when using html-mode with the
> > > tr_TR.UTF-8 locale.  I've tried it with the en_US.UTF-8 locale
> > > and it seems to work.
> > >
> > > I've tried with both (from CVS and from Debian Repository)
> > >
> > > Version: 22.0.91.1

> > Thank you for your report.

> > However, there's not enough information in this for us to try to find
> > out what is wrong.  Please use "M-x report-emacs-bug RET" to provide
> > the information.  Also, since this is a segfault, please run GDB on
> > the core file, type the command "bt" inside GDB, and post the
> > resulting backtrace here.

> I can reproduce it with the following scenario (on Debian
> testing) with the attached temp.html.  But I have not yet
> found what is wrong.  I suspect that case-table handling has
> a problem because it happens only in tr_TR.UTF-8.

> (gdb) set env LANG=tr_TR.UTF-8
> (gdb) run -Q temp.html
> ESC : (garbage-collect) RET

> Then Emacs crashes like this:

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=139689009) at alloc.c:5717
> (gdb) bt
> #0  mark_object (arg=139689009) at alloc.c:5717
> #1  0x0813ab66 in mark_object (arg=139272845) at alloc.c:5825
> #2  0x0813ab66 in mark_object (arg=141201765) at alloc.c:5825
> #3  0x0813af6e in mark_object (arg=138980860) at alloc.c:5700
> [...]
> #119 0x0813aa7f in mark_object (arg=139883241) at alloc.c:5714
> #120 0x0813af6e in mark_object (arg=137465060) at alloc.c:5700
> #121 0x0813e8ff in Fgarbage_collect () at alloc.c:5156
> #122 0x081522b3 in Feval (form=141197693) at eval.c:2325
> #123 0x08152da7 in Ffuncall (nargs=2, args=0xafcfabb0) at eval.c:2997
> #124 0x0817d61a in Fbyte_code (bytestr=136311491, vector=136311508, maxdepth=40) at bytecode.c:679
> #125 0x08152844 in funcall_lambda (fun=136311436, nargs=2, arg_vector=0xafcface4) at eval.c:3184
> #126 0x08152c5b in Ffuncall (nargs=3, args=0xafcface0) at eval.c:3054
> #127 0x08154523 in Fapply (nargs=2, args=0xafcfad30) at eval.c:2485
> #128 0x08154654 in apply1 (fn=137689233, arg=141197565) at eval.c:2749
> #129 0x0814fdf7 in Fcall_interactively (function=137689233, record_flag=137464009, keys=137504524) at callint.c:406
> #130 0x080f09c3 in Fcommand_execute (cmd=137689233, record_flag=137464009, keys=137464009, special=137464009) at keyboard.c:9867
> #131 0x080fc00a in command_loop_1 () at keyboard.c:1858
> #132 0x0815187b in internal_condition_case (bfun=0x80fbc90, handlers=137508713, hfun=0x80f66a0) at eval.c:1481
> #133 0x080f5a7e in command_loop_2 () at keyboard.c:1326
> #134 0x0815193c in internal_catch (tag=137504921, func=0x80f5a50, arg=137464009) at eval.c:1222
> #135 0x080f64ee in command_loop () at keyboard.c:1305
> #136 0x080f6878 in recursive_edit_1 () at keyboard.c:1003
> #137 0x080f6966 in Frecursive_edit () at keyboard.c:1064
> #138 0x080ecbb2 in main (argc=1526726658, argv=0xafcfb5c4) at emacs.c:1794

> Lisp Backtrace:
> "garbage-collect" (0x2)
> "eval" (0x86a817d)
> "eval-expression" (0x86a817d)
> "call-interactively" (0x834f891)
> (gdb)

> ---
> Kenichi Handa
> handa@m17n.org

> [2 ]
> Sample

> [3 ]
> _______________________________________________
> Emacs-devel mailing list
> Emacs-devel@gnu.org
> http://lists.gnu.org/mailman/listinfo/emacs-devel