From: Kenichi Handa <handa@m17n.org>
Cc: eliz@gnu.org, cfb@cafer.org, emacs-devel@gnu.org
Subject: regex.c bug? - Re: HTML Mode and Turkish Locale - Segfault
Date: Tue, 28 Nov 2006 10:17:29 +0900
Message-ID: <E1GorbF-0001He-Dk@etlken.m17n.org>
In-Reply-To: <E1Gobhv-0007ci-6R@etlken.m17n.org> (message from Kenichi Handa on Mon, 27 Nov 2006 17:19:19 +0900)

It seems I have found the cause of the crash described below.

Currently we have this code in regex.c.

			if (multibyte)
			  SET_RANGE_TABLE_WORK_AREA_BIT (range_table_work,
							 re_wctype_to_bit (cc));

                        for (ch = 0; ch < 1 << BYTEWIDTH; ++ch)
			  {
			    int translated = TRANSLATE (ch);
			    if (re_iswctype (btowc (ch), cc))
			      SET_LIST_BIT (translated);
			  }

In tr_TR.UTF-8, 'I' is translated to #x51051 (U+0131).  But
SET_LIST_BIT seems to assume that its argument is less than
256 (or 128), so a larger value would set a bit outside the
bitmap.  So, I've just installed the following change.

@@ -2939,7 +2939,8 @@
                         for (ch = 0; ch < 1 << BYTEWIDTH; ++ch)
 			  {
 			    int translated = TRANSLATE (ch);
-			    if (re_iswctype (btowc (ch), cc))
+			    if (translated < (1 << BYTEWIDTH)
+				&& re_iswctype (btowc (ch), cc))
 			      SET_LIST_BIT (translated);
 			  }

If translated is a multibyte character, I think the
SET_RANGE_TABLE_WORK_AREA_BIT call above already handles that case.

Stefan, could you please confirm that my guess above is
correct?

---
Kenichi Handa
handa@m17n.org

In article <E1Gobhv-0007ci-6R@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> [1  <text/plain; US-ASCII (7bit)>]
> In article <uu00lz96t.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > From: cfb@cafer.org (Cafer =?utf-8?B?xZ5pbcWfZWs=?=)
> > > Date: Sun, 26 Nov 2006 22:58:29 +0200
> > > 
> > > Emacs crashes randomly (segfault) in html-mode when using the
> > > tr_TR.UTF-8 locale. I've tried it with the en_US.UTF-8 locale and
> > > it seems to work.
> > > 
> > > I've tried both builds (from CVS and from the Debian repository).
> > > 
> > > Version: 22.0.91.1

> > Thank you for your report.

> > However, there's not enough information in this for us to try to find
> > out what is wrong.  Please use "M-x report-emacs-bug RET" to provide
> > the information.  Also, since this is a segfault, please run GDB on
> > the core file, type the command "bt" inside GDB, and post the
> > resulting backtrace here.

> I can reproduce it with the following scenario (on Debian
> testing) with the attached temp.html.  But I have not yet
> found what is wrong.  I suspect that the case-table handling
> has a problem because it happens only in tr_TR.UTF-8.

> (gdb) set env LANG=tr_TR.UTF-8
> (gdb) run -Q temp.html

> ESC : (garbage-collect) RET

> Then Emacs crashes like this:

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=139689009) at alloc.c:5717
> (gdb) bt
> #0  mark_object (arg=139689009) at alloc.c:5717
> #1  0x0813ab66 in mark_object (arg=139272845) at alloc.c:5825
> #2  0x0813ab66 in mark_object (arg=141201765) at alloc.c:5825
> #3  0x0813af6e in mark_object (arg=138980860) at alloc.c:5700
> [...]
> #119 0x0813aa7f in mark_object (arg=139883241) at alloc.c:5714
> #120 0x0813af6e in mark_object (arg=137465060) at alloc.c:5700
> #121 0x0813e8ff in Fgarbage_collect () at alloc.c:5156
> #122 0x081522b3 in Feval (form=141197693) at eval.c:2325
> #123 0x08152da7 in Ffuncall (nargs=2, args=0xafcfabb0) at eval.c:2997
> #124 0x0817d61a in Fbyte_code (bytestr=136311491, vector=136311508, maxdepth=40) at bytecode.c:679
> #125 0x08152844 in funcall_lambda (fun=136311436, nargs=2, arg_vector=0xafcface4) at eval.c:3184
> #126 0x08152c5b in Ffuncall (nargs=3, args=0xafcface0) at eval.c:3054
> #127 0x08154523 in Fapply (nargs=2, args=0xafcfad30) at eval.c:2485
> #128 0x08154654 in apply1 (fn=137689233, arg=141197565) at eval.c:2749
> #129 0x0814fdf7 in Fcall_interactively (function=137689233, record_flag=137464009, keys=137504524) at callint.c:406
> #130 0x080f09c3 in Fcommand_execute (cmd=137689233, record_flag=137464009, keys=137464009, special=137464009) at keyboard.c:9867
> #131 0x080fc00a in command_loop_1 () at keyboard.c:1858
> #132 0x0815187b in internal_condition_case (bfun=0x80fbc90 <command_loop_1>, handlers=137508713, hfun=0x80f66a0 <cmd_error>) at eval.c:1481
> #133 0x080f5a7e in command_loop_2 () at keyboard.c:1326
> #134 0x0815193c in internal_catch (tag=137504921, func=0x80f5a50 <command_loop_2>, arg=137464009) at eval.c:1222
> #135 0x080f64ee in command_loop () at keyboard.c:1305
> #136 0x080f6878 in recursive_edit_1 () at keyboard.c:1003
> #137 0x080f6966 in Frecursive_edit () at keyboard.c:1064
> #138 0x080ecbb2 in main (argc=1526726658, argv=0xafcfb5c4) at emacs.c:1794

> Lisp Backtrace:
> "garbage-collect" (0x2)
> "eval" (0x86a817d)
> "eval-expression" (0x86a817d)
> "call-interactively" (0x834f891)
> (gdb) 

> ---
> Kenichi Handa
> handa@m17n.org

> [2  <text/html; US-ASCII (7bit)>]
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <html>

> <head>
>    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>    <title> Sample </title>
> </head>

> <body>
> </body>
> </html>

