From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
Date: Tue, 08 Apr 2008 21:42:14 -0400
To: Chong Yidong
Cc: intrigeri@boum.org, emacs-devel@gnu.org, 103@emacsbugs.donarmstrong.com, Kenichi Handa
In-Reply-To: <87skxwl29o.fsf@stupidchicken.com> (Chong Yidong's message of "Tue, 08 Apr 2008 12:50:11 -0400")

>>> (let ((str (string-as-unibyte "ä")))
>>>   (string-match (char-to-string (string-to-char str)) str))
>>
>>> evaluates to 0 in Emacs 22, and to nil in Emacs 23.  It turns out that
>>> this screws up the use of all-completions in regexp-opt-group.
>>
>>> Anyone have any idea what's going on here?
>>
>> (string-as-unibyte "ä") => "\303\244"
>> (string-to-char "\303\244") => 195 (because ?\303 == 195)
>> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
>> (string-match "Ã" "ä") => nil (obvious)
>>
>> Any Lisp program that depends on the result of
>> string-as-unibyte (thus Emacs' internal character
>> representation) won't work in Emacs 23.

Notice that the problem is unrelated to string-as-unibyte:

    (string-match (char-to-string (string-to-char str)) str)

this should intuitively always return 0.  Of course, once you replace
`char-to-string' with just `string', you may be reminded that Emacs-23
introduced `unibyte-string', which leads you to the key: if `str' is
unibyte, you need to do

    (string-match (unibyte-string (string-to-char str)) str)

In Emacs-22, `string' used a heuristic to decide whether to build
a unibyte or multibyte string, and more importantly, the character
representing byte code 209 had code 209, whereas in Emacs-23, we have
the strange situation that byte 209 is character 4194257.

So an integer <256 needs to be accompanied by some contextual info
that says whether it represents a char or a byte, otherwise you get
ambiguities that lead to bugs.  And string-to-char returns either a byte
or a char depending on whether the string was unibyte or multibyte.

> I see.  However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable.  What do you
> think?

Yes, it looks like a good fix.  Maybe "-no-properties" would be
even better.


        Stefan