From: YAMAMOTO Mitsuharu
Newsgroups: gmane.emacs.bugs
Subject: bug#3687: 23.1.50; inconsistency in multibyte eight-bit regexps
Date: Sat, 27 Jun 2009 10:30:10 +0900
Organization: Faculty of Science, Chiba University
References: <200906260956.n5Q9uo917123@church.math.s.chiba-u.ac.jp> <83my7vyute.fsf@gnu.org>
Reply-To: YAMAMOTO Mitsuharu, 3687@emacsbugs.donarmstrong.com
To: Eli Zaretskii
Cc: 3687@emacsbugs.donarmstrong.com
In-Reply-To: <83my7vyute.fsf@gnu.org>

>>>>> On Fri, 26 Jun 2009 16:43:25 +0300, Eli Zaretskii said:

>> Date: Fri, 26 Jun 2009 18:56:50 +0900 (JST)
>> From: YAMAMOTO Mitsuharu
>>
>> The following results look inconsistent:
>>
>> (string-match (string-to-multibyte "\x80") (string-to-multibyte "\x80"))
>>   => 0
>> (string-match (string-to-multibyte "\x80") "\x80")
>>   => nil
>>
>> (string-match (string-to-multibyte "[\x80]") (string-to-multibyte "\x80"))
>>   => nil
>> (string-match (string-to-multibyte "[\x80]") "\x80")
>>   => 0

> Please tell why you think they are inconsistent.

I thought there's no room for argument about their inconsistency with
respect to the specification of "[...]" in regexps.

> More importantly, please show real-life examples of code or
> situations where this gets in your way.

If you decode some data containing invalid (undecodable) byte
sequences using a coding system such as utf-8, then such sequences are
embedded in the decoded result as eight-bit characters in multibyte
form.  You can detect particular such sequences by searching the
decoded result for a "character alternative" regexp (or its multibyte
form), provided that such a search works.

Further examples that look inconsistent:

(string-match (string-to-multibyte "[\x80\x81]") (string-to-multibyte "\x80"))
  => nil
(string-match (string-to-multibyte "[\x80-\xbf]") (string-to-multibyte "\x80"))
  => nil
(string-match (string-to-multibyte "[\x80-\xc0]") (string-to-multibyte "\x80"))
  => 0
(string-match (string-to-multibyte "[\x80-\xc0]") (string-to-multibyte "\xbf"))
  => 0
(string-match (string-to-multibyte "[\x80-\xc0]") (string-to-multibyte "\xc0"))
  => nil

> This area is full of subtleties and gotchas, and in general the
> current code does what it does because it needs to cater to many
> different practical situations.

> There could still be bugs, of course.

Yeah.  I found another suspected bug in this area:

(string-match "[[:unibyte:]]" "\x80")
  => nil
(string-match "[[:unibyte:]]" (string-to-multibyte "\x80"))
  => nil

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp
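
A minimal sketch of the decoding use case described above, for
illustration only.  It does not use the character-alternative regexps
under discussion; instead it walks the decoded string and checks each
character's charset, which is a different technique from the regexp
search.  It assumes Emacs 23 or later, where undecodable bytes are
represented by characters of the `eight-bit' charset.  The helper name
`my-raw-byte-positions' is hypothetical, not part of Emacs.

```elisp
;; Assumption: Emacs 23 or later, where an undecodable byte becomes a
;; character belonging to the `eight-bit' charset.
(defun my-raw-byte-positions (string)
  "Return a list of indices in STRING that hold raw eight-bit characters.
This is a hypothetical helper written for illustration."
  (let ((i 0)
        (positions '()))
    (while (< i (length string))
      ;; `char-charset' reports the charset of the character at index I.
      (when (eq (char-charset (aref string i)) 'eight-bit)
        (push i positions))
      (setq i (1+ i)))
    (nreverse positions)))

;; Decode the bytes "a", a lone continuation byte #x80 (invalid as
;; UTF-8 on its own), and "b", then list where raw bytes survived.
(my-raw-byte-positions
 (decode-coding-string (unibyte-string ?a #x80 ?b) 'utf-8))
```

If the regexps behaved consistently, the same check could be written
as a single `string-match' call with a character alternative, which is
the point of the report.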