From: rasmith@tamu.edu
Newsgroups: gmane.emacs.help
Subject: Re: search-forward in emacs23 lisp
Date: Mon, 29 Mar 2010 10:01:17 -0500 (CDT)
Message-ID: <20100329.100117.57809164175956070.rasmith@aristotle.tamu.edu>
References: <20100327.153148.886429907165788179.rasmith@aristotle.tamu.edu> <87y6hchx0i.fsf@gnu.org> <831vf339k4.fsf@gnu.org>
In-Reply-To: <831vf339k4.fsf@gnu.org>
Reply-To: rasmith@tamu.edu
To: eliz@gnu.org
Cc: help-gnu-emacs@gnu.org
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)

From: Eli Zaretskii
Subject: Re: search-forward in emacs23 lisp
Date: Mon, 29 Mar 2010 09:51:07 +0300

>> From: bojohan@gnu.org (Johan Bockgård)
>> Date: Mon, 29 Mar 2010 01:00:45 +0200
>>
>> There does seem to be a bug regarding search in unibyte buffers,
>
> Please report this ASAP to the Emacs bug-tracker. Emacs 23.2 is in
> the last stages of pretest, and so we should not waste any time
> discussing bugs here, if we want them to be fixed in the next release.
>

After further investigation, I'm not certain it's a bug: it may be an
intentional part of the modifications to accommodate utf-8. Here are
the details:

In a multibyte buffer (set-buffer-multibyte t),

  (search-forward (char-to-string ?\xff))   matches utf-8 "ÿ" (i.e. \303\277)
  (search-forward (char-to-string ?\377))   matches utf-8 "ÿ"
  (search-forward (unibyte-string ?\377))   matches byte \377

In a unibyte buffer (set-buffer-multibyte nil),

  (search-forward (char-to-string ?\xff))   matches \231\277
  (search-forward (char-to-string ?\377))   matches \231\277
  (search-forward (unibyte-string ?\377))   matches \231\277

In other words, search-forward cannot find byte \377 when searching in
a *unibyte* buffer, but it can find that same byte if the buffer is
changed to multibyte. The reason is that in a unibyte buffer,
search-forward apparently changes byte \377 to a two-byte
representation (but not to utf-8, which would be \303\277).

The code I had a problem with can be fixed by using char-after (or,
more elegantly, I've now learned, by using skip-chars-forward).
However, there's probably other code out there that's now broken
because of this. Is it a bug, or was it a mistake to expect
search-forward to find a single high byte in a multibyte buffer in the
first place?

Robin Smith
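
P.S. For anyone who runs into the same thing, here is a minimal sketch of the char-after workaround mentioned above. It is not the code I originally had the problem with; the function name my-goto-after-byte is made up for illustration, and it assumes the current buffer is unibyte, so that char-after returns the raw byte value (0-255).

```elisp
;; Hypothetical helper (the name is an assumption, not from any library):
;; find a single byte in a unibyte buffer by comparing `char-after'
;; directly instead of going through `search-forward'.
(defun my-goto-after-byte (byte)
  "Move point just past the next occurrence of BYTE (an integer 0-255).
Return the new position, or nil if BYTE does not occur after point,
in which case point is left where it started."
  (let ((start (point))
        (found nil))
    ;; Walk forward one position at a time until BYTE is seen or the
    ;; end of the buffer is reached.
    (while (and (not found) (not (eobp)))
      (when (eq (char-after) byte)
        (setq found t))
      (forward-char 1))
    (if found
        (point)
      (goto-char start)
      nil)))

;; Usage sketch: build a small unibyte buffer containing byte \377
;; and look for it from the beginning.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert "ab" (unibyte-string ?\377) "cd")
  (goto-char (point-min))
  (my-goto-after-byte ?\377))
```

Scanning one position at a time is slower than a real search, so this is only meant for small buffers or as a stopgap; skip-chars-forward is probably the better choice where it applies.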