From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: rasmith@tamu.edu
Newsgroups: gmane.emacs.bugs
Subject: bug#5797: 23.1; search-forward in unibyte buffer for \377
Date: Mon, 29 Mar 2010 10:09:19 -0500 (CDT)
Message-ID: <20100329.100919.319083499807539873.rasmith@aristotle.tamu.edu>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=iso-8859-1
To: 5797@debbugs.gnu.org
X-Mailer: Mew version 6.3 on Emacs 23.1 / Mule 6.0 (HANACHIRUSATO)
X-GNU-PR-Message: report 5797
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:35796

Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

search-forward fails to find a unibyte \377 in a raw unibyte buffer.

I use "cgreek", a package written by Naoto Takahashi for handling
polytonic (ancient, fully accented) Greek. It includes a file,
cgreek-tlg.el, for processing the files in the Thesaurus Linguae
Graecae, which have their own unique formats. In these files, the
byte \377 is used as a string terminator.
Prior to Emacs 23, these files could be processed by reading the file in
with insert-file-contents-literally, making the buffer unibyte with
(set-buffer-multibyte nil), and searching for the string terminator
with (search-forward (char-to-string ?\xff)). However, that search now
fails to find a single byte \377 and instead matches on the two-byte
sequence \231\277.

Changing the search function to (search-forward (unibyte-string ?\377))
has the same result.

On further investigation, I'm not certain it's a bug: it may be an
intentional part of the modifications to accommodate utf-8. Here are
the details:

In a multibyte buffer (set-buffer-multibyte t):

  (search-forward (char-to-string ?\xff))  matches utf-8 "ÿ" (i.e. \303\277)
  (search-forward (char-to-string ?\377))  matches utf-8 "ÿ"
  (search-forward (unibyte-string ?\377))  matches byte \377

In a unibyte buffer (set-buffer-multibyte nil):

  (search-forward (char-to-string ?\xff))  matches \231\277
  (search-forward (char-to-string ?\377))  matches \231\277
  (search-forward (unibyte-string ?\377))  matches \231\277

In other words, search-forward cannot find byte \377 when searching in
a *unibyte* buffer, but it can find that same byte if the buffer is
changed to multibyte. The reason is that in a unibyte buffer,
search-forward apparently changes byte \377 to a two-byte
representation (but not to utf-8, which would be \303\277).

This may be exactly the intended behavior of search-forward, but it
breaks scripts that expect search-forward to find a single high 8-bit
byte in a unibyte buffer. In context, changing the buffer to
multibyte is not a solution.

The code in which I found this error can be fixed by replacing

  (search-forward (char-to-string ?\xff))

with

  (skip-chars-forward "^\377")
  (forward-char 1)

(fix provided by Naoto Takahashi)

However, that means that scripts counting on the old behavior of
search-forward will have to be modified.
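
The comparison above can be condensed into a small self-contained test
case. This is only a sketch: the helper names (my-find-terminator-by-search,
my-find-terminator-by-skip, my-terminator-demo) are hypothetical and are
not part of cgreek-tlg.el, and a three-byte temporary buffer stands in
for a real TLG file.

```elisp
;; Hypothetical helpers for illustration; not taken from cgreek-tlg.el.

(defun my-find-terminator-by-search ()
  "Try to move past the next byte \\377 using `search-forward'.
Return the new position, or nil if the search finds nothing."
  (search-forward (unibyte-string ?\377) nil t))

(defun my-find-terminator-by-skip ()
  "Move past the next byte \\377 using `skip-chars-forward'.
Return the new position, or nil if no terminator precedes end of buffer."
  (skip-chars-forward "^\377")
  (unless (eobp)
    (forward-char 1)
    (point)))

(defun my-terminator-demo ()
  "Return (SEARCH-RESULT . SKIP-RESULT) for a unibyte buffer holding a \\377 b."
  (with-temp-buffer
    ;; Make the buffer unibyte before inserting the raw bytes.
    (set-buffer-multibyte nil)
    (insert (unibyte-string ?a ?\377 ?b))
    (cons (progn (goto-char (point-min))
                 (my-find-terminator-by-search))
          (progn (goto-char (point-min))
                 (my-find-terminator-by-skip)))))
```

Evaluating (my-terminator-demo) returns a cons cell. The car shows
whether search-forward found the byte in the running Emacs (nil if it
did not), and the cdr is the position reached by the skip-chars-forward
workaround, which should be just past the terminator.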

If Emacs crashed, and you have the Emacs process in the gdb debugger,
please include the output from the following gdb commands:
    `bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/local/share/emacs/23.1/etc/DEBUG for instructions.


In GNU Emacs 23.1.1 (amd64-portbld-freebsd8.0, GTK+ Version 2.18.7)
 of 2010-03-25 on aristotle.tamu.edu
Windowing system distributor `The X.Org Foundation', version 11.0.10605000
configured using `configure  '--with-x-toolkit=gtk'
 '--x-libraries=/usr/local/lib' '--x-includes=/usr/local/include'
 '--prefix=/usr/local' '--mandir=/usr/local/man'
 '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.0'
 'build_alias=amd64-portbld-freebsd8.0' 'CC=cc'
 'CFLAGS=-O2 -pipe -fno-strict-aliasing'
 'LDFLAGS=-L/usr/local/lib -lintl' 'CPPFLAGS=-I/usr/local/include''

Important settings:
  value of $LC_ALL: en_US.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
o C-q 0 0 0 C-q 3 7 7 C-x C-e C-x o C-q 2 3 1 ] C-q 2 7 7
C-e C-x C-e C-x C-e C-k C-y C-y t C-x C-e C-x C-e C-x o
C-x C-e ( s e a r c h - f o r w a r d SPC ( c h a r -
t o - s t r i o n g g g SPC n g SPC ? \ x f f ) ) C-x C-e
C-x o C-x C-e C-e C-x C-e C-e C-x C-e C-x C-e C-e C-x C-e
C-e C-x C-e C-x o C-q 3 7 7 C-x C-e C-x C-e C-x C-e
C-x C-e C-e C-x C-e C-e C-x C-e C-e C-x C-e M-x r e p o r t b

Recent messages:
Entering debugger...
326
Entering debugger...
nil
369 [3 times]
t
Entering debugger...
374 [2 times]
366
nil
369 [3 times]