From: rasmith@tamu.edu
To: 5797@debbugs.gnu.org
Subject: bug#5797: 23.1; search-forward in unibyte buffer for \377
Date: Mon, 29 Mar 2010 10:09:19 -0500 (CDT)
Message-ID: <20100329.100919.319083499807539873.rasmith@aristotle.tamu.edu>
Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:
search-forward fails to find a unibyte \377 in a raw unibyte buffer.
I use "cgreek", a package written by Naoto Takahashi for handling
polytonic (ancient, fully accented) Greek. It includes a file,
cgreek-tlg.el, for processing the files in the Thesaurus Linguae
Graecae, which have their own unique formats. In these files, the
byte \377 is used as a string terminator. Prior to Emacs 23, these
files could be processed by reading the file in with
insert-file-contents-literally, making the buffer unibyte with
(set-buffer-multibyte nil), and searching for the string terminator
with (search-forward (char-to-string ?\xff)). However, that search
now fails to find a single byte \377 and instead matches on the
two-byte sequence \231\277.
Changing the search function to (search-forward (unibyte-string ?\377))
has the same result.
After further investigation, I'm not certain it's a bug: it may be an
intentional part of the modifications to accommodate utf-8. Here are
the details:
In a multibyte buffer (set-buffer-multibyte t):
    (search-forward (char-to-string ?\xff)) matches utf-8 "ÿ" (i.e. \303\277)
    (search-forward (char-to-string ?\377)) matches utf-8 "ÿ"
    (search-forward (unibyte-string ?\377)) matches byte \377
In a unibyte buffer (set-buffer-multibyte nil):
    (search-forward (char-to-string ?\xff)) matches \231\277
    (search-forward (char-to-string ?\377)) matches \231\277
    (search-forward (unibyte-string ?\377)) matches \231\277
In other words, search-forward cannot find byte \377 when searching in
a *unibyte* buffer, but it can find that same byte if the buffer is
changed to multibyte. The reason is that in a unibyte buffer,
search-forward apparently changes byte \377 to a two-byte
representation (but not to utf-8, which would be \303\277).
This may be exactly the intended behavior of search-forward, but it
breaks scripts expecting search-forward to be able to find a single
high 8-bit byte in a unibyte buffer. In context, changing the buffer
to multibyte is not a solution.
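The unibyte case can be checked in isolation with a minimal sketch like the one below. It assumes a scratch buffer holding a single \377 byte is enough to show the behavior; the helper name bug5797-try-search is made up for illustration and is not part of Emacs or cgreek. Going by the behavior described above, both calls would be expected to return nil on the affected 23.1 build; other Emacs versions may behave differently.

```elisp
;; Hypothetical helper, named only for this illustration.
(defun bug5797-try-search (needle)
  "Search for NEEDLE in a unibyte test buffer containing one \\377 byte.
Return the position after the match, or nil if NEEDLE is not found."
  (with-temp-buffer
    (set-buffer-multibyte nil)                  ; raw bytes, as in the TLG case
    (insert "abc" (unibyte-string #xff) "def")  ; one terminator byte
    (goto-char (point-min))
    (search-forward needle nil t)))             ; t: return nil, don't signal

;; The two forms of the search string tried in this report.
(list (bug5797-try-search (char-to-string ?\xff))
      (bug5797-try-search (unibyte-string ?\377)))
```

Passing t as the third argument makes search-forward return nil instead of signaling an error, so both results can be collected in one list.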
The code in which I found this error can be fixed by replacing
    (search-forward (char-to-string ?\xff))
with
    (skip-chars-forward "^\377")
    (forward-char 1)
(fix provided by Naoto Takahashi).
However, that means that scripts counting on the old behavior of
search-forward will have to be modified.
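For scripts with several such call sites, the workaround could be wrapped in one small function. This is only a sketch: the name bug5797-skip-past-terminator is hypothetical, and returning nil when no terminator follows point is an assumption about what callers want. The original search-forward call signals an error in that case.

```elisp
;; Hypothetical wrapper around the skip-chars-forward workaround;
;; the name is illustrative and does not come from cgreek-tlg.el.
(defun bug5797-skip-past-terminator ()
  "Move point just past the next \\377 byte and return the new position.
If no \\377 byte follows point, leave point at end of buffer and return nil."
  (skip-chars-forward "^\377")   ; stop at the first \377 byte, or at eob
  (unless (eobp)
    (forward-char 1)             ; step over the terminator itself
    (point)))

;; Usage in a unibyte buffer with one terminator byte.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert "abc" (unibyte-string #xff) "def")
  (goto-char (point-min))
  (bug5797-skip-past-terminator))
```

Unlike the bare two-line replacement, the wrapper checks for end of buffer before calling forward-char, so a missing terminator produces nil and not an end-of-buffer error.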
In GNU Emacs 23.1.1 (amd64-portbld-freebsd8.0, GTK+ Version 2.18.7)
of 2010-03-25 on aristotle.tamu.edu
Windowing system distributor `The X.Org Foundation', version 11.0.10605000
configured using `configure '--with-x-toolkit=gtk' '--x-libraries=/usr/local/lib' '--x-includes=/usr/local/include' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.0' 'build_alias=amd64-portbld-freebsd8.0' 'CC=cc' 'CFLAGS=-O2 -pipe -fno-strict-aliasing' 'LDFLAGS=-L/usr/local/lib -lintl' 'CPPFLAGS=-I/usr/local/include''
Important settings:
value of $LC_ALL: en_US.UTF-8
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
tool-bar-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
global-auto-composition-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
o <down> <down> <down> <return> C-q 0 0 0 <return>
C-q 3 7 7 <return> <up> <up> <up> <left> <up> C-x C-e
C-x o <down> <down> <down> <down> <backspace> <backspace>
C-q 2 3 1 <return> ] <backspace> C-q 2 7 7 <return>
<up> <up> <up> <up> C-e C-x C-e <up> <up> <left> C-x
C-e <up> <up> <switch-frame> <down-mouse-1> <mouse-movement>
<switch-frame> <mouse-1> <help-echo> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <help-echo>
<up> <up> <left> <up> <right> C-k C-y <return> C-y
<left> <backspace> <backspace> <backspace> t <right>
C-x C-e <down> <right> <right> <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right>
<right> <right> <right> C-x C-e C-x o <down> C-x C-e
<up> <up> <up> <left> <left> <left> <left> <return>
<up> ( s e a r c h - f o r w a r d SPC ( c h a r -
t o - s t r i o n g <backspace> <backspace> <backspace>
g <backspace> g SPC <backspace> <backspace> n g SPC
? \ x f f ) ) C-x C-e C-x o <up> <up> <down> <up> C-x
C-e <down> <down> C-e C-x C-e <up> <up> <up> <up> C-e
C-x C-e <up> <up> <left> C-x C-e <up> <up> <up> <up>
<up> <up> C-e C-x C-e <down> C-e C-x C-e C-x o <down>
<down> <down> <down> <down> <down> <return> C-q 3 7
7 <return> <up> <up> <up> <up> <up> <up> <left> <left>
C-x C-e <up> <up> <up> <up> <up> <up> <down> <left>
<left> C-x C-e <up> <up> <up> <up> <left> C-x C-e <up>
<up> <up> <up> <up> <left> <left> <left> <left> <left>
C-x C-e <down> <down> C-e C-x C-e <up> <up> <up> <up>
C-e C-x C-e <up> <up> <up> C-e C-x C-e <down> <switch-frame>
<switch-frame> <help-echo> <help-echo> <help-echo>
M-x r e p o r t <tab> b <tab> <return>
Recent messages:
Entering debugger...
326
Entering debugger...
nil
369 [3 times]
t
Entering debugger...
374 [2 times]
366
nil
369 [3 times]