From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
To: 5700@debbugs.gnu.org
Cc: beebe@math.utah.edu
Subject: bug#5700: emacs-23 and 8-bit characters in 128..255
Date: Tue, 9 Mar 2010 12:51:31 -0700 (MST)
Message-ID: <CMM.0.95.0.1268164291.beebe@psi.math.utah.edu>
When emacs-23 came out and I began to use it, I noticed problems in
some of my extensive locally-written emacs code.
I've been far too busy to try to track down why, and sometimes, the
problems were resolved simply by rerunning byte-compile-file.
This morning, I set out to track down the source of one of the
problems in a function that I use a lot, and eventually narrowed it to
the failure of functions like these:
(string-equal (buffer-substring (point) (1+ (point))) "\377")
(looking-at "\377")
In emacs-22 and earlier, if the character at point is octal 377
(decimal 255, hexadecimal 0xff), each of these expressions returns t.
In emacs-23, each returns nil. Further testing shows identical
behavior for characters in the decimal range 128--255 (octal
\200--\377).
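A minimal sketch of one way to probe the mismatch described above. It
assumes Emacs 23 or later, a multibyte buffer, and that the character
reached the buffer through `insert'; the function name my-probe-377 is
hypothetical and not part of the original code.

```elisp
;; Sketch only: my-probe-377 is a hypothetical name.
;; Assumes Emacs 23 or later.
(defun my-probe-377 ()
  "Compare the kind of string read from \"\\377\" with buffer text."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert ?\377)                      ; character code 255
    (goto-char (point-min))
    (list
     ;; A literal whose only non-ASCII content is an octal escape
     ;; below 256 is read as a unibyte string.
     (multibyte-string-p "\377")
     ;; Text taken from a multibyte buffer is a multibyte string.
     (multibyte-string-p
      (buffer-substring (point) (1+ (point)))))))

(my-probe-377)   ; => (nil t)
```

The two strings are of different kinds (unibyte versus multibyte),
which is consistent with the comparison failing for codes 128--255.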
I suspect the reason is this comment in the NEWS file:
The internal encoding used for buffers and strings is now
Unicode-based and called `utf-8-emacs' (`emacs-internal' is an alias
for this). This encoding is backward-compatible with Unicode's UTF-8
encoding. The internal encoding previously used by Emacs,
`emacs-mule', is still available for reading and writing files.
The code in question uses the character ?\377 as a unique sentinel
that terminates the function's processing. It needs to be a
nonprintable character that is not used in normal text files, and I
found that changing it to ?\177 (ASCII DELete) made the code work
properly. That change is transparent to older emacs versions, so in
this case, it is harmless. Nevertheless, since the technique of using
data sentinels is an ancient practice in many programming languages, I
suspect that my own code is not the only Emacs Lisp code to be
affected by the change.
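For sentinel tests of this kind, a sketch of an alternative that
compares character codes instead of strings. The names
my-sentinel-char and my-at-sentinel-p are hypothetical, and the sketch
assumes the sentinel was placed in the buffer with `insert'; it is not
the code from the function discussed above.

```elisp
;; Sketch only: both names below are hypothetical.
(defconst my-sentinel-char ?\377
  "Character code used as an end-of-data sentinel (illustrative).")

(defun my-at-sentinel-p ()
  "Return non-nil if the character at point is `my-sentinel-char'."
  ;; `char-after' returns an integer (or nil at end of buffer), so the
  ;; test does not depend on how a string literal is read.
  (eq (char-after) my-sentinel-char))

;; Usage: put the sentinel after some data and test at its position.
(with-temp-buffer
  (insert "abc" my-sentinel-char)
  (goto-char (1- (point-max)))
  (my-at-sentinel-p))
```

Comparing integers sidesteps the question of whether "\377" denotes a
byte or a character.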
The question for this list is this:
If UTF-8 is used internally in the buffer text, then why are
numeric representations of unprintable characters in search
strings apparently not translated the same way?
In all of my Emacs Lisp source code files, the character set is plain
ASCII, which is a proper subset of UTF-8, requiring only a single byte
per character.
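The single-byte property holds only for codes 0--127. A small sketch,
assuming Emacs 23 or later, that reports internal byte lengths with
`string-bytes'; the function name my-sentinel-byte-lengths is
hypothetical.

```elisp
;; Sketch only: my-sentinel-byte-lengths is a hypothetical name.
(defun my-sentinel-byte-lengths ()
  "Return internal byte lengths of three one-character strings."
  (list (string-bytes (string ?\177))   ; DEL, an ASCII character
        (string-bytes (string ?\377))   ; character 255, multibyte string
        (string-bytes "\377")))         ; octal-escape literal, unibyte

(my-sentinel-byte-lengths)   ; => (1 2 1)
```

Character 255 occupies two bytes in the internal encoding, while the
literal "\377" holds one byte, so the two do not have the same
representation. ?\177 is ASCII and is one byte either way.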
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu -
- 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------