From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
To: 5700@debbugs.gnu.org
Cc: beebe@math.utah.edu
Subject: bug#5700: emacs-23 and 8-bit characters in 128..255
Date: Tue, 9 Mar 2010 12:51:31 -0700 (MST)
Message-ID: <CMM.0.95.0.1268164291.beebe@psi.math.utah.edu>

When emacs-23 came out and I began to use it, I noticed problems in
some of my extensive locally-written emacs code.

I've been far too busy to try to track down why, and sometimes, the
problems were resolved simply by rerunning byte-compile-file.

This morning, I set out to track down the source of one of the
problems in a function that I use a lot, and eventually narrowed it to
the failure of functions like these:

    (string-equal (buffer-substring (point) (1+ (point))) "\377")
    (looking-at "\377")

In emacs-22 and earlier, if the character at point is octal 377
(decimal 255, hexadecimal 0xff), both expressions return t.  In
emacs-23, they return nil.  Further testing shows identical behavior
for characters in the decimal range 128--255 (octal \200--\377).

I suspect the reason is this comment in the NEWS file:

    The internal encoding used for buffers and strings is now
    Unicode-based and called `utf-8-emacs' (`emacs-internal' is an alias
    for this).  This encoding is backward-compatible with Unicode's UTF-8
    encoding.  The internal encoding previously used by Emacs,
    `emacs-mule', is still available for reading and writing files.

The code in question uses the character ?\377 as a unique sentinel
that terminates the function's processing.  It needs to be a
nonprintable character that is not used in normal text files, and I
found that changing it to ?\177 (ASCII DELete) made the code work
properly.  That change is transparent to older emacs versions, so in
this case, it is harmless.  Nevertheless, since the technique of using
data sentinels is an ancient practice in many programming languages, I
suspect that my own code is not the only Emacs Lisp code to be
affected by the change.
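
A minimal sketch of a sentinel test that compares characters rather
than string literals, which may sidestep the mismatch described above.
It assumes a multibyte buffer in which the sentinel was inserted as a
character, for example with (insert ?\377).  The names `my-sentinel',
`my-at-sentinel-p', and `my-looking-at-sentinel-p' are hypothetical and
do not come from the code discussed in this report.

```elisp
;; Sketch only: all `my-' names below are hypothetical illustrations.
(defconst my-sentinel ?\377
  "Character used as the end-of-data sentinel (code point 255).")

(defun my-at-sentinel-p ()
  "Return non-nil if the character at point is `my-sentinel'."
  ;; Compare character codes directly; no string literal is involved.
  (eq (char-after) my-sentinel))

(defun my-looking-at-sentinel-p ()
  "Like `my-at-sentinel-p', but using a regexp built from the character."
  ;; `string' builds the pattern from the character itself, instead of
  ;; relying on how the reader interprets the literal "\377".
  (looking-at (regexp-quote (string my-sentinel))))
```

Building the pattern from the character keeps the buffer text and the
search string in the same representation, whereas a literal such as
"\377" appears to be read as a single raw byte in emacs-23.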

The question for this list is this:

    If UTF-8 is used internally in the buffer text, then why are
    numeric representations of unprintable characters in search
    strings apparently not translated the same way?

In all of my Emacs Lisp source code files, the character set is plain
ASCII, which is a proper subset of UTF-8, requiring only a single byte
per character.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

Thread overview: 4+ messages
2010-03-09 19:51 Nelson H. F. Beebe [this message]
2010-03-09 22:02 ` bug#5700: emacs-23 and 8-bit characters in 128..255 Stefan Monnier
2016-07-06 23:52   ` npostavs
2016-07-07 16:21     ` Eli Zaretskii
