From: Eli Zaretskii <eliz@gnu.org>
To: npostavs@users.sourceforge.net
Cc: beebe@math.utah.edu, monnier@IRO.UMontreal.CA, 5700@debbugs.gnu.org
Subject: bug#5700: emacs-23 and 8-bit characters in 128..255
Date: Thu, 07 Jul 2016 19:21:47 +0300
Message-ID: <831t35l8pw.fsf@gnu.org>
In-Reply-To: <87h9c2cojz.fsf@users.sourceforge.net> (npostavs@users.sourceforge.net)

> From: npostavs@users.sourceforge.net
> Date: Wed, 06 Jul 2016 19:52:16 -0400
> Cc: "Nelson H. F. Beebe" <beebe@math.utah.edu>, 5700@debbugs.gnu.org
> 
> With Emacs 24/25, using "\u00FF" works:
> 
> (string-equal (buffer-substring (point) (1+ (point))) "\u00FF")
> (looking-at "\u00FF")
> 
> Seems to be another instance of the unibyte vs multibyte string escape syntax thing:
> 
>        You can also use hexadecimal escape sequences (‘\xN’) and octal
>     escape sequences (‘\N’) in string constants.  *But beware:* If a
>     string constant contains hexadecimal or octal escape sequences, and
>     these escape sequences all specify unibyte characters (i.e., less
>     than 256), and there are no other literal non-ASCII characters or
>     Unicode-style escape sequences in the string, then Emacs
>     automatically assumes that it is a unibyte string.  That is to say,
>     it assumes that all non-ASCII characters occurring in the string are
>     8-bit raw bytes.
> 
> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> > which seems acceptable, whereas under Emacs-23 we have:
> >
> [...]
> >   (multibyte-string-p "\377")   prints as    "\377"
> 
> In 23.4 it returns nil

Yes.
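
To make the quoted rule concrete, here is a minimal sketch (my own
illustration, not taken from the bug report) of how the Lisp reader
classifies string constants by their escape syntax. It assumes a
reasonably recent Emacs, 23 or later:

```elisp
;; Octal or hexadecimal escapes below 256, with no other non-ASCII
;; content in the literal, make the reader produce a unibyte string.
(multibyte-string-p "\377")    ; nil
(multibyte-string-p "\xff")    ; nil

;; A Unicode-style escape makes the whole literal multibyte.
(multibyte-string-p "\u00FF")  ; t
```

So "\377" and "\u00FF" look alike when printed, but only the second one
is a multibyte string containing the character U+00FF.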

The other significant piece of the puzzle is described in this text
from the ELisp manual:

     For technical reasons, a unibyte and a multibyte string are ‘equal’
     if and only if they contain the same sequence of character codes
     and all these codes are either in the range 0 through 127 (ASCII)
     or 160 through 255 (‘eight-bit-graphic’).  However, when a unibyte
     string is converted to a multibyte string, all characters with
     codes in the range 160 through 255 are converted to characters with
     higher codes, whereas ASCII characters remain unchanged.  Thus, a
     unibyte string and its conversion to multibyte are only ‘equal’ if
     the string is all ASCII.  Character codes 160 through 255 are not
     entirely proper in multibyte text, even though they can occur.  As
     a consequence, the situation where a unibyte and a multibyte string
     are ‘equal’ without both being all ASCII is a technical oddity that
     very few Emacs Lisp programmers ever get confronted with.  *Note
     Text Representations::.

This was one of the significant changes in Emacs 23, and I think it is
the main factor behind the changed behavior reported by Nelson.
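
As a rough sketch of what the conversion does in Emacs 23 and later
(the helper name `my-first-char-codes' is hypothetical, made up for
this illustration), one can compare the code of the first character
before and after `string-to-multibyte':

```elisp
(defun my-first-char-codes (unibyte-string)
  "Return a list (UNIBYTE-CODE MULTIBYTE-CODE) for UNIBYTE-STRING.
The codes are those of the first character before and after
conversion with `string-to-multibyte'.  Hypothetical helper."
  (list (aref unibyte-string 0)
        (aref (string-to-multibyte unibyte-string) 0)))

;; A raw byte is moved to a much higher code, #x3FFFFF.
(my-first-char-codes "\377")   ; (255 4194303)

;; ASCII is left unchanged.
(my-first-char-codes "a")      ; (97 97)

;; Hence the converted raw byte is not the character U+00FF.
(string-equal (string-to-multibyte "\377") "\u00FF")   ; nil
```

This is why text from a multibyte buffer that contains U+00FF no longer
matches a "\377" constant, while "\u00FF" does.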


Thread overview: 4+ messages
2010-03-09 19:51 bug#5700: emacs-23 and 8-bit characters in 128..255 Nelson H. F. Beebe
2010-03-09 22:02 ` Stefan Monnier
2016-07-06 23:52   ` npostavs
2016-07-07 16:21     ` Eli Zaretskii [this message]
