From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#5700: emacs-23 and 8-bit characters in 128..255
Date: Thu, 07 Jul 2016 19:21:47 +0300
Message-ID: <831t35l8pw.fsf@gnu.org>
References: <87h9c2cojz.fsf@users.sourceforge.net>
To: npostavs@users.sourceforge.net
Cc: beebe@math.utah.edu, monnier@IRO.UMontreal.CA, 5700@debbugs.gnu.org

> From: npostavs@users.sourceforge.net
> Date: Wed, 06 Jul 2016 19:52:16 -0400
> Cc: "Nelson H. F. Beebe", 5700@debbugs.gnu.org
>
> With Emacs 24/25, using "\u00FF" works:
>
>   (string-equal (buffer-substring (point) (1+ (point))) "\u00FF")
>   (looking-at "\u00FF")
>
> Seems to be another instance of the unibyte vs multibyte string
> escape syntax thing:
>
>      You can also use hexadecimal escape sequences (‘\xN’) and octal
>      escape sequences (‘\N’) in string constants. *But beware:* If a
>      string constant contains hexadecimal or octal escape sequences, and
>      these escape sequences all specify unibyte characters (i.e., less
>      than 256), and there are no other literal non-ASCII characters or
>      Unicode-style escape sequences in the string, then Emacs
>      automatically assumes that it is a unibyte string. That is to say,
>      it assumes that all non-ASCII characters occurring in the string are
>      8-bit raw bytes.
>
> Stefan Monnier writes:
> > which seems acceptable, whereas under Emacs-23 we have:
>
> > [...]
> > (multibyte-string-p "\377") prints as "\377"
>
> In 23.4 it returns nil

Yes. The other significant piece of the puzzle is described in this
text from the ELisp manual:

     For technical reasons, a unibyte and a multibyte string are
     ‘equal’ if and only if they contain the same sequence of
     character codes and all these codes are either in the range 0
     through 127 (ASCII) or 160 through 255 (‘eight-bit-graphic’).
     However, when a unibyte string is converted to a multibyte
     string, all characters with codes in the range 160 through 255
     are converted to characters with higher codes, whereas ASCII
     characters remain unchanged.
     Thus, a unibyte string and its conversion to multibyte are only
     ‘equal’ if the string is all ASCII. Character codes 160 through
     255 are not entirely proper in multibyte text, even though they
     can occur. As a consequence, the situation where a unibyte and a
     multibyte string are ‘equal’ without both being all ASCII is a
     technical oddity that very few Emacs Lisp programmers ever get
     confronted with. *Note Text Representations::.

This was one of the significant changes in Emacs 23, and I think it is
the main factor for the changed behavior reported by Nelson.
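
For anyone who wants to check the two pieces above locally, here is a
minimal sketch to evaluate in *scratch* or ielm. It assumes Emacs 23 or
later. The function name `bug5700-escape-demo' is hypothetical and not
part of Emacs; it only collects the results of standard primitives
(`multibyte-string-p', `aref', `string-to-multibyte', `string-equal').

```elisp
;; Hypothetical helper, for illustration only; assumes Emacs 23 or later.
(defun bug5700-escape-demo ()
  "Return a list of observations about two ways of writing code 255."
  (let ((octal   "\377")      ; octal escape only: read as a unibyte string
        (unicode "\u00FF"))   ; Unicode escape: read as a multibyte string
    (list
     (multibyte-string-p octal)               ; nil
     (multibyte-string-p unicode)             ; t
     (aref octal 0)                           ; 255, a raw 8-bit byte
     (aref unicode 0)                         ; 255, the character U+00FF
     ;; Converting the unibyte string moves the raw byte to a higher
     ;; character code, as the manual text quoted above describes.
     (aref (string-to-multibyte octal) 0)     ; 4194303, i.e. #x3FFFFF
     ;; The comparison at issue in this report; evaluate it to see
     ;; what your Emacs version returns.
     (string-equal octal unicode))))

(bug5700-escape-demo)
```

The first two elements show the reader's choice of representation based
on the escape syntax alone, and the fifth shows that the converted raw
byte no longer has the code 255, so code that matches buffer text
against "\377" is not comparing against the character U+00FF.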