From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#3687: 23.1.50; inconsistency in multibyte eight-bit regexps [PATCH]
Date: Fri, 28 Jun 2019 17:56:14 +0300
Message-ID: <83r27dhi7l.fsf@gnu.org>
References: <831rzdj1z9.fsf@gnu.org> <6138515E-3202-437D-8341-7A8856AD0AE9@acm.org>
In-reply-to: <6138515E-3202-437D-8341-7A8856AD0AE9@acm.org> (message from Mattias Engdegård on Fri, 28 Jun 2019 16:05:07 +0200)
To: Mattias Engdegård
Cc: monnier@iro.umontreal.ca, 3687@debbugs.gnu.org

> From: Mattias Engdegård
> Date: Fri, 28 Jun 2019 16:05:07 +0200
> Cc: mituharu@math.s.chiba-u.ac.jp, monnier@iro.umontreal.ca,
>  3687@debbugs.gnu.org
>
> > 1. What do you mean by "raw bytes"?  Is #xab a raw byte or the
> >    Unicode codepoint U+00AB?  IOW, how do we distinguish, in a
> >    regexp, between a raw byte and a character whose Unicode
> >    codepoint is that byte's value?  And how does one go about
> >    concocting a regexp that matches raw bytes in a unibyte or
> >    multibyte buffer or string?
>
> Sorry, I should have been clearer. The terminology in the manual is
> a bit muddled; in this case I mean the characters (or whatever you
> prefer calling them) obtained with hex or octal escapes in the range
> 128-255, such as "\xff" or "\377", regardless of the string's type
> (unibyte or multibyte).
>
> Unicode characters in the range 128-255 can be generated using the
> \u00HH or \U000000HH notations, or by just including them literally.
> They are distinct from raw bytes.
>
> To match raw bytes, just write them. They are not special in regexp
> syntax and need no escaping.

And one more question about this part: if hex and octal escapes are
reserved for raw bytes, then what is \123456 and its ilk, i.e. octal
escapes whose values are above 255 decimal?  Should they signal an
error?