all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 3687@emacsbugs.donarmstrong.com
Subject: bug#3687: 23.1.50; inconsistency in multibyte eight-bit regexps
Date: Fri, 24 Jul 2009 10:08:11 +0900	[thread overview]
Message-ID: <wlbpnavqf8.wl%mituharu@math.s.chiba-u.ac.jp> (raw)
In-Reply-To: <jwvbpo74eet.fsf-monnier+emacsbugreports@gnu.org>

>>>>> On Mon, 29 Jun 2009 10:47:30 +0200, Stefan Monnier <monnier@iro.umontreal.ca> said:

>> It seemed to be too obvious to explain and I hesitated to do that.
>> Anyway, I assume "C" and "[C]" work equivalently as regexps if the
>> character C has no special meaning in either context.

> Yes, it's pretty obvious, thank you.  I haven't had time to look
> deeper, but that part of the code is pretty nasty because it tries
> to be clever about the fact that values between 128-256 can be
> either latin-1 chars and eight-bit-bytes and it tries to be lenient
> about confusion between the two.

Are there any written specifications explaining how the leniency is
supposed to work?

As for documentations, the description below in the elisp info
(Special Characters in Regular Expressions) probably needs to be
updated.

     The beginning and end of a range of multibyte characters must be in
     the same character set (*note Character Sets::).  Thus,
     `"[\x8e0-\x97c]"' is invalid because character 0x8e0 (`a' with
     grave accent) is in the Emacs character set for Latin-1 but the
     character 0x97c (`u' with diaeresis) is in the Emacs character set
     for Latin-2.  (We use Lisp string syntax to write that example,
     and a few others in the next few paragraphs, in order to include
     hex escape sequences in them.)

     If a range starts with a unibyte character C and ends with a
     multibyte character C2, the range is divided into two parts: one
     is `C..?\377', the other is `C1..C2', where C1 is the first
     character of the charset to which C2 belongs.

     You cannot always match all non-ASCII characters with the regular
     expression `"[\200-\377]"'.  This works when searching a unibyte
     buffer or string (*note Text Representations::), but not in a
     multibyte buffer or string, because many non-ASCII characters have
     codes above octal 0377.  However, the regular expression
     `"[^\000-\177]"' does match all non-ASCII characters (see below
     regarding `^'), in both multibyte and unibyte representations,
     because only the ASCII characters are excluded.

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp





  reply	other threads:[~2009-07-24  1:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-26  9:56 bug#3687: 23.1.50; inconsistency in multibyte eight-bit regexps YAMAMOTO Mitsuharu
2009-06-26 13:43 ` Eli Zaretskii
2009-06-27  1:30   ` YAMAMOTO Mitsuharu
2009-06-27  9:36     ` Eli Zaretskii
2009-06-29  3:02       ` YAMAMOTO Mitsuharu
2009-06-29  8:47         ` Stefan Monnier
2009-07-24  1:08           ` YAMAMOTO Mitsuharu [this message]
2019-06-28 12:41 ` bug#3687: 23.1.50; inconsistency in multibyte eight-bit regexps [PATCH] Mattias Engdegård
2019-06-28 13:03   ` Eli Zaretskii
2019-06-28 14:05     ` Mattias Engdegård
2019-06-28 14:40       ` Eli Zaretskii
2019-06-28 15:00         ` Mattias Engdegård
2019-06-28 16:20           ` Eli Zaretskii
2019-06-28 16:47             ` Mattias Engdegård
2019-06-28 14:56       ` Eli Zaretskii
2019-06-28 15:18         ` Stefan Monnier
2019-06-28 15:34           ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=wlbpnavqf8.wl%mituharu@math.s.chiba-u.ac.jp \
    --to=mituharu@math.s.chiba-u.ac.jp \
    --cc=3687@emacsbugs.donarmstrong.com \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.