* elisp 23.2 doc on regex on multibyte char still correct?
@ 2011-01-03 9:12 Xah Lee
2011-01-03 18:17 ` Eli Zaretskii
[not found] ` <mailman.7.1294078730.3992.help-gnu-emacs@gnu.org>
0 siblings, 2 replies; 3+ messages in thread
From: Xah Lee @ 2011-01-03 9:12 UTC
To: help-gnu-emacs
In the elisp doc for Emacs 23.2, the section on regex
has a part that talks about multibyte chars.
Is that info still correct?
________________________________________
This is edition 3.0 of the GNU Emacs Lisp Reference Manual,
corresponding to Emacs version 23.2.
(elisp) Regexp Special
The beginning and end of a range of multibyte characters must be in
the same character set (*note Character Sets::). Thus,
`"[\x8e0-\x97c]"' is invalid because character 0x8e0 (`a' with
grave accent) is in the Emacs character set for Latin-1 but the
character 0x97c (`u' with diaeresis) is in the Emacs character set
for Latin-2. (We use Lisp string syntax to write that example,
and a few others in the next few paragraphs, in order to include
hex escape sequences in them.)
If a range starts with a unibyte character C and ends with a
multibyte character C2, the range is divided into two parts: one
is `C..?\377', the other is `C1..C2', where C1 is the first
character of the charset to which C2 belongs.
You cannot always match all non-ASCII characters with the regular
expression `"[\200-\377]"'. This works when searching a unibyte
buffer or string (*note Text Representations::), but not in a
multibyte buffer or string, because many non-ASCII characters have
codes above octal 0377. However, the regular expression
`"[^\000-\177]"' does match all non-ASCII characters (see below
regarding `^'), in both multibyte and unibyte representations,
because only the ASCII characters are excluded.
A character alternative can also specify named character classes
(*note Char Classes::). This is a POSIX feature whose syntax is
`[:CLASS:]'. Using a character class is equivalent to mentioning
each of the characters in that class; but the latter is not
feasible in practice, since some classes include thousands of
different characters.
________________________________________
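The last two quoted paragraphs are easy to try out in a running Emacs. Here is a minimal sketch, assuming a Unicode-based Emacs (23 or later). The `my-` function names and the sample string are made up for illustration and are not from the manual. It checks only the `"[^\000-\177]"` claim and the `[:nonascii:]` named class, and says nothing about the charset-range paragraphs.

```elisp
;; Hypothetical helpers for testing the quoted claims; evaluate in
;; *scratch* or with M-x eval-buffer.

(defun my-first-non-ascii (string)
  "Return the index of the first non-ASCII character in STRING, or nil.
Uses the negated ASCII range from the manual text."
  (string-match "[^\000-\177]" string))

(defun my-first-non-ascii-by-class (string)
  "Like `my-first-non-ascii', but uses the named class [:nonascii:]."
  (string-match "[[:nonascii:]]" string))

;; Sample text: "caf", e with acute accent (U+00E9), a space, and one
;; CJK character (U+4E2D).  The \u escapes make the string multibyte.
(my-first-non-ascii "caf\u00e9 \u4e2d")           ; => 3
(my-first-non-ascii-by-class "caf\u00e9 \u4e2d")  ; => 3
(my-first-non-ascii "plain ascii")                ; => nil
```

Both forms stop at index 3, the accented character, so for this sample the negated range and the named class agree.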
Xah ∑ http://xahlee.org/ ☄