From: Xah Lee
Newsgroups: gmane.emacs.help
Subject: elisp 23.2 doc on regex on multibyte char still correct?
Date: Mon, 3 Jan 2011 01:12:47 -0800 (PST)
To: help-gnu-emacs@gnu.org

In the elisp doc for Emacs 23.2, the section on regex has a part that
talks about multibyte chars. Is that info still correct?

________________________________________

This is edition 3.0 of the GNU Emacs Lisp Reference Manual,
corresponding to Emacs version 23.2.

(elisp) Regexp Special

The beginning and end of a range of multibyte characters must be in
the same character set (*note Character Sets::). Thus,
`"[\x8e0-\x97c]"' is invalid because character 0x8e0 (`a' with grave
accent) is in the Emacs character set for Latin-1 but the character
0x97c (`u' with diaeresis) is in the Emacs character set for Latin-2.
(We use Lisp string syntax to write that example, and a few others in
the next few paragraphs, in order to include hex escape sequences in
them.)

If a range starts with a unibyte character C and ends with a multibyte
character C2, the range is divided into two parts: one is `C..?\377',
the other is `C1..C2', where C1 is the first character of the charset
to which C2 belongs.

You cannot always match all non-ASCII characters with the regular
expression `"[\200-\377]"'. This works when searching a unibyte buffer
or string (*note Text Representations::), but not in a multibyte
buffer or string, because many non-ASCII characters have codes above
octal 0377. However, the regular expression `"[^\000-\177]"' does
match all non-ASCII characters (see below regarding `^'), in both
multibyte and unibyte representations, because only the ASCII
characters are excluded.

A character alternative can also specify named character classes
(*note Char Classes::). This is a POSIX feature whose syntax is
`[:CLASS:]'. Using a character class is equivalent to mentioning each
of the characters in that class; but the latter is not feasible in
practice, since some classes include thousands of different
characters.

________________________________________

 Xah ∑ http://xahlee.org/ ☄