From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 08:41:06 +0900 (JST)
Message-ID: <200311202341.IAA18243@etlken.m17n.org>
Original-To: juri@jurta.org
Cc: monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
In-reply-to: <87ptfovdnj.fsf@mail.jurta.org> (message from Juri Linkov on Wed, 19 Nov 2003 12:46:56 +0200)

In article <87ptfovdnj.fsf@mail.jurta.org>, Juri Linkov writes:

> (progn
>   (set-language-environment 'ukrainian)
>   (re-search-forward "[\000-\007\013\015-\032\034-\037\200-\237]" nil t))

> It fails with the (invalid-regexp "Invalid range end").
> Could you suggest how to fix this bug?

The current Emacs simply converts the unibyte regexp string to
multibyte. In the Ukrainian language environment,
nonascii-translation-table converts ?\200 to 299040 but ?\237 to
2295, so the end of the range becomes smaller than its start and the
above regexp leads to "Invalid range end".

This behaviour itself is a bug. We must treat \200-\237 the same way
as \200\201...\236\237 (emacs-unicode already does that).
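To make the failure concrete, here is a minimal sketch. It uses only
the two translated values quoted in this message; my-range-valid-p
and my-expand-byte-range are hypothetical helper names for
illustration, not Emacs functions, and the sketch assumes an Emacs
recent enough to provide number-sequence.

```elisp
;; Hypothetical helpers for illustration only; not part of Emacs.

(defun my-range-valid-p (from to)
  "Return non-nil if FROM-TO can be used as a character class range."
  (<= from to))

;; As raw bytes the range is fine: #o200 (128) <= #o237 (159).
(my-range-valid-p #o200 #o237)          ; => t

;; After translating only the two endpoints with the values quoted
;; in the message, the start (299040) exceeds the end (2295).
(my-range-valid-p 299040 2295)          ; => nil

(defun my-expand-byte-range (from to)
  "Return the list of all codes from FROM to TO, inclusive."
  (number-sequence from to))

;; Treating \200-\237 like \200\201...\236\237 means handling each
;; of these 32 bytes individually instead of translating endpoints.
(length (my-expand-byte-range #o200 #o237)) ; => 32
```

This only models the comparison of the range endpoints; it does not
reproduce how the regexp compiler itself builds the range.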
But fixing that bug doesn't solve the Gnus problem, because the
intention of the part "\200-\237" is apparently to match C1 control
chars, not their multibyte equivalents in the current language
environment. So the correct fix is to change the above as below:

  (re-search-forward
   (string-as-multibyte "[\000-\007\013\015-\032\034-\037\200-\237]")
   nil t)

---
Ken'ichi HANDA
handa@m17n.org