From: Chip Coldwell
Newsgroups: gmane.emacs.devel
Subject: Re: regex encoding
Date: Tue, 1 Aug 2006 16:38:57 -0400 (EDT)
To: Reiner Steib
Cc: emacs-devel@gnu.org

On Tue, 1 Aug 2006, Reiner Steib wrote:

> > | [a-zA-ZÄÖÜäößü]
>
> If you intend to send UTF-8, your MUA should not declare it as
> "Content-Type: TEXT/PLAIN; charset=ISO-8859-1". ;-)

Ooops. I haven't tamed my MUA yet. It's only been ten years.

>> My question is: are emacs regex character classes limited to the
>> iso-8859-1 character set, or is there some way to represent Unicode
>> (such as UTF-8) characters in a character class?
>
> AFAIK, you can write the chars in UTF-8 if you specify the encoding of
> the lisp file, cf.
> (info "(emacs)Specify Coding"):
>
> --8<---------------cut here---------------start------------->8---
> ;; -*- coding: utf-8 -*-
> (defun rs-test ()
>   (interactive)
>   (re-search-forward "[ÄÖÜäößü]"))
> --8<---------------cut here---------------end--------------->8---
>
> I don't know if there's a reason why it isn't used in `ispell.el'.

The particular issue is that ispell is currently broken if your LANG
environment variable specifies UTF-8 encoding, your buffer is UTF-8
encoded and contains one of these non-ASCII characters, and you specify
the "deutsch8" dictionary. ispell-word generates the error:

"Ispell and its process have different character maps"

What happens is that emacs transcodes the word to iso-8859-1 before
sending it to the aspell process, which is most likely respecting the
LANG environment variable.

If you change the ispell-dictionary-alist entry for "deutsch8" from

("deutsch8" "[a-zA-Z\304\326\334\344\366\337\374]"
 "[^a-zA-Z\304\326\334\344\366\337\374]"
 "[']" t ("-C" "-d" "deutsch") "~latin1" iso-8859-1)

to

("deutsch8" "[a-zA-Z\304\326\334\344\366\337\374]"
 "[^a-zA-Z\304\326\334\344\366\337\374]"
 "[']" t ("-C" "-d" "deutsch") "~latin1" utf-8)

the regex doesn't match words in the buffer properly. Changing it to

("deutsch8" "[a-zA-Z\304\326\334\344\366\337\374]"
 "[^a-zA-Z\304\326\334\344\366\337\374]"
 "[']" t ("-C" "-d" "deutsch" "--encoding=iso8859-1")
 "~latin1" iso-8859-1)

does seem to work.

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
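
For illustration only, here is a minimal Emacs Lisp sketch of the byte
mismatch described above, together with a character class written with
literal characters as Reiner suggests. The names sketch-umlaut-class and
sketch-bytes are hypothetical and not part of ispell.el. The sketch
assumes the file is saved as UTF-8 and that the running Emacs decodes it
according to the coding cookie.

```elisp
;; -*- coding: utf-8 -*-
;; Sketch only: the sketch- names below are hypothetical helpers.

(defconst sketch-umlaut-class "[a-zA-ZÄÖÜäößü]"
  "Character class using literal characters instead of octal escapes.")

(defun sketch-bytes (string coding-system)
  "Return the list of bytes that STRING encodes to in CODING-SYSTEM."
  (append (encode-coding-string string coding-system) nil))

;; The same character is one byte in Latin-1 but two bytes in UTF-8, so
;; a process expecting UTF-8 cannot interpret a word sent as Latin-1.
(sketch-bytes "ä" 'iso-8859-1)   ; one byte: #o344
(sketch-bytes "ä" 'utf-8)        ; two bytes: #o303 #o244

;; Written with literal characters, the class matches decoded text.
(string-match sketch-umlaut-class "ß")
```

The octal escape \344 in the dictionary entries above is exactly the
Latin-1 byte for "ä", which is why those regexes are tied to the
iso-8859-1 coding system.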