From: Stefan
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Tue, 04 Jan 2005 09:55:09 -0500
Cc: 130397@bugs.debian.org, Agustin Martin, lionel@mamane.lu, Ken Stevens, emacs-devel@gnu.org
To: Kenichi Handa
In-Reply-To: <200501041250.VAA10883@etlken.m17n.org> (Kenichi Handa's message of "Tue, 4 Jan 2005 21:50:33 +0900 (JST)")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/21.3.50 (darwin)

> Hmmm, then how about the attached patch to the latest CVS
> emacs?  With that, all equivalent characters (e.g. a-grave in
> all latin-X) should be handled well.  This patch will be
> applicable also to Emacs 21.3 but not yet tested in that
> version.

Can someone explain to me why ispell.el needs those kinds of things?
My vague understanding is that ispell.el needs to know which chars are part
of a word and that in the past (pre-MULE), this had to be redefined for each
and every language since the codes 128-255 could mean completely
different things.

Why can't ispell.el just use the `w' syntax to decide what is a word and
then rely on the decoding/encoding to do the rest of the work?

That would fix the problem where a word like "expérience" is checked as two
words if the dictionary is "american".

> +    ;; Char-table that maps an Unicode character (charset:
> +    ;; latin-iso8859-1, mule-unicode-0100-24ff) to
> +    ;; a string in which all equivalent characters are listed.
> +
> +    (defconst ispell-unified-chars-table
> +      (let ((table (make-char-table 'ispell-unified-chars-table)))
> +        (map-char-table
> +         #'(lambda (c v)
> +             (if (and v (/= c v))
> +                 (let ((unified (or (aref table v) (string v))))
> +                   (aset table v (concat unified (string c))))))
> +         ucs-mule-8859-to-mule-unicode)
> +        table))

All the elements of this table should be multibyte strings.  For this, we
may need to wrap the (string X) into (string-to-multibyte (string X))

> +    (string-as-multibyte
> +     (mapconcat
> +      #'(lambda (c)
> +          (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
> +            (if unichar
> +                (aref ispell-unified-chars-table unichar)
> +              (string c))))
> +      str ""))))

Do you expect the output of mapconcat to be unibyte and to contain
emacs-mule encoding of multibyte chars?  I don't.  So I'd recommend
string-to-multibyte rather than string-as-multibyte.  If I'm wrong, could
you explain where the emacs-mule encoding got introduced?


        Stefan
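
Illustration only (not code from ispell.el or from the patch under
discussion): a minimal sketch of the two suggestions made in this message.
The helper name `my-ispell-words-by-syntax' is hypothetical.  It assumes
the standard syntax table, in which non-ASCII letters such as é carry `w'
syntax, and a multibyte temporary buffer.  The final form only checks that
string-to-multibyte returns a multibyte string.

```emacs-lisp
;; Hypothetical helper, NOT part of ispell.el: split TEXT into words using
;; nothing but the `w' (word) syntax class, with no per-language tables.
(defun my-ispell-words-by-syntax (text)
  "Return the maximal runs of word-syntax characters in TEXT, in order."
  (with-temp-buffer
    ;; Assumption: the standard syntax table is what decides word-ness.
    (set-syntax-table (standard-syntax-table))
    (insert text)
    (goto-char (point-min))
    (let (words)
      ;; "\\sw+" matches one or more characters of syntax class `w'.
      (while (re-search-forward "\\sw+" nil t)
        (push (match-string 0) words))
      (nreverse words))))

;; The accented word stays in one piece, independent of any dictionary:
(my-ispell-words-by-syntax "an expérience, twice")
;; => ("an" "expérience" "twice")

;; Wrapping (string X) so that the result is a multibyte string:
(multibyte-string-p (string-to-multibyte (string ?a)))
;; => t
```

Whether this is enough for ispell.el depends on how the word is then
encoded for the ispell process, which the sketch does not cover.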