From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Wed, 5 Jan 2005 11:00:11 +0900 (JST)
Message-ID: <200501050200.LAA12589@etlken.m17n.org>
References: <20040517120658.GA6919@agmartin.aq.upm.es> <20041217121515.GA2270@agmartin.aq.upm.es> <200412221237.VAA07262@etlken.m17n.org> <20041222171306.GA4462@agmartin.aq.upm.es> <200501041250.VAA10883@etlken.m17n.org>
Cc: agustin.martin@hispalinux.es, lionel@mamane.lu, emacs-devel@gnu.org, k.stevens@ieee.org, 130397@bugs.debian.org
To: Stefan
In-Reply-To: (message from Stefan on Tue, 04 Jan 2005 09:55:09 -0500)

In article , Stefan writes:

>> Hmmm, then how about the attached patch to the latest CVS
>> emacs?  With that, all equivalent characters (e.g. a-grave in
>> all latin-X) should be handled well.  This patch should also
>> apply to Emacs 21.3, but it has not yet been tested with that
>> version.

> Can someone explain to me why ispell.el needs those kinds of things?
> My vague understanding is that ispell.el needs to know which chars are part
> of a word and that in the past (pre-MULE), this had to be redefined for each
> and every language since the codes 128-255 could mean completely
> different things.
>
> Why can't ispell.el just use the `w' syntax to decide what is a word and
> then rely on the decoding/encoding to do the rest of the work?
> That would fix the problem where a word like "expérience" is checked as two
> words if the dictionary is "american".

That would cause another problem.  For instance, suppose a
buffer contains "español" and the ispell dictionary is czech
(latin-2).  Since latin-2 has no "ñ", "español" is encoded
as "espa?ol", so ispell returns results for two words, "espa"
and "ol", and we get the error "Ispell and its process have
different character maps".

>> +;; Char-table that maps a Unicode character (charset:
>> +;; latin-iso8859-1, mule-unicode-0100-24ff) to
>> +;; a string in which all equivalent characters are listed.
>> +
>> +(defconst ispell-unified-chars-table
>> +  (let ((table (make-char-table 'ispell-unified-chars-table)))
>> +    (map-char-table
>> +     #'(lambda (c v)
>> +         (if (and v (/= c v))
>> +             (let ((unified (or (aref table v) (string v))))
>> +               (aset table v (concat unified (string c))))))
>> +     ucs-mule-8859-to-mule-unicode)
>> +    table))

> All the elements of this table should be multibyte strings.
> For this, we may need to wrap the (string X) into
> (string-to-multibyte (string X))

As `c' and `v' are always multibyte characters, (string X)
always returns a multibyte string.

>> +      (string-as-multibyte
>> +       (mapconcat
>> +        #'(lambda (c)
>> +            (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
>> +              (if unichar
>> +                  (aref ispell-unified-chars-table unichar)
>> +                (string c))))
>> +        str ""))))

> Do you expect the output of mapconcat to be unibyte and to contain
> emacs-mule encoding of multibyte chars?

No.  STR may be an ASCII-only string, in which case the
result of mapconcat is a unibyte ASCII-only string.  I'd
like to change it to a multibyte ASCII-only string to avoid
converting STR again and again in such a case.

---
Ken'ichi HANDA
handa@m17n.org
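The "español" failure described in this message can be reproduced with a small sketch in a current Emacs. It is not code from ispell.el: `my-ispell-simulate-encoding` and `my-ispell-words-seen-by-process` are hypothetical helpers, the coding systems `iso-latin-1` and `iso-latin-2` are assumed to be available, and the ispell process is assumed to treat any non-letter as a word separator. It shows how one character outside the dictionary's charset turns one buffer word into two words on the process side.

```elisp
;; Hypothetical helpers, not part of ispell.el.  They only mimic what
;; an external ispell process would receive for a single word.

(defun my-ispell-simulate-encoding (word coding)
  "Return WORD with each char that CODING cannot encode shown as \"?\".
This mimics how a word with characters outside CODING's repertoire
reaches the process, e.g. \"español\" becomes \"espa?ol\" under latin-2."
  (mapconcat
   (lambda (c)
     ;; `unencodable-char-position' returns the index of the first
     ;; char in the given string that CODING cannot encode, or nil.
     (if (unencodable-char-position 0 1 coding nil (string c))
         "?"
       (string c)))
   word ""))

(defun my-ispell-words-seen-by-process (word coding)
  "Split WORD, as encoded for CODING, into the words a process would see.
Assumes the process treats any non-letter, such as \"?\", as a word
separator."
  (split-string (my-ispell-simulate-encoding word coding)
                "[^[:alpha:]]+" t))

;; Usage:
;; (my-ispell-simulate-encoding "español" 'iso-latin-2)     ; "espa?ol"
;; (my-ispell-words-seen-by-process "español" 'iso-latin-2) ; ("espa" "ol")
;; (my-ispell-words-seen-by-process "español" 'iso-latin-1) ; ("español")
```

Marking the unencodable character explicitly keeps the sketch independent of how a particular Emacs version substitutes such characters inside `encode-coding-string`.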