From mboxrd@z Thu Jan 1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
Date: Tue, 4 Jan 2005 21:50:33 +0900 (JST)
Message-ID: <200501041250.VAA10883@etlken.m17n.org>
References: <20040517120658.GA6919@agmartin.aq.upm.es> <20041217121515.GA2270@agmartin.aq.upm.es> <200412221237.VAA07262@etlken.m17n.org> <20041222171306.GA4462@agmartin.aq.upm.es>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1104843668 10098 80.91.229.6 (4 Jan 2005 13:01:08 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 4 Jan 2005 13:01:08 +0000 (UTC)
Cc: lionel@mamane.lu, emacs-devel@gnu.org, 130397@bugs.debian.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Jan 04 14:00:50 2005
Return-path:
Original-Received: from lists.gnu.org ([199.232.76.165]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1CloIr-0006Qv-00 for ; Tue, 04 Jan 2005 14:00:50 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1CloPP-00030b-F0 for ged-emacs-devel@m.gmane.org; Tue, 04 Jan 2005 08:07:35 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33) id 1CloNI-0002YF-0E for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:05:24 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33) id 1CloNB-0002Wd-6t for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:05:18 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1CloMU-0002Gl-29 for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:04:34 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.34) id 1Clo9Z-0002MK-8Z for emacs-devel@gnu.org; Tue, 04 Jan 2005 07:51:13 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2]) by tsukuba.m17n.org (8.12.3/8.12.3/Debian-7.1) with ESMTP id j04CoZY7008007; Tue, 4 Jan 2005 21:50:36 +0900
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125]) by fs.m17n.org (8.11.6p2/8.11.6) with ESMTP id j04CoXq16853; Tue, 4 Jan 2005 21:50:34 +0900 (JST)
Original-Received: (from handa@localhost) by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id VAA10883; Tue, 4 Jan 2005 21:50:33 +0900 (JST)
Original-To: Agustin Martin
In-reply-to: <20041222171306.GA4462@agmartin.aq.upm.es> (message from Agustin Martin on Wed, 22 Dec 2004 18:13:06 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:31816
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:31816

In article <20041222171306.GA4462@agmartin.aq.upm.es>, Agustin Martin writes:

> I was aware of this, but anyway thanks for reminding. Code is probably too
> ad-hoc, but latin{0,1} thing is also a somewhat ad-hoc scenario, where
> latin0 should really have been named something like iso-8859-1v2, that is,
> a revision. I cannot imagine somebody using an iso-8859-2 dict and trying to
> write in an iso-8859-1 buffer, but with iso-8859-1 and iso-8859-15 that is
> happening too frequently.

> So we have a lot of people that blindly select the locale @euro variant
> without realizing its implications, and that iso-8859-1 and iso-8859-15
> are different, but very close encodings (from a practical point of view,
> they are fully equivalent for most languages but IIRC french (oe,"Y) and
> finnish {sSzZ}^, ^ stands for caron; the euro symbol seems not significant
> to spellchecking).

> Furthermore (this is probably fixed by the CVS code you mentioned above),
> in current sid emacs utf-8 files can be checked with a latin1 dict (of
> course if they do not use chars outside latin1) using the ispell.el
> internal reencodings, but fails for iso-8859-15 declared dict.

No, this is not yet fixed.

> The current state of ispell dicts in Debian is that ifrench is iso-8859-15
> as default (although it has a real latin1 entry), while finnish does not set
> the {s,z}-caron chars at all, so it is a fully latin1 entry. aspell-fr and
> aspell-fi are set to plain latin1.

> So the only language that might currently require extra work is french, and
> for it I find reasonable to use for emacs as default the iso-8859-15 entry
> (tagged as iso-8859-1 for the above system to work). For this I would like
> to hear Lionel's point of view, since he has put a lot of effort to make
> iso-8859-15 available for spellchecking (Hi, Lionel).

> I personally do not like having separate iso-8859-15 entries unless they are
> really required. For the above dicts, that would be for french, and I am not
> at all sure that it is really required.

Hmmm, then how about the attached patch to the latest CVS emacs?
With that, all equivalent characters (e.g. a-grave in all
latin-X) should be handled well.  This patch should also be
applicable to Emacs 21.3, but it has not yet been tested with
that version.

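As an aside on the iso-8859-1 vs. iso-8859-15 point quoted above: the
two encodings differ in only eight byte positions. The following is a
minimal sketch, not part of the patch, that lists them. It assumes a
Unicode-based Emacs (23 or later) rather than the Emacs 21 Mule
internals the patch targets, and the function name
my-latin1-latin9-differences is made up for illustration.

```elisp
;; Sketch only: assumes a Unicode-based Emacs (23+), where
;; `unibyte-string' exists and decoded characters are Unicode code points.
;; `my-latin1-latin9-differences' is a hypothetical name.

(defun my-latin1-latin9-differences ()
  "Return a list of (BYTE LATIN1-CHAR LATIN9-CHAR) for differing bytes.
Only the range 0xA0..0xFF is scanned; the rest is not examined here."
  (let (result)
    (dotimes (i 96)
      (let* ((byte (+ #xa0 i))
             (raw (unibyte-string byte))
             ;; Decode the same single byte under both coding systems.
             (l1 (aref (decode-coding-string raw 'iso-8859-1) 0))
             (l9 (aref (decode-coding-string raw 'iso-8859-15) 0)))
        (unless (= l1 l9)
          (push (list byte l1 l9) result))))
    (nreverse result)))

;; Show the differing positions in the echo area / *Messages*.
(dolist (entry (my-latin1-latin9-differences))
  (message "0x%02X: latin-1 %c  latin-9 %c"
           (nth 0 entry) (nth 1 entry) (nth 2 entry)))
```

The differing positions are the euro sign, S/s and Z/z with caron, the
OE/oe ligatures and Y with diaeresis, which matches the French and
Finnish cases mentioned in the quoted text.
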
---
Ken'ichi HANDA
handa@m17n.org

*** ispell.el	25 Dec 2004 11:43:11 +0900	1.151
--- ispell.el	03 Jan 2005 16:05:48 +0900
***************
*** 1074,1088 ****
          (decode-coding-string str (ispell-get-coding-system))
      str))
  
  (defun ispell-get-casechars ()
!   (ispell-decode-string
!    (nth 1 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-not-casechars ()
!   (ispell-decode-string
!    (nth 2 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-otherchars ()
!   (ispell-decode-string
!    (nth 3 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()
--- 1074,1127 ----
          (decode-coding-string str (ispell-get-coding-system))
      str))
  
+ (put 'ispell-unified-chars-table 'char-table-extra-slots 0)
+ 
+ ;; Char-table that maps an Unicode character (charset:
+ ;; latin-iso8859-1, mule-unicode-0100-24ff) to
+ ;; a string in which all equivalent characters are listed.
+ 
+ (defconst ispell-unified-chars-table
+   (let ((table (make-char-table 'ispell-unified-chars-table)))
+     (map-char-table
+      #'(lambda (c v)
+          (if (and v (/= c v))
+              (let ((unified (or (aref table v) (string v))))
+                (aset table v (concat unified (string c))))))
+      ucs-mule-8859-to-mule-unicode)
+     table))
+ 
+ ;; Return a string decoded from Nth element of the current dictionary
+ ;; while splicing equivalent characters into the string.  This splicing
+ ;; is done only if the string is a regular expression of the form
+ ;; "[...]" because, otherwise, splicing will result in incorrect
+ ;; regular expression matching.
+ 
+ (defun ispell-get-decoded-string (n)
+   (let* ((slot (assoc ispell-dictionary ispell-dictionary-alist))
+          (str (nth n slot)))
+     (when (and (> (length str) 0)
+                (not (multibyte-string-p str)))
+       (setq str (ispell-decode-string str))
+       (if (and (= (aref str 0) ?\[)
+                (eq (string-match "\\]" str) (1- (length str))))
+           (setq str
+                 (string-as-multibyte
+                  (mapconcat
+                   #'(lambda (c)
+                       (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
+                         (if unichar
+                             (aref ispell-unified-chars-table unichar)
+                           (string c))))
+                   str ""))))
+       (setcar (nthcdr n slot) str))
+     str))
+ 
  (defun ispell-get-casechars ()
!   (ispell-get-decoded-string 1))
  (defun ispell-get-not-casechars ()
!   (ispell-get-decoded-string 2))
  (defun ispell-get-otherchars ()
!   (ispell-get-decoded-string 3))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()
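
The guard used in ispell-get-decoded-string (splice equivalents only
into a plain "[...]" character class) can be sketched in isolation as
follows. This is not the patch's code: the alist TABLE stands in for
ispell-unified-chars-table, the name my-splice-equivalents is
hypothetical, and the ASCII pairs in the usage lines are placeholders,
not real equivalent characters.

```elisp
;; Sketch only: TABLE is a plain alist of (CHAR . STRING-OF-EQUIVALENTS),
;; an assumed stand-in for the char-table built in the patch.

(defun my-splice-equivalents (str table)
  "Return STR with each character replaced by its entry in TABLE.
The replacement is done only when STR has the form \"[...]\", that is,
it starts with \"[\" and its first \"]\" is its last character.
Any other string is returned unchanged."
  (if (and (> (length str) 0)
           (= (aref str 0) ?\[)
           (eq (string-match "\\]" str) (1- (length str))))
      (mapconcat (lambda (c)
                   ;; Characters without an entry are kept as they are.
                   (or (cdr (assq c table)) (string c)))
                 str "")
    str))

;; Usage with placeholder data:
;; (my-splice-equivalents "[ab]" '((?a . "aA")))  ; => "[aAb]"
;; (my-splice-equivalents "a+b"  '((?a . "aA")))  ; => "a+b"
```

Inside a character class, adding more members only widens the set that
is matched, so the splice is harmless there. Outside a class, replacing
one character by several would turn it into a sequence and change what
the regular expression matches, which is why such strings are left alone.
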