From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
Date: Tue, 4 Jan 2005 21:50:33 +0900 (JST)
Message-ID: <200501041250.VAA10883@etlken.m17n.org>
References: <Pine.LNX.4.43.0305140821370.30166-100000@wr-linux02.rki.ivbb.bund.de>	<m3addpd2ur.fsf@dionysos.nib>
	<E19HNCh-0000tv-00@fencepost.gnu.org>	<20040517120658.GA6919@agmartin.aq.upm.es>	<20041217121515.GA2270@agmartin.aq.upm.es>	<200412221237.VAA07262@etlken.m17n.org>
	<20041222171306.GA4462@agmartin.aq.upm.es>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1104843668 10098 80.91.229.6 (4 Jan 2005 13:01:08 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 4 Jan 2005 13:01:08 +0000 (UTC)
Cc: lionel@mamane.lu, emacs-devel@gnu.org, 130397@bugs.debian.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Jan 04 14:00:50 2005
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Original-Received: from lists.gnu.org ([199.232.76.165])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1CloIr-0006Qv-00
	for <ged-emacs-devel@m.gmane.org>; Tue, 04 Jan 2005 14:00:50 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33)
	id 1CloPP-00030b-F0
	for ged-emacs-devel@m.gmane.org; Tue, 04 Jan 2005 08:07:35 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33)
	id 1CloNI-0002YF-0E
	for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:05:24 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33)
	id 1CloNB-0002Wd-6t
	for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:05:18 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33) id 1CloMU-0002Gl-29
	for emacs-devel@gnu.org; Tue, 04 Jan 2005 08:04:34 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168)
	(Exim 4.34) id 1Clo9Z-0002MK-8Z
	for emacs-devel@gnu.org; Tue, 04 Jan 2005 07:51:13 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])
	by tsukuba.m17n.org (8.12.3/8.12.3/Debian-7.1) with ESMTP id
	j04CoZY7008007; Tue, 4 Jan 2005 21:50:36 +0900
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by fs.m17n.org (8.11.6p2/8.11.6) with ESMTP id j04CoXq16853;
	Tue, 4 Jan 2005 21:50:34 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id VAA10883;
	Tue, 4 Jan 2005 21:50:33 +0900 (JST)
Original-To: Agustin Martin <agustin.martin@hispalinux.es>
In-reply-to: <20041222171306.GA4462@agmartin.aq.upm.es> (message from Agustin
	Martin on Wed, 22 Dec 2004 18:13:06 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:31816
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:31816

In article <20041222171306.GA4462@agmartin.aq.upm.es>, Agustin Martin <agustin.martin@hispalinux.es> writes:

> I was aware of this, but thanks anyway for the reminder. The code is probably
> too ad hoc, but the latin{0,1} thing is also a somewhat ad hoc scenario, where
> latin0 should really have been named something like iso-8859-1v2, that is,
> a revision. I cannot imagine somebody using an iso-8859-2 dict and trying to
> write in an iso-8859-1 buffer, but with iso-8859-1 and iso-8859-15 that is
> happening too frequently.

> So we have a lot of people who blindly select the locale's @euro variant
> without realizing its implications, or that iso-8859-1 and iso-8859-15
> are different but very close encodings (from a practical point of view,
> they are fully equivalent for most languages except, IIRC, French (oe, "Y)
> and Finnish ({sSzZ}^, where ^ stands for caron); the euro symbol does not
> seem significant for spellchecking).
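
For reference, the positions where the two encodings disagree can be
listed with a small sketch like the one below.  It is only an
illustration: the function name is made up, and it assumes an Emacs
that provides `unibyte-string', which is newer than the versions
discussed in this thread.

```elisp
;; Hypothetical helper, not part of ispell.el.
(defun my-latin1-latin9-differences ()
  "Return a list of (BYTE LATIN1-CHAR LATIN9-CHAR) for bytes that differ.
Only the upper half 0xA0..0xFF is examined, since the two encodings
agree on every byte below that."
  (let (diffs)
    (dotimes (i 96)
      (let* ((byte (+ #xA0 i))
             (raw (unibyte-string byte))
             ;; Decode the same byte under each coding system.
             (c1 (aref (decode-coding-string raw 'iso-8859-1) 0))
             (c9 (aref (decode-coding-string raw 'iso-8859-15) 0)))
        (unless (= c1 c9)
          (push (list byte c1 c9) diffs))))
    (nreverse diffs)))
```

Evaluating (my-latin1-latin9-differences) should give eight entries:
the euro sign, the S and Z with caron in both cases, and OE, oe and
Y-diaeresis.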

> Furthermore (this is probably fixed by the CVS code you mentioned above),
> in the current sid emacs, utf-8 files can be checked with a latin1 dict
> (provided they do not use chars outside latin1) using ispell.el's internal
> re-encoding, but this fails for a dict declared as iso-8859-15.

No, this is not yet fixed.

> The current state of ispell dicts in Debian is that ifrench is iso-8859-15
> by default (although it has a real latin1 entry), while Finnish does not set
> the {s,z}-caron chars at all, so it is a fully latin1 entry. aspell-fr and
> aspell-fi are set to plain latin1.

> So the only language that might currently require extra work is French, and
> for it I find it reasonable for emacs to use the iso-8859-15 entry by default
> (tagged as iso-8859-1 for the above system to work). On this I would like
> to hear Lionel's point of view, since he has put a lot of effort into making
> iso-8859-15 available for spellchecking (Hi, Lionel).

> I personally do not like having separate iso-8859-15 entries unless they are
> really required. For the above dicts, that would be French, and I am not
> at all sure that it is really required.

Hmmm, then how about the attached patch against the latest CVS
emacs?  With it, all equivalent characters (e.g. a-grave in
all latin-X) should be handled well.  The patch should also
apply to Emacs 21.3, but I have not yet tested it with that
version.
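
The splicing idea can be sketched in isolation roughly as follows.
This is a simplified illustration, not the patch itself: the function
name is hypothetical, and a plain hash table stands in for the
char-table of equivalent characters that the patch builds.

```elisp
;; Hypothetical helper, not part of ispell.el.
(defun my-splice-equivalents (regexp table)
  "Return REGEXP with each character replaced by its entry in TABLE.
TABLE is a hash table mapping a character to a string that lists the
character together with its equivalents.  Only a REGEXP of the form
\"[...]\" is rewritten; anything else is returned unchanged, because
splicing extra characters into other regexps would change what they
match."
  (if (and (> (length regexp) 0)
           (eq (aref regexp 0) ?\[)
           ;; The first \"]\" must also be the last character.
           (eq (string-match "\\]" regexp) (1- (length regexp))))
      (mapconcat (lambda (c) (or (gethash c table) (string c)))
                 regexp "")
    regexp))

;; Usage, with ?X as a stand-in for an equivalent of ?x:
(let ((table (make-hash-table)))
  (puthash ?x "xX" table)
  (my-splice-equivalents "[xy]" table))
```

The last form should return "[xXy]", while a string that is not a
single bracket expression comes back untouched.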

---
Ken'ichi HANDA
handa@m17n.org


*** ispell.el	25 Dec 2004 11:43:11 +0900	1.151
--- ispell.el	03 Jan 2005 16:05:48 +0900	
***************
*** 1074,1088 ****
        (decode-coding-string str (ispell-get-coding-system))
      str))
  
  (defun ispell-get-casechars ()
!   (ispell-decode-string
!    (nth 1 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-not-casechars ()
!   (ispell-decode-string
!    (nth 2 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-otherchars ()
!   (ispell-decode-string
!    (nth 3 (assoc ispell-dictionary ispell-dictionary-alist))))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()
--- 1074,1127 ----
        (decode-coding-string str (ispell-get-coding-system))
      str))
  
+ (put 'ispell-unified-chars-table 'char-table-extra-slots 0)
+ 
+ ;; Char-table that maps a Unicode character (charset:
+ ;; latin-iso8859-1, mule-unicode-0100-24ff) to
+ ;; a string in which all equivalent characters are listed.
+ 
+ (defconst ispell-unified-chars-table
+   (let ((table (make-char-table 'ispell-unified-chars-table)))
+     (map-char-table
+      #'(lambda (c v)
+ 	 (if (and v (/= c v))
+ 	     (let ((unified (or (aref table v) (string v))))
+ 	       (aset table v (concat unified (string c))))))
+      ucs-mule-8859-to-mule-unicode)
+     table))
+ 
+ ;; Return a string decoded from Nth element of the current dictionary
+ ;; while splicing equivalent characters into the string.  This splicing
+ ;; is done only if the string is a regular expression of the form
+ ;; "[...]" because, otherwise, splicing will result in incorrect
+ ;; regular expression matching.
+ 
+ (defun ispell-get-decoded-string (n)
+   (let* ((slot (assoc ispell-dictionary ispell-dictionary-alist))
+ 	 (str (nth n slot)))
+     (when (and (> (length str) 0)
+ 	       (not (multibyte-string-p str)))
+       (setq str (ispell-decode-string str))
+       (if (and (= (aref str 0) ?\[)
+ 	       (eq (string-match "\\]" str) (1- (length str))))
+ 	  (setq str
+ 		(string-as-multibyte
+ 		 (mapconcat
+ 		  #'(lambda (c)
+ 		      (let ((unichar (aref ucs-mule-8859-to-mule-unicode c)))
+ 			(if unichar
+ 			    (aref ispell-unified-chars-table unichar)
+ 			  (string c))))
+ 		  str ""))))
+       (setcar (nthcdr n slot) str))
+     str))
+ 
  (defun ispell-get-casechars ()
!   (ispell-get-decoded-string 1))
  (defun ispell-get-not-casechars ()
!   (ispell-get-decoded-string 2))
  (defun ispell-get-otherchars ()
!   (ispell-get-decoded-string 3))
  (defun ispell-get-many-otherchars-p ()
    (nth 4 (assoc ispell-dictionary ispell-dictionary-alist)))
  (defun ispell-get-ispell-args ()