From: Agustin Martin
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397 (Was: Emacs - Ispell problem with i[no]german dictionary)
Date: Mon, 10 Jan 2005 18:16:11 +0100
Message-ID: <20050110171611.GA10357@agmartin.aq.upm.es>
In-Reply-To: <20050110130641.GB13663@tofu.mamane.lu>
Original-To: Lionel Elie Mamane, Kenichi Handa, 130397@bugs.debian.org, emacs-devel@gnu.org

(Handa, your patch worked better than I thought, read below)

On Mon, Jan 10, 2005 at 02:06:41PM +0100, Lionel Elie Mamane wrote:
> On Wed, Dec 22, 2004 at 06:13:06PM +0100, Agustin Martin wrote:
> > So the only language that might currently require extra work is
> > french, and for it I find reasonable to use for emacs as default the
> > iso-8859-15 entry (tagged as iso-8859-1 for the above system to
> > work). For this I would like to hear Lionel's point of view, since
> > he has put a lot of effort to make iso-8859-15 available for
> > spellchecking (Hi, Lionel).
>
> I think that if we do that, then latin1 text won't be spell-checked
> correctly: Ispell will try to insert "one half" and "one quarter"
> characters (the characters occupying the same place as OE and oe in
> latin9), won't it?

Yes, that is what would happen. I am considering an "exclusion" list, that
is, a list of languages for which this hack should not be done, so that they
can have really different iso-8859-1 and iso-8859-15 entries for {x}emacs.
That will currently be only French, since it seems that the Finnish dicts do
not include the iso-8859-15 chars. For all other languages, treating the two
encodings as internally equivalent does not seem unreasonable. It is then up
to the French dict maintainers to decide which entry is to be considered the
"default", that is, to be called "francais". Coordination with the aspell
French dict maintainer is also needed, so that both share the same ispell.el
entry in the least conflicting way.

> > I personally do not like having separate iso-8859-15 entries unless
> > they are really required. For the above dicts, that would be for
> > french, and I am not at all sure that it is really required.
>
> Having separate entries that the user has to select manually is bad,
> but it is the best we can have with the current system if we want to
> keep correctness of the spell-checking, as far as I understand. Having
> the system (the combination of emacs + dicts-common + the dicts)
> select the right dictionary + options combination automatically based
> on the (language, encoding) pair (like "-d francais -T ~latin1" for
> french and latin1) would be cool from the user's POV.
>
> We have special entries for "(La)TeX", which can be seen as another
> encoding, so why not special entries for iso8859-15 (when necessary)?
> What is so fundamentally different about iso8859-15?

The problem was that when editing a utf-8 buffer with an iso-8859-15 ispell
dict entry for emacs, some misalignment errors appeared. I vaguely remember
that word boundaries were not found correctly either, but I am not sure, and
if that problem existed, it seems to be gone in current sid emacs21. Also,
Kenichi Handa provided us with a patch to ensure that all equivalent accented
chars, if available under different encodings, are mapped to the same char,
so that they are not considered word boundaries if spell-checkable, but I
still got misalignment errors with it.
This would, however, fix the word boundaries problem for an iso-8859-15
buffer using an iso-8859-1 dict.

But I have just noticed that if I add coeur (with oe-1char) to the French
dict (ifrench, which contained only the oe-2char version), the misalignment
errors disappear. I only tested with coeur; I do not know which other words
have the same char, although I guess that most of those containing oeu do.

So the patch from Kenichi Handa seems to work well for sid emacs21, much
better than I thought. However, it uses code that has only recently been
added to emacs21, and things that are not available for xemacs or emacs20.

Cheers,

--
Agustin
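
Illustrative note, not part of the original message: the latin1/latin9 clash discussed above can be checked with a few lines of Python. This is only a sketch of the encoding facts, not of what ispell or ispell.el do internally; the helper name `encodable` is made up for the example.

```python
import unicodedata

# The two byte values that Lionel's remark is about.
raw = bytes([0xBC, 0xBD])

# The same bytes decode to different characters in each charset:
# fractions under latin1, OE ligatures under latin9.
as_latin1 = raw.decode("iso-8859-1")
as_latin9 = raw.decode("iso-8859-15")

for label, text in (("latin1", as_latin1), ("latin9", as_latin9)):
    print(label, [unicodedata.name(ch) for ch in text])


def encodable(word, charset):
    """Hypothetical helper: True if `word` can be written in `charset`."""
    try:
        word.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# "coeur" written with the one-character oe ligature (U+0153).
word = "c\u0153ur"
print("latin1:", encodable(word, "iso-8859-1"))
print("latin9:", encodable(word, "iso-8859-15"))
```

A dictionary entry tagged as iso-8859-1 but actually holding latin9 data would therefore hand these bytes to a latin1 buffer as fraction characters, which is the concern raised in the quoted text.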