From: Geoff Kuenning
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: 29 Apr 2005 02:29:41 +0200
To: Juri Linkov
Cc: agustin.martin@hispalinux.es, emacs-devel@gnu.org, k.stevens@ieee.org, 130397@bugs.debian.org

For those of you who don't know, I've released ispell 3.3.00.  Having
gotten that off my plate, I'm busily working on some improvements that
will go into 3.3.01.  Number one on that list is to redo the
fixispell-a script that I whipped up a few months ago.

Juri points out:

> This approach is quite promising, but it doesn't work sufficiently well
> for non-English languages.  It loses all characters that don't belong
> to the alphabet specified in .aff file.

and:

> But there is another problem.  fixispell-a returns a list of near misses
> only for the last language in the pipe.  It would be better if it
> accumulated a list of near misses from all ispell commands in the pipe.

The former problem is best addressed using Juri's suggestion of passing
the "-w" switch to specify a superset.
In addition, in the new release, the english.aff file includes all of
Latin-1 (since English sometimes adopts accented words and names from
other languages).  The -w switch is still needed, though, to handle
things like the apostrophe, which isn't in all non-English affix files.
I welcome further suggestions.

The latter problem motivated me to write an entirely new program,
multispell, which does a better job of what fixispell-a attempted.
It's invoked as:

    multispell [ispell-switches] dict1 dict2 dict3

For example:

    multispell -m english deutsch francais

Multispell behaves like ispell -a, but accepts any word that any of the
mentioned dictionaries accept.  If a word is rejected, it combines
suggestions from all dictionaries.  So, for example, sending "wuld" to
the above line produces:

    & wuld 0 7 weld, wild, wold, would, Wald, wild, wund

This brings me to a question and a discussion point.

The question is highlighted in the above line: the word "wild" appears
as a suggestion twice, because the English and German dictionaries both
produce it.  Do people think that's a Bad Thing?  I can certainly write
code to suppress the duplicates; I'm just feeling lazy at the moment.
*grin*

The discussion point is a bit more complex.  If you invoke multispell
with:

    multispell -T latin1 -m english deutsch francais

it will fail because the English dictionary doesn't recognize "latin1"
as a valid encoding.  How do people think I should handle these
variations among affix files?  One obvious option would be to make the
-T switch be dictionary-specific in multispell, so you'd write:

    multispell -m -T list english -T latin1 deutsch -T latin1 francais

Another option would be to insist that all affix files follow a common
naming scheme, so that everybody would be willing to accept "latin1" as
an encoding name, and so forth.

From my point of view, both options are bad.  The first requires too
much intelligence on the part of ispell.el.  The second is going to be
hard to enforce.
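To make the duplicate question concrete, here is a rough sketch in Python of the merging rule described above: a word passes if any dictionary accepts it, and otherwise the near misses from every dictionary are pooled, with exact repeats dropped. This is only an illustration, not multispell's actual code. The names `merge_near_misses`, `multi_check`, and `toy_checker` are hypothetical, the dictionaries are stand-in functions rather than real ispell processes, and the split of the "wuld" suggestions between English and German is assumed.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical stand-in for one dictionary: returns None if the word is
# accepted, otherwise a (possibly empty) list of near misses.
Checker = Callable[[str], Optional[List[str]]]


def merge_near_misses(per_dictionary: Iterable[Iterable[str]]) -> List[str]:
    """Pool suggestion lists in order, keeping only the first copy of each word."""
    seen: Set[str] = set()
    merged: List[str] = []
    for suggestions in per_dictionary:
        for word in suggestions:
            if word not in seen:  # case-sensitive: "Wald" and "wald" differ
                seen.add(word)
                merged.append(word)
    return merged


def multi_check(word: str, checkers: Iterable[Checker]) -> Optional[List[str]]:
    """Return None if any dictionary accepts the word, else the pooled misses."""
    pooled: List[List[str]] = []
    for check in checkers:
        misses = check(word)
        if misses is None:
            return None  # one acceptance is enough
        pooled.append(misses)
    return merge_near_misses(pooled)


def toy_checker(accepted: Set[str], misses: Dict[str, List[str]]) -> Checker:
    """Build a fake dictionary from a word set and canned suggestions."""
    def check(word: str) -> Optional[List[str]]:
        if word in accepted:
            return None
        return list(misses.get(word, []))
    return check


if __name__ == "__main__":
    # Assumed split of the suggestions between the dictionaries.
    english = toy_checker({"would"}, {"wuld": ["weld", "wild", "wold", "would"]})
    german = toy_checker({"Wald"}, {"wuld": ["Wald", "wild", "wund"]})
    french = toy_checker({"oui"}, {})
    print(multi_check("wuld", [english, german, french]))
```

Keeping the first occurrence preserves the order in which the dictionaries were listed, so the result for a rejected word still reads left to right by dictionary.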
Opinions are welcomed.
-- 
    Geoff Kuenning   geoff@cs.hmc.edu   http://www.cs.hmc.edu/~geoff/

Windows XP is the "most reliable Windows ever," which is like saying
that asparagus is "the most articulate vegetable ever."
                -- Dave Barry