From: Geoff Kuenning
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: 08 Jan 2005 13:31:11 +0100
To: Ken Stevens
Cc: Kenichi Handa, 130397@bugs.debian.org, agustin.martin@hispalinux.es, lionel@mamane.lu, emacs-devel@gnu.org, juri@jurta.org, Stefan Monnier
In-Reply-To: <28878.1105029010@ichips.intel.com>

Ken writes:

> Geoff has a much better understanding of the underlying spell search
> engine.  Perhaps he can shed additional light on this topic.

I just looked at the code to be sure my memory is correct.  Here's the
short rundown: in the '-a' interface, ispell talks to the outside world
purely in a byte-indexed mode.  It is perfectly capable of handling
UTF-8 and similar multi-byte encodings, but when it reports the offsets
of incorrect words, it does so as a byte offset, not a character offset.

Does Emacs provide an underlying byte-indexed interface to the buffer?
If so, life should be easy: just have ispell.el use that interface.  If
not, I think life is going to be very, very difficult.  It's possible
that I could modify ispell to provide a display-width index rather than
a byte index, but it's not trivial and there may be pitfalls.  There's
also the problem that, even if I get off my butt and produce a new
release reasonably soon, there are lots of old copies of ispell out
there that wouldn't support the new interface.

Juri writes:

> And while on this topic, I want to remind that many Emacs users suffer
> from the inability of ispell.el to simultaneously check mixed multi-language
> texts.  So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, like e.g.
> English and some other Latin-based alphabets.  Even for overlapping
> alphabets it would be possible with using the `w' syntax to get a word
> and to feed it to different ispell instances for each dictionary.

I'm not entirely sure what you mean here.  For disjoint alphabets, it's
certainly relatively easy to figure out which word should go to which
ispell instance.  For identical, superset, or overlapping alphabets, the
problem is basically insoluble.  For example, "fra" is a misspelling in
English but legal in Italian.  If it appears in a mixed passage, which
dictionary should it be fed to?  The only solution would seem to be to
require the user to mark passages in some way, as is done in HTML.

-- 
Geoff Kuenning   geoff@cs.hmc.edu   http://www.cs.hmc.edu/~geoff/

One could not be a successful scientist without realizing that, in
contrast to the popular conception supported by newspapers and mothers
of scientists, a goodly number of scientists are not only narrow-minded
and dull, but also just stupid.
	-- James Watson
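Two points in this exchange can be made concrete with a small sketch.

1. A checker that reports byte offsets into UTF-8 text needs those offsets converted before they can index characters.
2. For mixed-language text, testing a word's letters against each dictionary's alphabet picks a dictionary only when exactly one alphabet contains the word.

The Python below is a minimal sketch under stated assumptions. The function names and the `EXAMPLE_ALPHABETS` table are hypothetical. Offsets are assumed to be 0-based byte offsets into a single UTF-8 line. It does not reproduce the actual code of ispell or ispell.el.

```python
from __future__ import annotations

# Hypothetical alphabets, for illustration only; real dictionaries
# define their own character sets.
EXAMPLE_ALPHABETS: dict[str, set[str]] = {
    "english": set("abcdefghijklmnopqrstuvwxyz"),
    "russian": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    # A superset of the English letters, plus accented vowels.
    "italian": set("abcdefghijklmnopqrstuvwxyz" + "àèéìíîòóùú"),
}


def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a 0-based byte offset into `line` (as encoded with
    `encoding`) into a 0-based character offset into `line` itself.

    Raises ValueError if the offset is out of range, and
    UnicodeDecodeError if it falls inside a multi-byte character.
    """
    raw = line.encode(encoding)
    if not 0 <= byte_offset <= len(raw):
        raise ValueError(f"byte offset {byte_offset} outside 0..{len(raw)}")
    # The number of characters before the offset is the length of the
    # decoded byte prefix.
    return len(raw[:byte_offset].decode(encoding))


def candidate_dictionaries(word: str, alphabets: dict[str, set[str]]) -> list[str]:
    """Return the name of every dictionary whose alphabet contains all of
    the word's letters.  One name means the word can be routed without
    ambiguity; several names mean the alphabets alone cannot decide."""
    letters = {c for c in word.lower() if c.isalpha()}
    if not letters:
        return []
    return [name for name, alphabet in alphabets.items() if letters <= alphabet]


if __name__ == "__main__":
    line = "na\u00efve wrld"  # "naïve wrld"; the "ï" is two bytes in UTF-8
    start = byte_to_char_offset(line, 7)
    print(start, line[start:])  # 6 wrld
    for word in ("fra", "перо", "città"):
        print(word, candidate_dictionaries(word, EXAMPLE_ALPHABETS))
```

With these toy alphabets, "fra" comes back as both `english` and `italian`, which is exactly the ambiguity described above. Because the Italian set here is a superset of the English one, no plain-ASCII word can be routed by its letters alone, and some explicit marking of the passage's language would still be needed.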