From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Geoff Kuenning <geoff@cs.hmc.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: 08 Jan 2005 13:31:11 +0100
Message-ID: <pni4qhsje5s.fsf@bow.cs.hmc.edu>
References: <28878.1105029010@ichips.intel.com>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1105187981 4961 80.91.229.6 (8 Jan 2005 12:39:41 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 8 Jan 2005 12:39:41 +0000 (UTC)
Cc: Kenichi Handa <handa@m17n.org>, 130397@bugs.debian.org,
	agustin.martin@hispalinux.es, lionel@mamane.lu,
	emacs-devel@gnu.org, juri@jurta.org,
	Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Jan 08 13:39:33 2005
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Original-Received: from lists.gnu.org ([199.232.76.165])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1CnFsS-0000aH-00
	for <ged-emacs-devel@m.gmane.org>; Sat, 08 Jan 2005 13:39:32 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1CnG3r-000373-JR
	for ged-emacs-devel@m.gmane.org; Sat, 08 Jan 2005 07:51:19 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1CnG2r-0002x4-ID
	for emacs-devel@gnu.org; Sat, 08 Jan 2005 07:50:17 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1CnG2S-0002nZ-SI
	for emacs-devel@gnu.org; Sat, 08 Jan 2005 07:49:57 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1CnG2R-0002kq-Bp
	for emacs-devel@gnu.org; Sat, 08 Jan 2005 07:49:51 -0500
Original-Received: from [134.173.42.59] (helo=mallet.cs.hmc.edu)
	by monty-python.gnu.org with esmtp (Exim 4.34) id 1CnFkW-0002GI-HJ
	for emacs-devel@gnu.org; Sat, 08 Jan 2005 07:31:20 -0500
Original-Received: from bow.cs.hmc.edu (bow-vpn.cs.hmc.edu [192.168.6.2])
	by mallet.cs.hmc.edu (Postfix) with ESMTP id 7D182298290;
	Sat,  8 Jan 2005 04:31:18 -0800 (PST)
Original-Received: by bow.cs.hmc.edu (Postfix, from userid 13409)
	id CFC8F249D5; Sat,  8 Jan 2005 13:31:11 +0100 (CET)
Original-To: Ken Stevens <kstevens@ichips.intel.com>
In-Reply-To: <28878.1105029010@ichips.intel.com>
Original-Lines: 47
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:32034
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:32034

Ken writes:

> Geoff has a much better understanding of the underlying spell search
> engine.  Perhaps he can shed additional light on this topic.

I just looked at the code to be sure my memory is correct.  Here's the
short rundown: in the '-a' interface, ispell communicates with the
outside world purely in a byte-indexed mode.  It is perfectly capable
of handling UTF-8 and similar multi-byte encodings, but when it
reports the offsets of incorrect words, it does so as a byte offset,
not a character offset.
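
To make the byte/character mismatch concrete, here is a minimal sketch
of how a caller might translate a reported byte offset into a
character offset.  It assumes the line was sent to ispell encoded as
UTF-8 and that the offset counts bytes from the start of that line;
the function name byte_to_char_offset is hypothetical and is not part
of ispell or ispell.el.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into the encoded line to a character offset."""
    raw = line.encode(encoding)
    if not 0 <= byte_offset <= len(raw):
        raise ValueError("byte offset outside the encoded line")
    # Decoding the prefix counts the characters that precede the offset.
    # This raises UnicodeDecodeError if the offset falls inside a character.
    return len(raw[:byte_offset].decode(encoding))


line = "naïve café wrold"
# Stand-in for the offset a byte-indexed checker would report for "wrold".
byte_offset = line.encode("utf-8").index(b"wrold")
char_offset = byte_to_char_offset(line, byte_offset)
print(byte_offset, char_offset)
```

In this example the two accented letters each occupy two bytes, so
the byte offset runs two ahead of the character offset by the time
the misspelled word is reached.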

Does Emacs provide an underlying byte-indexed interface to the buffer?
If so, life should be easy: just have ispell.el use that interface.
If not, I think life is going to be very, very difficult.  It's
possible that I could modify ispell to provide a display-width index
rather than a byte index, but it's not trivial and there may be
pitfalls.  There's also the problem that--even if I get off my butt
and produce a new release reasonably soon--there are lots of old
copies of ispell out there that wouldn't support the new interface.
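
One of the pitfalls can be sketched quickly: byte index, character
index, and display column are three different numbers for the same
word.  The display_width helper below is a hypothetical approximation
(wide and fullwidth East Asian characters count as two columns,
combining marks as zero); it is not how ispell or Emacs computes
widths, and real terminals and fonts may disagree with it.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of display columns occupied by text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on the previous character.
            continue
        # "W" (wide) and "F" (fullwidth) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


line = "日本語 wrold"
char_index = line.index("wrold")
byte_index = len(line[:char_index].encode("utf-8"))
column = display_width(line[:char_index])
print(char_index, byte_index, column)
```

Here the word starts at character 4, byte 10, and approximate column
7, so a caller has to know exactly which of the three the checker is
reporting.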

Juri writes:

> And while on this topic, I want to remind that many Emacs users suffer
> from the inability of ispell.el to simultaneously check mixed multi-language
> texts.  So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, like e.g.
> English and some other Latin-based alphabets.  Even for overlapping
> alphabets it would be possible with using the `w' syntax to get a word
> and to feed it to different ispell instances for each dictionary.

I'm not entirely sure what you mean here.  For disjoint alphabets,
it's certainly relatively easy to figure out which word should go to
which ispell instance.  For identical, superset, or overlapping
alphabets, the problem is basically insoluble.  For example, "fra" is
a misspelling in English but legal in Italian.  If it appears in a
mixed passage, which dictionary should it be fed to?  The only
solution would seem to be to require the user to mark passages in some
way, as is done in HTML.
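
A rough sketch of the distinction, under stated assumptions: the
script of a word is guessed from the Unicode names of its letters,
and the DICTIONARIES table is a made-up mapping from script to the
dictionaries that share it.  Neither is part of ispell or ispell.el.

```python
import unicodedata


def script_of(word: str) -> str:
    """Return a coarse script label for word, or "MIXED" / "UNKNOWN"."""
    scripts = set()
    for ch in word:
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. "LATIN SMALL LETTER A".
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    if len(scripts) == 1:
        return scripts.pop()
    return "MIXED" if scripts else "UNKNOWN"


# Hypothetical mapping from script to the dictionaries that use it.
DICTIONARIES = {"LATIN": ["english", "italian"], "CYRILLIC": ["russian"]}


def candidate_dictionaries(word: str) -> list:
    """List the dictionaries that could plausibly own word."""
    return DICTIONARIES.get(script_of(word), [])


print(candidate_dictionaries("слово"))
print(candidate_dictionaries("fra"))
```

The Cyrillic word maps to a single dictionary, so routing is
unambiguous.  "fra" maps to both English and Italian, and nothing in
the word itself says which one should judge it.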
-- 
    Geoff Kuenning   geoff@cs.hmc.edu   http://www.cs.hmc.edu/~geoff/

One could not be a successful scientist without realizing that, in contrast to
the popular conception supported by newspapers and mothers of scientists, a
goodly number of scientists are not only narrow-minded and dull, but also just
stupid. -- James Watson