unofficial mirror of emacs-devel@gnu.org 
From: Geoff Kuenning <geoff@cs.hmc.edu>
Cc: Kenichi Handa <handa@m17n.org>,
	130397@bugs.debian.org, agustin.martin@hispalinux.es,
	lionel@mamane.lu, emacs-devel@gnu.org, juri@jurta.org,
	Stefan Monnier <monnier@iro.umontreal.ca>
Subject: Re: Bug 130397
Date: 08 Jan 2005 13:31:11 +0100
Message-ID: <pni4qhsje5s.fsf@bow.cs.hmc.edu>
In-Reply-To: <28878.1105029010@ichips.intel.com>

Ken writes:

> Geoff has a much better understanding of the underlying spell search
> engine.  Perhaps he can shed additional light on this topic.

I just looked at the code to be sure my memory is correct.  Here's the
short rundown: in the '-a' interface, ispell talks to the outside
world purely in terms of byte indices.  It is perfectly capable of
handling UTF-8 and similar multibyte encodings, but when it reports
the offset of a misspelled word, it does so as a byte offset, not a
character offset.
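To make the byte/character mismatch concrete, here is a minimal Python sketch (not code from ispell or ispell.el) that converts a byte offset into a UTF-8 encoded line into a character offset. It assumes the offset is 0-based and counted against exactly the bytes the client sent; the helper name `byte_to_char_offset` is hypothetical.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into `line` (as encoded) into a character offset.

    Hypothetical helper: assumes a 0-based offset measured in the same
    encoding the line was sent in.  Strict decoding raises
    UnicodeDecodeError if the offset falls inside a multibyte character.
    """
    prefix = line.encode(encoding)[:byte_offset]
    return len(prefix.decode(encoding))


if __name__ == "__main__":
    line = "Größe fra"
    # "ö" and "ß" each take two bytes in UTF-8, so the byte offset
    # and the character offset of "fra" differ by two.
    print(line.encode("utf-8").index(b"fra"))  # byte offset
    print(byte_to_char_offset(line, 8))         # character offset
```

Here the byte offset of "fra" is 8 while its character offset is 6, so a client that used the reported number directly as a character position would land two characters past the word.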

Does Emacs provide an underlying byte-indexed interface to the buffer?
If so, life should be easy: just have ispell.el use that interface.
If not, I think life is going to be very, very difficult.  It's
possible that I could modify ispell to report a display-width index
rather than a byte index, but that's not trivial and there may be
pitfalls.  There's also the problem that, even if I get off my butt
and produce a new release reasonably soon, there are lots of old
copies of ispell out there that wouldn't support the new interface.
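Even without a byte-indexed buffer interface, a client that remembers the exact line it sent can translate each reported offset itself. The Python sketch below assumes the `&`/`#` reply layout described in the ispell(1) man page (`& word count offset: miss, miss` and `# word offset`) and treats the offset as a 0-based UTF-8 byte count into the line as sent; the offset base, and whether any command prefix the client adds to the line is counted, should be checked against the ispell version in use. `parse_ispell_reply` and `locate_misspelling` are hypothetical names.

```python
from typing import List, Optional, Tuple


def parse_ispell_reply(reply: str) -> Optional[Tuple[str, int, List[str]]]:
    """Parse one '&' or '#' line of ispell -a output.

    Returns (word, offset, near_misses), or None for any other reply
    line ('*', '+', '-', blank, ...).  Field layout assumed from ispell(1).
    """
    if reply.startswith("& "):
        head, _, tail = reply.partition(": ")
        _, word, _count, offset = head.split()
        misses = tail.split(", ") if tail else []
        return word, int(offset), misses
    if reply.startswith("# "):
        _, word, offset = reply.split()
        return word, int(offset), []
    return None


def locate_misspelling(line: str, reply: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of the word flagged in `reply`.

    Assumes the reported offset is a 0-based UTF-8 byte offset into `line`.
    """
    parsed = parse_ispell_reply(reply)
    if parsed is None:
        return None
    word, byte_offset, _ = parsed
    # Decode the bytes before the word to count the characters they hold.
    start = len(line.encode("utf-8")[:byte_offset].decode("utf-8"))
    return start, start + len(word)
```

With the line "Größe fra" and a made-up reply `& fra 2 8: fa, fry`, `locate_misspelling` returns `(6, 9)`, which slices out exactly "fra". This only works because the client, not ispell, does the conversion, so it would also work with old copies of ispell.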

Juri writes:

> And while on this topic, I want to remind everyone that many Emacs users
> suffer from the inability of ispell.el to check mixed multi-language texts
> simultaneously.  So, whoever fixes ispell.el, please take that into account.
> Such combining is quite easily doable for any disjoint alphabets, as well
> as for alphabets where one alphabet is a superset of another, e.g.
> English and some other Latin-based alphabets.  Even for overlapping
> alphabets it would be possible to use the `w' syntax to get a word
> and feed it to separate ispell instances, one per dictionary.

I'm not entirely sure what you mean here.  For disjoint alphabets,
it's certainly relatively easy to figure out which word should go to
which ispell instance.  For identical, superset, or overlapping
alphabets, the problem is basically insoluble.  For example, "fra" is
a misspelling in English but legal in Italian.  If it appears in a
mixed passage, which dictionary should it be fed to?  The only
solution would seem to be to require the user to mark passages in some
way, as is done in HTML.
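The distinction between disjoint and overlapping alphabets can be sketched by treating each dictionary as the set of letters it accepts. The Python sketch below is illustrative only: the names and the three toy alphabets are hypothetical, and real dictionaries accept more characters. It routes a word to a dictionary only when exactly one alphabet covers it, and otherwise falls back to a language the user has marked for the passage (as with HTML's `lang` attribute), returning None when neither settles the question.

```python
import string
from typing import Dict, Optional, Set

# Toy alphabets, assumed for illustration only.
ENGLISH: Set[str] = set(string.ascii_lowercase)
ITALIAN: Set[str] = ENGLISH | set("àèéìíîòóùú")  # a superset of ENGLISH
RUSSIAN: Set[str] = {chr(c) for c in range(0x430, 0x450)} | {"ё"}  # disjoint

ALPHABETS: Dict[str, Set[str]] = {
    "english": ENGLISH,
    "italian": ITALIAN,
    "russian": RUSSIAN,
}


def choose_dictionary(word: str, marked_lang: Optional[str] = None) -> Optional[str]:
    """Pick the dictionary for `word`, or None if the choice is ambiguous."""
    letters = set(word.lower())
    candidates = [name for name, alphabet in ALPHABETS.items() if letters <= alphabet]
    if len(candidates) == 1:
        # Only one alphabet covers the word: the easy, disjoint case.
        return candidates[0]
    if marked_lang in candidates:
        # Overlapping alphabets: defer to the user's markup for the passage.
        return marked_lang
    # No markup (or markup that can't apply): e.g. "fra" fits both
    # English and Italian, so there is no principled choice.
    return None
```

Note that the superset case only partly escapes the problem: "città" can only be Italian, but any word spelled purely with English letters, such as "fra", still falls into the ambiguous branch unless the passage is marked.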
-- 
    Geoff Kuenning   geoff@cs.hmc.edu   http://www.cs.hmc.edu/~geoff/

One could not be a successful scientist without realizing that, in contrast to
the popular conception supported by newspapers and mothers of scientists, a
goodly number of scientists are not only narrow-minded and dull, but also just
stupid. -- James Watson
