From: Ken Stevens <kstevens@ichips.intel.com>
Cc: k.stevens@ieee.org, 130397@bugs.debian.org,
	agustin.martin@hispalinux.es, lionel@mamane.lu,
	emacs-devel@gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 08:30:10 -0800
Message-ID: <28878.1105029010@ichips.intel.com>
In-Reply-To: Your message of "Thu, 06 Jan 2005 09:44:28 +0900." <200501060044.JAA15682@etlken.m17n.org>

Kenichi Handa writes:


> In article <87llb8htbf.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>>>>  But ispell.el should be able to automatically check whether the
>>>>  chars can be safely encoded with the coding-system and if not (as
>>>>  in your example), ispell.el will know that the word can't be
>>>>  checked by ispell and should just be skipped (and maybe marked as
>>>>  "uncheckable").
>
>>>  That seems to be a good approach.  But, just checking
>>>  whether the chars are encodable with the coding-system is not
>>>  enough.  For instance, entry for "francais" dict doesn't
>>>  contain "ñ" in CASECHARS, but "español" is safely encodable
>>>  by iso-8859-1.  So, the same error happens.  For ispell.el
>>>  to know that "español" is uncheckable, we anyway need the
>>>  current database ispell-dictionary-alist.
>
>> Aaaahhhh.... I'm beginning to understand, thank you.  But I still
>> think ispell.el should not try to check "espa" and "ol".  So I now
>> agree that the CASECHARS table is needed, but it should be used after
>> encoding the word (rather than when determining what is a word), and
>> if some char is not in CASECHARS the word should be flagged as
>> uncheckable.
>
> Although I have not yet understood the detail, "if some char
> is not in CASECHARS" is not enough.  First of all, CASECHARS
> is a regular expression.  And NOT-CASECHARS, OTHERCHARS,
> MANY-OTHERCHARS-P should also be checked somehow.  If that
> is the way we are going to take, I'd like to ask maintainers
> of ispell.el to do such a change.

Remember that the internationalization of ispell was done long before the
MULE code was added to Emacs.  The encoding of the character sets and
the interaction between ispell and Emacs were embodied in the ispell code
itself.  In ispell.el, this has been controlled by the CASECHARS,
NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE,
and CHARACTER-SET fields of each dictionary entry.
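
To make the check under discussion concrete (flag a word as
uncheckable unless every character is covered by the dictionary's
character classes), here is a minimal Python sketch.  It is not taken
from ispell.el.  The dictionary entry is hypothetical: the keys mirror
CASECHARS, OTHERCHARS, and MANY-OTHERCHARS-P, but the patterns are made
up for the example, and requiring a word to start and end with a
CASECHARS character is an assumption of this sketch.

```python
import re

# Hypothetical dictionary entry; the patterns are illustrative only and
# are not copied from ispell-dictionary-alist.
FRANCAIS = {
    "casechars": "[A-Za-zÀÂÆÇÈÉÊËÎÏÔÙÛÜàâæçèéêëîïôùûü]",
    "otherchars": "[-']",
    "many_otherchars_p": True,
}


def is_checkable(word, entry):
    """Return True if every character of WORD is covered by ENTRY."""
    case_re = re.compile(entry["casechars"])
    other_re = re.compile(entry["otherchars"])
    # Assumption: a word must start and end with a CASECHARS character.
    if not word:
        return False
    if not case_re.fullmatch(word[0]) or not case_re.fullmatch(word[-1]):
        return False
    others = 0
    for ch in word:
        if case_re.fullmatch(ch):
            continue
        if other_re.fullmatch(ch):
            others += 1
            continue
        # Neither a word character nor an allowed "other" character:
        # skip the whole word instead of splitting it into pieces.
        return False
    # Without MANY-OTHERCHARS-P, allow at most one "other" character.
    return entry["many_otherchars_p"] or others <= 1
```

With this entry, is_checkable("español", FRANCAIS) is false, so the
word would be skipped as a whole rather than checked as "espa" and
"ol", while "français" and "l'ami" pass.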

The problem is more complicated than simply parsing what are word
characters.  There are differences in encoding when one uses LaTeX as
the source, with its encoding of Latin characters as escape sequences,
versus a raw ISO character set.  The dictionary also stores
information regarding compound words, possessives, etc. for use by the
spell checking routines.  Knowing that the "'" character is used as a
possessive, for instance, ispell knows that "Ken's" is a correct
spelling based on the root "Ken".

Most of this complication can be hidden inside ispell.  The
problems mainly arise in two circumstances:

1. When spell checking a single word.
2. When an error occurs and the error is highlighted.

For instance, one of the major issues when MULE was implemented was the
fact that multiple bytes passed to ispell may count as only a single
character on the display.
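
The byte/character mismatch can be shown with a short Python sketch.
This is only an illustration, not what ispell.el does: the function
name is hypothetical, and UTF-8 stands in here for whatever multibyte
encoding is actually in use.

```python
def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Translate a byte offset in the encoded TEXT to a character offset.

    Raises UnicodeDecodeError if BYTE_OFFSET falls in the middle of a
    multibyte character.
    """
    encoded = text.encode(encoding)
    # Decoding the prefix counts the characters that precede that byte.
    return len(encoded[:byte_offset].decode(encoding))


line = "el español habla"
# In UTF-8 the "ñ" takes two bytes, so "habla" starts at byte 12 but at
# character 11.  Highlighting by the raw byte offset would be off by one.
start = byte_to_char_offset(line, 12)
```

In a single-byte encoding such as iso-8859-1 the two offsets coincide,
which is why the problem only showed up once multibyte text was
involved.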

Here is where most of the hassles with libraries occur.  There may well
be a much better way of encoding the character sets and interactions
right now.  Perhaps we should investigate simplifying and possibly
removing the character set issues.  We would still minimally need to
communicate mode information to ispell.

Geoff has a much better understanding of the underlying spell search
engine.  Perhaps he can shed additional light on this topic.

regards		 -Ken

