From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Ken Stevens <kstevens@ichips.intel.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 08:30:10 -0800
Message-ID: <28878.1105029010@ichips.intel.com>
References: <200501060044.JAA15682@etlken.m17n.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by tm-edit 7.106)
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1105029891 27977 80.91.229.6 (6 Jan 2005 16:44:51 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 6 Jan 2005 16:44:51 +0000 (UTC)
Cc: k.stevens@ieee.org, 130397@bugs.debian.org, agustin.martin@hispalinux.es,
	lionel@mamane.lu, emacs-devel@gnu.org,
	Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jan 06 17:44:29 2005
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Original-Received: from lists.gnu.org ([199.232.76.165])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1CmaXS-0003T1-00
	for <ged-emacs-devel@m.gmane.org>; Thu, 06 Jan 2005 17:31:07 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33)
	id 1Cmail-0004VX-Rh
	for ged-emacs-devel@m.gmane.org; Thu, 06 Jan 2005 11:42:47 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33)
	id 1Cmaif-0004VR-6B
	for emacs-devel@gnu.org; Thu, 06 Jan 2005 11:42:41 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33)
	id 1Cmaie-0004VF-NY
	for emacs-devel@gnu.org; Thu, 06 Jan 2005 11:42:40 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33) id 1Cmaie-0004VC-KP
	for emacs-devel@gnu.org; Thu, 06 Jan 2005 11:42:40 -0500
Original-Received: from [134.134.136.18] (helo=orsfmr004.jf.intel.com)
	by monty-python.gnu.org with esmtp (Exim 4.34) id 1CmaXA-0002yS-H1
	for emacs-devel@gnu.org; Thu, 06 Jan 2005 11:30:48 -0500
Original-Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17])
	by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1
	2004/09/17 17:50:56 root Exp $) with ESMTP id j06GUFBU001085; 
	Thu, 6 Jan 2005 16:30:15 GMT
Original-Received: from ichips-ra.pdx.intel.com (ichips-ra-hme2.intel.com [10.7.5.35])
	by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2
	2004/09/17 18:05:01 root Exp $) with ESMTP id j06GUFud003630; 
	Thu, 6 Jan 2005 16:30:15 GMT
Original-Received: from dttlx129.pdx.intel.com (dttlx129.pdx.intel.com [10.7.45.148])
	by ichips-ra.pdx.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id
	j06GUEgH027648; Thu, 6 Jan 2005 08:30:14 -0800 (PST)
Original-Received: from ichips.intel.com (localhost.pdx.intel.com [127.0.0.1])
	by dttlx129.pdx.intel.com (8.12.9/8.12.8/MailSET/client) with ESMTP id
	j06GUAgo028879; Thu, 6 Jan 2005 08:30:11 -0800
Original-To: Kenichi Handa <handa@m17n.org>, ispell-bugs@itcorp.com
In-Reply-To: Your message of "Thu, 06 Jan 2005 09:44:28 +0900."
	<200501060044.JAA15682@etlken.m17n.org> 
X-Scanned-By: MIMEDefang 2.44
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:31958
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:31958

Kenichi Handa writes: 


> In article <87llb8htbf.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
>>>>  But ispell.el should be able to automatically check whether the
>>>>  chars can be safely encoded with the coding-system and if not (as
>>>>  in your example), ispell.el will know that the word can't be
>>>>  checked by ispell and should just be skipped (and maybe marked as
>>>>  "uncheckable").
> 
>>>  That seems to be a good approach.  But, just checking
>>>  whether the chars is encodable with the coding-system is not
>>>  enough.  For instance, entry for "francais" dict doesn't
>>>  contain "ñ" in CASECHARS, but "español" is safely encodable
>>>  by iso-8859-1.  So, the same error happens.  For ispell.el
>>>  to know that "español" is uncheckable, we anyway need the
>>>  current database ispell-dictionary-alist.
> 
>> Aaaahhhh.... I'm beginning to understand, thank you.  But I still
>> think ispell.el should not try to check "espa" and "ol".  So I now
>> agree that the CASECHARS table is needed, but it should be used after
>> encoding the word (rather than when determining what is a word), and
>> if some char is not in CASECHARS the word should be flagged as
>> uncheckable.
> 
> Although I have not yet understood the detail, "if some char
> is not in CASECHARS" is not enough.  First of all, CASECHARS
> is a regular expression.  And NOT-CASECHARS, OTHERCHARS,
> MANY-OTHERCHARS-P should also be checked somehow.  If that
> is the way we are going to take, I'd like to ask maintainers
> of ispell.el to do such a change.

Remember that the internationalization of ispell was done long before the
MULE code was added to Emacs.  The encoding of the character sets and
the interaction between ispell and Emacs were embodied in the ispell code
itself.  In ispell.el, this has been controlled by the CASECHARS,
NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE,
and CHARACTER-SET fields of ispell-dictionary-alist.
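
To make the discussion above concrete, here is a rough sketch (in
Python rather than Emacs Lisp, purely for illustration) of how a word
could be tested against the CASECHARS and OTHERCHARS regexps after it
has been extracted.  The names DictEntry, is_checkable and
FRANCAIS_LIKE are hypothetical, the regexp values are made up rather
than copied from ispell.el, and NOT-CASECHARS is left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class DictEntry:
    """Hypothetical stand-in for one ispell-dictionary-alist entry."""
    casechars: str           # regexp (character class) of word characters
    otherchars: str          # regexp of chars allowed between word chars, or ""
    many_otherchars_p: bool  # True if OTHERCHARS may occur more than once


def is_checkable(word: str, entry: DictEntry) -> bool:
    """Return True if WORD is built only from characters the entry allows."""
    case, other = entry.casechars, entry.otherchars
    if not other:
        pattern = f"{case}+"
    elif entry.many_otherchars_p:
        # Any number of OTHERCHARS, each surrounded by CASECHARS.
        pattern = f"{case}+(?:{other}{case}+)*"
    else:
        # At most one OTHERCHARS character inside the word.
        pattern = f"{case}+(?:{other}{case}+)?"
    return re.fullmatch(pattern, word) is not None


# Illustrative values only, not copied from ispell.el.
FRANCAIS_LIKE = DictEntry(
    casechars="[A-Za-zÀÂÆ-ËÎÏÔÙÛÜàâäç-ëîïôùûü]",
    otherchars="[']",
    many_otherchars_p=True,
)
```

With this entry, is_checkable("español", FRANCAIS_LIKE) is false even
though "español" encodes cleanly in iso-8859-1, which is the case
Kenichi describes.  A caller could then flag the whole word as
uncheckable instead of splitting it into "espa" and "ol".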

The problem is more complicated than simply parsing which characters
are word characters.  Encodings differ when the source is LaTeX, which
encodes Latin characters with escape sequences, versus a raw ISO
character set.  The dictionary also stores information about compound
words, possessives, etc. for use by the spell checking routines.
Knowing that the "'" character marks a possessive, for instance,
ispell knows that "Ken's" is a correct spelling based on the root
"Ken".

Most of this complication can be hidden inside ispell.  The problems
mainly arise in two circumstances:

1. When spell checking a single word.
2. When an error occurs and the error is highlighted.

For instance, one of the major issues when MULE was implemented was the
fact that multiple bytes passed to ispell may only count as a single
byte or character on the display.
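
The mismatch can be sketched as follows, assuming the offset that comes
back for a misspelling counts bytes of the encoded line while the
editor needs a character position for highlighting.  The function name
byte_offset_to_char_offset is hypothetical, and UTF-8 is used only as a
convenient multibyte example, not as the encoding Emacs actually sends.

```python
def byte_offset_to_char_offset(line: str, byte_offset: int,
                               encoding: str = "utf-8") -> int:
    """Convert a byte offset into the encoded LINE to a character offset.

    Raises UnicodeDecodeError if BYTE_OFFSET falls in the middle of a
    multibyte character.
    """
    encoded = line.encode(encoding)
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("byte offset outside the encoded line")
    # Decode only the bytes before the offset; the length of the decoded
    # prefix is the number of characters preceding the offset.
    return len(encoded[:byte_offset].decode(encoding))


line = "el español habla"
# In UTF-8 the "ñ" takes two bytes, so "habla" starts at byte 12
# but at character 11.
char_pos = byte_offset_to_char_offset(line, 12)
```

In a single-byte encoding such as iso-8859-1 the two offsets coincide,
which is why the issue only became visible once multibyte encodings
were involved.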

Here is where most of the hassles with libraries occur.  There may well
be a much better way of encoding the character sets and interactions
right now.  Perhaps we should investigate simplifying and possibly
removing the character set issues.  We would still minimally need to
communicate mode information to ispell.

Geoff has a much better understanding of the underlying spell search
engine.  Perhaps he can shed additional light on this topic.

regards          -Ken