From: Ken Stevens
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 08:30:10 -0800
Message-ID: <28878.1105029010@ichips.intel.com>
References: <200501060044.JAA15682@etlken.m17n.org>
To: Kenichi Handa, ispell-bugs@itcorp.com
Cc: k.stevens@ieee.org, 130397@bugs.debian.org, agustin.martin@hispalinux.es, lionel@mamane.lu, emacs-devel@gnu.org, Stefan Monnier
In-Reply-To: Your message of "Thu, 06 Jan 2005 09:44:28 +0900." <200501060044.JAA15682@etlken.m17n.org>

Kenichi Handa writes:
> In article <87llb8htbf.fsf-monnier+emacs@gnu.org>, Stefan Monnier writes:
>
>>>> But ispell.el should be able to automatically check whether the
>>>> chars can be safely encoded with the coding-system and if not (as
>>>> in your example), ispell.el will know that the word can't be
>>>> checked by ispell and should just be skipped (and maybe marked as
>>>> "uncheckable").
>
>>> That seems to be a good approach. But, just checking
>>> whether the chars is encodable with the coding-system is not
>>> enough.
>>> For instance, entry for "francais" dict doesn't
>>> contain "ñ" in CASECHARS, but "español" is safely encodable
>>> by iso-8859-1. So, the same error happens. For ispell.el
>>> to know that "español" is uncheckable, we anyway need the
>>> current database ispell-dictionary-alist.
>
>> Aaaahhhh.... I'm beginning to understand, thank you. But I still
>> think ispell.el should not try to check "espa" and "ol". So I now
>> agree that the CASECHARS table is needed, but it should be used after
>> encoding the word (rather than when determining what is a word), and
>> if some char is not in CASECHARS the word should be flagged as
>> uncheckable.
>
> Although I have not yet understood the detail, "if some char
> is not in CASECHARS" is not enough. First of all, CASECHARS
> is a regular expression. And NOT-CASECHARS, OTHERCHARS,
> MANY-OTHERCHARS-P should also be checked somehow. If that
> is the way we are going to take, I'd like to ask maintainers
> of ispell.el to do such a change.

Remember that the internationalization of ispell was done long before the MULE code was added to Emacs. The encoding of the character sets and the interaction between ispell and Emacs were built into ispell itself and into the way ispell.el talks to it. In ispell.el, this has been controlled by the CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET entries of ispell-dictionary-alist.

The problem is more complicated than simply determining which characters are word characters. The encoding differs when the source is LaTeX, which represents accented Latin characters with escape sequences, rather than text in a raw ISO character set. The dictionary also carries information about compound words, possessives, etc. that the spell-checking routines rely on. For instance, knowing that the "'" character is used for possessives, ispell accepts "Ken's" as a correct spelling based on the root "Ken".

Most of this complexity can be hidden inside ispell. The problems mainly arise in two circumstances:

1. When spell checking a single word.
2. When an error occurs and the error is highlighted.

For instance, one of the major issues when MULE was implemented was that a character passed to ispell as several bytes may count as only a single character on the display, so offsets counted in bytes no longer match positions counted in characters. This is where most of the hassles with libraries occur.

There may well be a much better way of handling the character-set encoding and the ispell/Emacs interaction today. Perhaps we should investigate simplifying, and possibly eliminating, the character-set issues. At a minimum, we would still need to communicate mode information to ispell.

Geoff has a much better understanding of the underlying spell-search engine. Perhaps he can shed additional light on this topic.

regards
	-Ken
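In the quoted exchange, Stefan proposes encoding a word first and only then testing it against CASECHARS, and Kenichi points out that OTHERCHARS and MANY-OTHERCHARS-P matter as well. The following is a minimal Python sketch of that idea, for illustration only; it is not ispell.el code. The dictionary entry FRENCH_LIKE and the function is_checkable are hypothetical, the byte class only loosely resembles a French CASECHARS (it is not the real "francais" entry), the "at most one OTHERCHARS byte unless MANY-OTHERCHARS-P" rule is a simplified reading of that field, and NOT-CASECHARS is not modelled.

```python
import re

# Hypothetical dictionary entry, loosely modelled on some fields of
# ispell-dictionary-alist. The byte class below is illustrative only and
# is NOT the real "francais" CASECHARS value.
FRENCH_LIKE = {
    "casechars": re.compile(
        rb"[A-Za-z\xe0\xe2\xe7\xe8\xe9\xea\xeb\xee\xef\xf4\xf9\xfb\xfc]"),
    "otherchars": re.compile(rb"[-']"),
    "many_otherchars_p": False,
    "character_set": "iso-8859-1",
}


def is_checkable(word: str, entry: dict) -> bool:
    """Return True if WORD can be sent to ispell with dictionary ENTRY.

    The word is encoded first; each resulting byte must then match
    CASECHARS or OTHERCHARS. Unless many_otherchars_p is true, at most
    one OTHERCHARS byte is accepted (a simplified reading of that field).
    """
    try:
        encoded = word.encode(entry["character_set"])
    except UnicodeEncodeError:
        # Not representable in the dictionary's character set at all.
        return False
    other_count = 0
    for b in encoded:
        byte = bytes([b])  # iterating bytes yields ints; re needs bytes
        if entry["casechars"].fullmatch(byte):
            continue
        if entry["otherchars"].fullmatch(byte):
            other_count += 1
            if other_count > 1 and not entry["many_otherchars_p"]:
                return False
            continue
        # Encodable, but not a character this dictionary knows about.
        return False
    return True


if __name__ == "__main__":
    print(is_checkable("français", FRENCH_LIKE))  # True
    print(is_checkable("español", FRENCH_LIKE))   # False: "ñ" not in CASECHARS
    print(is_checkable("Ken's", FRENCH_LIKE))     # True: one OTHERCHARS byte
```

With this entry, "español" encodes fine in ISO-8859-1 but fails the (hypothetical) CASECHARS test, which is the case Kenichi describes, so the word would be flagged as uncheckable instead of being split into "espa" and "ol".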
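The highlighting problem Ken mentions comes down to byte offsets versus character offsets. The following is a minimal Python sketch; the helper name byte_to_char_offset is hypothetical, and UTF-8 is used only as a convenient example of a multibyte encoding. It converts an offset counted in bytes of the encoded line into an offset counted in characters, the kind of translation needed before a word can be highlighted in the buffer. It assumes the offset is 0-based and refers to the same encoding that was used to send the line; it makes no claim about how ispell itself reports offsets.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str) -> int:
    """Map a 0-based byte offset into LINE, encoded with ENCODING,
    to the 0-based character offset of the same position."""
    encoded = line.encode(encoding)
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("byte offset out of range")
    try:
        # Decode only the bytes before the offset and count characters.
        return len(encoded[:byte_offset].decode(encoding))
    except UnicodeDecodeError as err:
        # The offset points into the middle of a multibyte character.
        raise ValueError("offset splits a multibyte character") from err


if __name__ == "__main__":
    line = "très bien"
    print(byte_to_char_offset(line, 6, "utf-8"))       # 5: "è" is 2 bytes
    print(byte_to_char_offset(line, 5, "iso-8859-1"))  # 5: "è" is 1 byte
```

For "très bien", the word "bien" starts at byte 6 but character 5 in UTF-8, while in ISO-8859-1, where "è" is a single byte, both offsets are 5.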