From: Stefan Monnier
To: Ken Stevens
Cc: Kenichi Handa, k.stevens@ieee.org, 130397@bugs.debian.org, agustin.martin@hispalinux.es, lionel@mamane.lu, emacs-devel@gnu.org, ispell-bugs@itcorp.com
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 12:33:11 -0500
In-Reply-To: <28878.1105029010@ichips.intel.com> (Ken Stevens's message of "Thu, 06 Jan 2005 08:30:10 -0800")

> Remember that the internationalization of ispell was done long before the
> MULE code was added to emacs.

Actually, it's this understanding that leads me to think that CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET should be used after encoding the word.

Before MULE, Emacs only worked with single-byte coding systems (things like latin-1, but not iso-2022 or utf-8), and ispell used the exact same coding system, so ispell.el's CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET applied to *encoded* text (i.e. text in latin-1 encoding, not in the internal encoding used by Emacs MULE).
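To make the point about *encoded* text concrete, here is a minimal Python sketch (an illustration only, not ispell.el code). The byte class `CASECHARS` and the helper `all_casechars` are hypothetical, modeled on the style of a latin-1 German dictionary entry: ASCII letters plus the latin-1 byte values of Ä, Ö, Ü, ä, ö, ß, ü. Such a class matches a word once it is encoded in latin-1, but not the same word encoded in utf-8, where ö and ß become two bytes each.

```python
import re

# Hypothetical CASECHARS-style byte class: ASCII letters plus the latin-1
# byte values of Ä Ö Ü ä ö ß ü (octal \304 \326 \334 \344 \366 \337 \374).
CASECHARS = re.compile(rb"[a-zA-Z\xc4\xd6\xdc\xe4\xf6\xdf\xfc]+")


def all_casechars(word_bytes):
    """Return True if every byte of word_bytes is in the CASECHARS class."""
    return CASECHARS.fullmatch(word_bytes) is not None


word = "größe"
latin1 = word.encode("latin-1")  # one byte per character
utf8 = word.encode("utf-8")      # ö and ß take two bytes each

print(len(latin1), all_casechars(latin1))  # 5 True
print(len(utf8), all_casechars(utf8))      # 7 False
```

With the utf-8 bytes, the lead byte 0xC3 of ö is not in the class, so the word no longer matches: the class only makes sense for text in the coding system it was written for.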
So it would seem to make sense (in order to simulate the pre-MULE behavior) to first encode the text (into latin-1 or some such single-byte coding system) and then use CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET. Now, encoding the whole text can't realistically be done, so we need to first recognize words, then encode them, then use those vars. I.e. the word-recognition code shouldn't use CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET.

> For instance, one of the major issues when MULE was implemented was the
> fact that multiple bytes passed to ispell may only count as a single
> byte or character on the display.

How/when can that happen? Can you give an example?


        Stefan
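The proposed order of operations (recognize words first, then encode each word, then apply the per-dictionary variables) can be sketched as follows. This is a hypothetical Python illustration, not ispell.el code: `CHARACTER_SET`, `CASECHARS`, `OTHERCHARS`, `WORD`, and `words_to_check` are made-up names standing in for the dictionary fields, Python's Unicode-aware `\w` stands in for Emacs's own word recognition, MANY-OTHERCHARS-P and EXTENDED-CHARACTER-MODE are ignored, and words that cannot be encoded in the dictionary's character set are simply skipped.

```python
import re

# Hypothetical dictionary settings. The classes are *byte* classes: they
# are meant to apply to text already encoded in CHARACTER_SET.
CHARACTER_SET = "latin-1"
CASECHARS = re.compile(rb"[a-zA-Z\xc4\xd6\xdc\xe4\xf6\xdf\xfc]")
OTHERCHARS = re.compile(rb"[']")

# Step 1 helper: word recognition on decoded text, independent of the
# dictionary variables (Unicode-aware \w, allowing inner apostrophes).
WORD = re.compile(r"\w+(?:'\w+)*")


def words_to_check(text):
    """Yield (start, encoded_word) for each word the dictionary can handle."""
    for m in WORD.finditer(text):
        # Step 2: encode just this word into the dictionary's coding system.
        try:
            encoded = m.group().encode(CHARACTER_SET)
        except UnicodeEncodeError:
            continue  # not representable in this character set: skip it
        # Step 3: apply the byte classes to the encoded word, byte by byte.
        if all(CASECHARS.fullmatch(bytes([b])) or OTHERCHARS.fullmatch(bytes([b]))
               for b in encoded):
            yield m.start(), encoded


if __name__ == "__main__":
    print(list(words_to_check("Die Größe von 東京")))
```

For example, `list(words_to_check("Die Größe von 東京"))` yields `[(0, b'Die'), (4, b'Gr\xf6\xdfe'), (10, b'von')]`; 東京 is skipped because it has no latin-1 encoding. The point of the ordering is that only the already-delimited word is encoded, so the byte classes never have to be applied to Emacs's internal multibyte representation.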