From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Bug 130397
Date: Thu, 13 Jan 2005 16:50:20 +0900 (JST)
Message-ID: <200501130750.QAA13140@etlken.m17n.org>
References: <28878.1105029010@ichips.intel.com> <01c4f6cf$Blat.v2.2.2$5c4e1220@zahav.net.il>
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Cc: geoff@cs.hmc.edu, 130397@bugs.debian.org, agustin.martin@hispalinux.es, lionel@mamane.lu, emacs-devel@gnu.org, kstevens@ichips.intel.com, eliz@gnu.org, snogglethorpe@gmail.com, miles@gnu.org
Original-To: David Kastrup
In-reply-to: (message from David Kastrup on Mon, 10 Jan 2005 10:09:41 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
List-Id: "Emacs development discussions."
Xref: main.gmane.org gmane.emacs.devel:32205

In article , David Kastrup writes:

>>> If ispell wants utf-8, it's easy enough to convert each input line to
>>> utf-8 and deal with offsets into that in the event of a misspelling;
>>
>> Or account for byte offsets by (variable) multibyte length of each
>> character, which Emacs knows.  I don't remember for the moment whether
>> the multibyte length of the UTF-8 encoding can be gotten at by a Lisp
>> program, but if not, we could add some primitive to do that.

> Just encode the line to utf-8, find the correct point in the byte
> string, cut off the line there, convert back and check the length of
> the string.  This works unless you are in the middle of a character.
> But it would be much saner if our conversion facilities would preserve
> markers (which they don't do right now): encode to utf-8, place a
> marker at the right byte offset, undo the conversion.

You can encode a text to utf-8, place several markers, and encode the
regions between markers one by one.

---
Ken'ichi HANDA
handa@m17n.org
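
A minimal sketch of the offset bookkeeping discussed in this thread, written in Python rather than Emacs Lisp so that it does not depend on any particular Emacs primitive. It encodes a line to UTF-8, treats the reported byte offsets as cut points, and converts the segments between them back one at a time, adding up the character counts. The function name `byte_offsets_to_char_offsets` is hypothetical, and decoding segment by segment is only one reading of the suggestion above, not a description of how Emacs or ispell actually does it. As noted in the quoted text, an offset that falls in the middle of a character cannot be mapped; here that surfaces as a decode error.

```python
from typing import Iterable, List


def byte_offsets_to_char_offsets(line: str, byte_offsets: Iterable[int]) -> List[int]:
    """Map UTF-8 byte offsets into `line` to character offsets.

    Hypothetical helper for illustration only. Offsets are processed in
    ascending order and the result is returned in that same order.
    """
    encoded = line.encode("utf-8")
    char_offsets = []
    chars_so_far = 0  # characters seen before the previous cut point
    prev = 0          # previous cut point, always on a character boundary

    for offset in sorted(byte_offsets):
        if not 0 <= offset <= len(encoded):
            raise ValueError(f"byte offset {offset} is outside the line")
        # Convert only the bytes between two cut points back to text.
        # A strict decode raises UnicodeDecodeError if `offset` splits
        # a multibyte character.
        segment = encoded[prev:offset].decode("utf-8")
        chars_so_far += len(segment)
        char_offsets.append(chars_so_far)
        prev = offset

    return char_offsets
```

For example, in the line "na\u00efve caf\u00e9 ok" the word starting at byte offset 7 starts at character offset 6, because the two-byte character before it counts as one character. Processing the offsets in ascending order means each byte of the line is decoded at most once, however many offsets are reported.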