From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Ispell and unibyte characters
Date: Thu, 12 Apr 2012 22:01:30 +0300
Message-ID: <83d37c4vw5.fsf@gnu.org>
References: <83aa3f2hgh.fsf@gnu.org> <20120326173912.GA22306@agmartin.aq.upm.es> <20120328191821.GA6266@agmartin.aq.upm.es> <20120410190803.GA13517@agmartin.aq.upm.es> <83ty0r5rmd.fsf@gnu.org> <20120412143657.GA18352@agmartin.aq.upm.es>
In-reply-to: <20120412143657.GA18352@agmartin.aq.upm.es>
Reply-To: Eli Zaretskii
To: Agustin Martin
Cc: emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:149617

> Date: Thu, 12 Apr 2012 16:36:57 +0200
> From: Agustin Martin
>
> I am still dealing with an open issue here. Some languages have non 7bit
> wordchars, like Catalan middledot, and it should be converted to UTF-8 if
> default communication language is changed to UTF-8.

Sorry, I don't understand: do you mean "non 8-bit wordchars"?  I don't
think 7 bits is assumed anywhere.

Assuming you did mean 8-bit, then why not use UTF-8 for Catalan from
the get-go?  Only some languages can use single-byte encodings, and
evidently Catalan is not one of them.

For that matter, why shouldn't aspell and hunspell use UTF-8 by
default (something I already asked)?

> I have looked at the encoding stuff and I am currently trying something
> like
>
>   (if ispell-encoding8-command
>       ;; Convert non 7bit otherchars to utf-8 if needed
>       (encode-coding-string
>        (decode-coding-string (nth 3 adict) (nth 7 adict))
>        'utf-8)
>     (nth 3 adict)) ; otherchars
>
> to get new UTF-8 string where
>
>   (nth 7 adict) -> dict-coding-system
>   (nth 3 adict) -> Original otherchars
>
> but get a sgml-lexical-context error. Need to look more carefully, so this
> will take longer. I am far from expert in handling encodings, so comments
> are welcome.

I don't understand what you are trying to accomplish by encoding
OTHERCHARS in UTF-8.  What exactly is the problem with them being
encoded in some 8-bit encoding?  Please explain.
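
For reference, the conversion quoted above amounts to a decode/encode round trip.  What follows is only a minimal sketch under stated assumptions: `my-otherchars-to-utf-8` is a hypothetical helper, not anything in ispell.el, and ISO-8859-1 is assumed as the dictionary coding system purely for illustration.  It shows how the one-byte middle dot becomes a two-byte sequence once re-encoded.

```elisp
;; Hypothetical helper for illustration; not part of ispell.el.
(defun my-otherchars-to-utf-8 (otherchars coding-system)
  "Re-encode OTHERCHARS from CODING-SYSTEM to UTF-8.
OTHERCHARS is a unibyte string of bytes encoded in CODING-SYSTEM.
The result is a unibyte string holding the UTF-8 bytes."
  (encode-coding-string
   (decode-coding-string otherchars coding-system)
   'utf-8))

;; Assumption: the dictionary uses ISO-8859-1, where the middle dot
;; (U+00B7) is the single byte #xB7.
(let* ((latin1 (unibyte-string #xb7))
       (utf8 (my-otherchars-to-utf-8 latin1 'iso-8859-1)))
  ;; `append' with nil turns each string into a list of byte values.
  (message "latin-1 bytes: %S, utf-8 bytes: %S"
           (append latin1 nil) (append utf8 nil)))
;; → latin-1 bytes: (183), utf-8 bytes: (194 183)
```

Whether such a conversion is needed at all depends on which encoding the spell-checker process is actually told to expect, which is the question asked above.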