From: Ivan Zakharyaschev
Newsgroups: gmane.emacs.devel
Subject: bug & idea how to fix: ispell.el fails checking English in a buffer with Russian+English
Date: Mon, 30 Sep 2002 11:04:33 +0400 (MSD)
Original-To: ispell-el-bugs@itcorp.com,

Hello!

Recently, a problem with ispell.el in GNU Emacs 21.2 was reported.

The report:

Consider a buffer with mixed languages (Russian and English), where we want to check the spelling of both. If we first set the dictionary to Russian, ispell works: it checks the Russian words and ignores the English ones. The other way round it doesn't: quite soon ispell.el reports "misalignment" and stops spell-checking.

The minimal test case is this: in a clean plain-text buffer, write

  кошка cat

(The first word is the Russian for "cat".) Then switch to the "english" dictionary and run M-x ispell-buffer. The following message appears in the minibuffer:

  Ispell misalignment: word 'LZ' point 3; probably incompatible versions

The reason & possible solution:

The first thing to notice is that changing the value of the `coding-system' field of the English entry in `ispell-dictionary-alist' from `iso-8859-1' to `iso-safe' solves the problem. For example, after the substitution

  ("british"                            ; British version
   "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B" "-d" "british") nil iso-safe)

instead of

  ("british"                            ; British version
   "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B" "-d" "british") nil iso-8859-1)

the test case is checked successfully with the "british" dictionary.
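To make the offset side of the problem concrete, here is a small Python sketch (not ispell.el code). It rests on assumptions that have not been checked against ispell itself: that the ispell process reports a word's position as a byte offset into the line it receives, that ispell.el treats that offset as a character position in the buffer, and that UTF-8 is an acceptable stand-in for Emacs's internal multibyte representation. `byte_offset_of` and `iso_safe` are hypothetical helpers; `iso_safe` only mimics the question-mark substitution described in this message.

```python
# Assumption: UTF-8 stands in for Emacs's internal multibyte representation,
# and the checker reports byte offsets into the line it receives.

LINE = "кошка cat"


def byte_offset_of(word, data):
    """Hypothetical: the offset a byte-oriented checker would report for WORD."""
    return data.find(word.encode("ascii"))


def iso_safe(text):
    """Hypothetical: replace every non-ASCII character with '?', one byte each."""
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


char_offset = LINE.index("cat")                               # position in the buffer line
raw = byte_offset_of("cat", LINE.encode("utf-8"))             # multibyte bytes sent as-is
safe = byte_offset_of("cat", iso_safe(LINE).encode("ascii"))  # one byte per character

print(char_offset, raw, safe)
# 6 11 6
```

Under these assumptions the line sent as-is puts "cat" 5 bytes after its character position, enough for ispell.el to look for the word in the wrong place. The iso-safe-style line keeps one byte per character, so the two offsets agree.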
I see the reason for this problem in ispell.el, more precisely in the `ispell-get-line' function (or at the points where it is called, particularly `ispell-region'). It doesn't honour the information about the alphabet of the language/dictionary (specified in the corresponding entry of `ispell-dictionary-alist'). Instead, it sends the whole line "as is", without extracting only those characters that are valid for the specified language/dictionary. So the line gets sent to the ispell process with the internal multibyte representation of the Russian characters. (Now we see why setting the `coding-system' to `iso-safe' helps: the Russian characters then get encoded as harmless question marks before being sent to the ispell process.)

So the reason lies in the pair of functions `ispell-get-line'/`ispell-region'. There is a similar pair of functions in ispell.el, `ispell-get-word'/`ispell-word', and it works much better: it extracts words consisting only of characters from the specified dictionary's alphabet. Another similar library, flyspell.el, also works better: `flyspell-word' extracts "good" words, and `flyspell-region' iterates over the region with successive calls to `flyspell-word', so only "good" words get sent to the ispell process. This suggests a possible fix for ispell.el: `ispell-region' could call `ispell-word' or `ispell-get-word'.

BTW, ispell.el 3.5 has been out for almost a year, but GNU Emacs still ships 3.4.

References: if you can read Russian, here are the original messages discussing this problem:

  http://www.altlinux.ru/pipermail/sisyphus/2002-September/030603.html
  http://www.altlinux.ru/pipermail/sisyphus/2002-September/030610.html

Regards,
-- 
Ivan Zakharyaschev
ALT Linux Team member, Sisyphus developer
http://www.altlinux.ru, http://www.altlinux.com
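The fix proposed above (sending only words made of the dictionary's alphabet instead of the whole line) can be sketched in Python as follows. This is an illustration under assumptions, not ispell.el's implementation. `dictionary_words` is a hypothetical helper, and its word pattern (runs of `casechars' joined by single `otherchars') is a simplification of ispell.el's own rules. The two constants are copied from the "british" entry quoted earlier.

```python
import re

# Character classes copied from the "british" entry of ispell-dictionary-alist.
CASECHARS = "[A-Za-z]"
OTHERCHARS = "[']"


def dictionary_words(line, casechars=CASECHARS, otherchars=OTHERCHARS):
    """Return (offset, word) pairs for words built only from the dictionary's alphabet.

    Hypothetical helper: a word is a run of CASECHARS, optionally joined by a
    single OTHERCHARS character (as in "don't"). Anything else, such as
    Cyrillic letters for an English dictionary, is skipped and never reaches
    the ispell process. Offsets are character positions in LINE.
    """
    pattern = re.compile(f"{casechars}+(?:{otherchars}{casechars}+)*")
    return [(m.start(), m.group()) for m in pattern.finditer(line)]


print(dictionary_words("кошка cat"))
# [(6, 'cat')]
print(dictionary_words("don't mix кошка and dogs"))
# [(0, "don't"), (6, 'mix'), (16, 'and'), (20, 'dogs')]
```

Each extracted word could then be checked on its own, in the spirit of `flyspell-region'. Every word contains only characters that the dictionary's coding system can encode, and its position is computed on the Emacs side before anything is sent. So, in this sketch, the ispell process never receives the multibyte bytes that cause the misalignment.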