unofficial mirror of emacs-devel@gnu.org 
From: Ivan Zakharyaschev <imz@altlinux.org>
Subject: bug & idea how to fix: ispell.el fails checking English in a buffer with Russian+English
Date: Mon, 30 Sep 2002 11:04:33 +0400 (MSD)
Message-ID: <Pine.LNX.4.44L.0209301036030.2679-100000@arrakis.zephyrous>

	Hello!

Recently, there has been a report on ispell.el not working perfectly in
GNU Emacs 21.2:

The report:

Consider a buffer with mixed languages (Russian and English). We try to
check the spelling of both languages. First, we set the dictionary to
Russian, and ispell works: it checks the Russian words and ignores the
English ones. But the other way round it doesn't: quite soon ispell.el
reports a "misalignment" and stops checking.

The minimal test case is like this: in an empty plain-text buffer, write

кошка cat

(The first word is the Russian for "cat".) Then switch to the "english"
dictionary and run M-x ispell-buffer. We get this message in the minibuffer:

Ispell misalignment: word 'LZ' point 3; probably incompatible versions

The reason & possible solution:

The first thing to notice is that changing the value of the
`coding-system' field for the English entry in `ispell-dictionary-alist'
from `iso-8859-1' to `iso-safe' solves the problem. For example, after
the substitution:

("british"                           ; British version
    "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B" "-d" "british") nil
iso-safe)

instead of

("british"                           ; British version
    "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B" "-d" "british") nil
iso-8859-1)

the test case example is checked successfully with dictionary "british".
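
To try this workaround without editing ispell.el itself, something like
the following could go into an init file. This is only a sketch: it
assumes that your ispell.el provides `ispell-local-dictionary-alist',
that its entries take precedence over the built-in ones, and that the
form is evaluated before ispell.el is loaded. The entry itself is the
one shown above.

```elisp
;; Sketch only: override the "british" entry so that its coding system
;; is `iso-safe' instead of `iso-8859-1'.
;; Assumption: `ispell-local-dictionary-alist' exists in your ispell.el
;; and is consulted before the built-in dictionary definitions.
(setq ispell-local-dictionary-alist
      '(("british"                      ; British version
         "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B" "-d" "british") nil
         iso-safe)))
```

Note that this replaces any other local entries you may already have in
that variable.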


I see the cause of this problem in ispell.el, more precisely in the
`ispell-get-line' function (or at the points where it is called, in
particular `ispell-region'): it does not honour the information about
the alphabet of the language/dictionary (specified in the corresponding
entry of `ispell-dictionary-alist'). It sends the whole line "as is"
instead of extracting only those characters that are valid for the
selected language/dictionary. So the line is sent to the ispell
process with the Russian characters in Emacs's internal multibyte
representation. (Now we can see why setting the `coding-system' to
`iso-safe' helps: the Russian characters are then encoded as harmless
question marks before being sent to the ispell process.)
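
The effect of the coding system can be inspected without an ispell
process at all. The sketch below only encodes the test line;
`my-ispell-demo-encode' is a made-up name, and the fallback to
`us-ascii' is an assumption for Emacs versions in which `iso-safe' is
not defined.

```elisp
(defun my-ispell-demo-encode (line)
  "Hypothetical helper: encode LINE roughly as it would be sent to ispell.
Uses `iso-safe' when that coding system is defined, else `us-ascii'
(an assumption: both are expected to replace non-ASCII characters)."
  (let ((coding (if (coding-system-p 'iso-safe) 'iso-safe 'us-ascii)))
    (encode-coding-string line coding)))

;; Evaluate this form to inspect what the ispell process would receive
;; for the test line:
(my-ispell-demo-encode "кошка cat")
```

If each Russian letter comes out as a single question mark, the encoded
line has exactly as many characters as the original, which is presumably
why the positions reported by ispell line up with the buffer again.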

So the cause lies in how the pair of functions
`ispell-get-line'/`ispell-region' works.

There is a similar pair of functions in ispell.el:
`ispell-get-word'/`ispell-word'. It works much better: it extracts words
consisting only of characters from the selected dictionary's
alphabet.

Another similar Emacs Lisp library -- flyspell.el -- also works better:
`flyspell-word' extracts "good" words, and `flyspell-region' iterates
over the region with successive calls to `flyspell-word', so only "good"
words are sent to the ispell process.

This suggests one possible fix for ispell.el: `ispell-region' could call
`ispell-word' or `ispell-get-word'.
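
A rough sketch of that idea, written as a separate command rather than
as a patch to `ispell-region'. The name `my-ispell-region-by-word', the
default word regexp (taken from the "british" entry, with an optional
apostrophe part) and the CHECK-FN argument are all made up for
illustration; it is not tested against a real ispell process.

```elisp
(require 'ispell)

(defun my-ispell-region-by-word (start end &optional word-regexp check-fn)
  "Hypothetical sketch: check START..END one word at a time.
WORD-REGEXP matches one word of the dictionary's alphabet and
defaults to English letters with embedded apostrophes.  CHECK-FN is
called with point just after each match and defaults to a quiet
`ispell-word'."
  (interactive "r")
  (let ((regexp (or word-regexp "[A-Za-z]+\\('[A-Za-z]+\\)*"))
        (check (or check-fn (lambda () (ispell-word nil t))))
        ;; A marker, so the limit stays valid if a correction
        ;; changes the length of the text.
        (limit (copy-marker end)))
    (save-excursion
      (goto-char start)
      ;; Text outside the alphabet (e.g. Cyrillic) is skipped and
      ;; is never handed to the checker.
      (while (re-search-forward regexp limit t)
        (funcall check)))
    (set-marker limit nil)))
```

With the test buffer above, M-x my-ispell-region-by-word on the whole
buffer should hand only "cat" to `ispell-word'. A real fix would take
the regexp from the current dictionary's entry instead of hard-coding
it, and calling `ispell-word' for every word is likely to be slower than
sending whole lines.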


BTW, ispell.el 3.5 has been out for almost a year, but GNU Emacs still
ships 3.4.

References: if you can read Russian, here are the original messages
discussing this problem:

http://www.altlinux.ru/pipermail/sisyphus/2002-September/030603.html

http://www.altlinux.ru/pipermail/sisyphus/2002-September/030610.html

Regards,

-- 
Ivan Zakharyaschev
ALT Linux Team member, Sisyphus developer
http://www.altlinux.ru, http://www.altlinux.com
