all messages for Emacs-related lists mirrored at yhetil.org
From: Sergei <sergio.pokrovskij@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Spellcheck against multiple dictionaries?
Date: Thu, 19 Mar 2009 02:30:35 -0700 (PDT)
Message-ID: <957727de-bc65-4e5a-867f-32215d0896f8@b38g2000prf.googlegroups.com>
In-Reply-To: mailman.3552.1237450712.31690.help-gnu-emacs@gnu.org

----  martin:

>> I've downloaded the speck.el file, but I'm not sure how to use it.

>> I've created a test file containing mixed correct and incorrect
>> words, in Russian and English:

>> Test тест correct очепятка incorect верно

>> Then I've done M-x speck-mode. Emacs said that Speck-mode has been
>> activated and is using ru_RU dictionary, but nothing has changed in
>> the test buffer. From your description I was expecting that the
>> incorrect words would be highlighted somehow. Am I missing
>> something?

I do not know about speck-mode, but ispell.el, at least, picks up
only what looks like a word in the currently enabled language; only
such words are recoded as the current ispell dictionary requires and
passed to the ispell process.

This means that "Test" is skipped in Russian mode (just like
=%==!!.... etc.), and conversely, очепятка (a deliberate misspelling
of опечатка, "typo") and верно ("correct") are skipped in a
Latin-alphabet context.  This is really convenient.  (Users of
Latin-alphabet languages, by contrast, stumble over every foreign word.)
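
To make the filtering idea concrete, here is a rough Emacs Lisp
analogue.  It is only an illustration: ispell.el decides what counts
as a word from each dictionary's character-class entries in
`ispell-dictionary-alist`, not from Unicode scripts.  The helper name
`my-words-in-script` is hypothetical; it keeps only those words of a
string whose first letter belongs to a given script, as recorded in
Emacs's `char-script-table`.

```elisp
;; Hypothetical helper, not part of ispell.el: approximate how a
;; checker working in one alphabet only "sees" words written in it.
(defun my-words-in-script (text script)
  "Return the words of TEXT whose first character belongs to SCRIPT.
SCRIPT is a symbol such as `latin' or `cyrillic', as used in
`char-script-table'."
  (let ((start 0)
        (words '()))
    ;; Scan the runs of alphabetic characters one after another.
    (while (string-match "[[:alpha:]]+" text start)
      (let ((word (match-string 0 text)))
        ;; Keep the word only if its first letter is in SCRIPT.
        (when (eq (aref char-script-table (aref word 0)) script)
          (push word words)))
      (setq start (match-end 0)))
    (nreverse words)))

(my-words-in-script "Test тест correct очепятка incorect верно" 'cyrillic)
;; => ("тест" "очепятка" "верно")
(my-words-in-script "Test тест correct очепятка incorect верно" 'latin)
;; => ("Test" "correct" "incorect")
```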

> I don't have a Russian spell-checking engine installed, so I can't
> comment on your example directly.  Suppose I have a file with the line

> Test Test correct Duckfehler incorect richtig

> Doing M-x speck-mode here starts an Aspell process checking with my
> default language which is English, flagging the last three words as
> incorrect.  I can now set the region around the word "Duckfehler"
> and type C-2 C-? to set the speck language text property of that
> word to German, which will still flag the word as incorrect but now
> with the appropriate German suggestions how to correct it.

Some formal text formats (like HTML or XML) allow language markup,
something like
,----
| correct <i lang="de">Duckfehler</i> incorect  <i lang="de">richtig</i>
`----
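
With such markup the language boundaries can, in principle, be found
mechanically.  The following is a hypothetical sketch, not something
ispell.el or speck.el provides: it searches for elements carrying
lang="LANG" and runs `ispell-region` on their contents with the
dictionary DICT.  The command name, the naive regexp (double-quoted
attribute, no nested element with the same tag) and the dictionary
name you pass are all assumptions.

```elisp
(require 'ispell)

;; Hypothetical command: spell-check only the contents of elements
;; marked lang="LANG", using the ispell dictionary DICT.
(defun my-ispell-lang-elements (lang dict)
  "Spell-check the text of elements with lang=\"LANG\" using DICT.
Naive: assumes a double-quoted attribute and no nested element
with the same tag name."
  (interactive "sLanguage code (e.g. de): \nsIspell dictionary: ")
  (ispell-change-dictionary dict)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward
            (format "<\\([[:alnum:]]+\\)[^>]*[[:space:]]lang=\"%s\"[^>]*>"
                    (regexp-quote lang))
            nil t)
      (let* ((start (point))
             (tag (match-string 1))
             ;; A marker keeps the end position valid if corrections
             ;; change the length of the text before it.
             (end (and (search-forward (concat "</" tag ">") nil t)
                       (copy-marker (match-beginning 0)))))
        (when end
          (ispell-region start end)
          ;; Resume the search after this element.
          (goto-char end)
          (set-marker end nil))))))
```

For the example above one might run it with "de" and whatever your
installation calls its German dictionary.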

>> I think that the ispell-ish behavior would indeed be nice. I've
>> looked through the ispell code, and it looks like Emacs raises some
>> kind of exception if the ispell process returns "invalid"
>> status. Do you think it is possible to fall back to another
>> dictionary on such an event?

> With my Aspell engine I can write (and bind) a trivial command like

> (defun ispell-check-word (arg)
>    (interactive "p")
>    (if (= arg 2)
>        (ispell-change-dictionary "de_DE")
>      (ispell-change-dictionary "en_US"))
>    (ispell-word))

> here and probably get what you want.  Note, however, that each time you
> change the language with this command, Emacs kills an old and spawns a
> new process of the Aspell engine.

Yes, because everything has to be changed: the filtering rules, the
affix grammar, the word list.

> Changing `ispell-word' as you say seems hardly possible because in
> general there's no way to distinguish a word written incorrectly in
> language A from a word written correctly in language B.  For the
> special English/Russian case you could probably investigate the
> character properties at `point' and start the appropriate
> word-checking process.
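
Following that suggestion, a minimal sketch for the English/Russian
case could combine the `ispell-check-word` idea quoted above with
Emacs's `char-script-table`: look up the script of the word at point,
pick a dictionary, then call `ispell-word`.  The function names and
the dictionary names "russian" and "english" are assumptions; use the
names your ispell or Aspell installation actually lists.

```elisp
(require 'ispell)
(require 'thingatpt)

;; Hypothetical helper: map a word to an assumed dictionary name
;; according to the Unicode script of its first character.
(defun my-dictionary-for-word (word)
  "Return a dictionary name for WORD, or nil if its script is unknown."
  (let ((script (aref char-script-table (aref word 0))))
    (cond ((eq script 'cyrillic) "russian")   ; assumed name
          ((eq script 'latin) "english"))))   ; assumed name

;; Hypothetical command: choose the dictionary from the word at point,
;; then check that word with `ispell-word'.
(defun my-ispell-word-by-script ()
  "Check the word at point with a dictionary chosen by its script."
  (interactive)
  (let* ((word (thing-at-point 'word t))
         (dict (and word (my-dictionary-for-word word))))
    (if (not dict)
        (message "No dictionary configured for the word at point")
      ;; As noted above, changing the dictionary may restart the
      ;; spell-checker process.
      (ispell-change-dictionary dict)
      (ispell-word))))
```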

In principle one could create a combined grammar for Russian and
English; it would effectively be a "direct sum" of the two grammars,
since the alphabets, and hence the word spaces, are completely
disjoint.  TeX already has such a combined processor for
English-Russian hyphenation.  It would also be more efficient, because
there would be no need to spawn a new process at every switch between
Russian and English.

But presently it would be easier to use a two-pass approach:

1. check the Russian spelling (ignoring all Latin characters);
2. check the English spelling (ignoring all Cyrillic characters).

Both passes together are faster than checking in a switching mode,
and no extra work is required.  The same method also handles
Russian+French or Russian+German (but of course not
Russian+French+English, since French and English share the Latin
alphabet; Russian+German+Armenian, on the other hand, is still
possible).
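
The two-pass approach needs no new machinery: run `ispell-buffer`
once per dictionary and rely on the behaviour described above, where
words outside the current dictionary's alphabet are simply skipped.
A minimal sketch; the command name and the default dictionary names
"russian" and "english" are assumptions.  The spell-checker process
is restarted once per pass rather than at every switch between
languages.

```elisp
(require 'ispell)

;; Hypothetical command: multi-pass spell-check of the whole buffer.
;; With the default list, pass 1 checks Russian (Latin-script words
;; are not treated as words) and pass 2 checks English (Cyrillic
;; words are skipped).
(defun my-ispell-buffer-two-pass (&optional dicts)
  "Spell-check the buffer once for each dictionary in DICTS.
DICTS defaults to an assumed pair of dictionary names."
  (interactive)
  (dolist (dict (or dicts '("russian" "english")))
    (ispell-change-dictionary dict)
    (ispell-buffer)))
```

For the Russian+German case one would pass the corresponding list,
e.g. (my-ispell-buffer-two-pass '("russian" "deutsch")), again with
installation-specific names.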

--
Sergei






