From: Sergei <sergio.pokrovskij@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: Spellcheck against multiple dictionaries?
Date: Thu, 19 Mar 2009 02:30:35 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <957727de-bc65-4e5a-867f-32215d0896f8@b38g2000prf.googlegroups.com>
References: <49C09110.9010105@gmx.at>
	<5f0660120903181236g3714f647ia568e3d02ae4fe56@mail.gmail.com> 
	<mailman.3552.1237450712.31690.help-gnu-emacs@gnu.org>
To: help-gnu-emacs@gnu.org
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/63080>

----  martin:

>> I've downloaded the speck.el file, but I'm not sure how to use it.

>> I've created a test file containing mixed correct and incorrect
>> words, in Russian and English:

>> Test тест correct очепятка incorect верно

>> Then I ran M-x speck-mode. Emacs said that speck-mode had been
>> activated and was using the ru_RU dictionary, but nothing changed in
>> the test buffer. From your description I expected the incorrect
>> words to be highlighted somehow. Am I missing something?

I don't know about speck-mode, but ispell.el, at least, picks up only
what looks like a word in the currently enabled language (as defined
by the dictionary's CASECHARS in `ispell-dictionary-alist'); only such
words are recoded as the current ispell dictionary requires and passed
to the ispell process.
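
To illustrate that filtering, here is a minimal sketch in Python (not ispell.el's actual code) that keeps only the letter runs belonging to one dictionary's alphabet. The `WORD_CHARS` table and the `words_for` function are hypothetical, and the Unicode ranges are simplified assumptions loosely modelled on the per-dictionary character classes in `ispell-dictionary-alist'.

```python
import re
from typing import List

# Hypothetical "word character" classes per dictionary, loosely modelled on
# the CASECHARS entries of ispell.el; the Unicode ranges are simplified.
WORD_CHARS = {
    "russian": "\u0400-\u04FF",                                  # Cyrillic
    "english": "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F",  # Latin letters
}


def words_for(dictionary: str, text: str) -> List[str]:
    """Return the maximal runs of the dictionary's own letters in `text`.

    Anything else (another alphabet, digits, punctuation) is skipped and
    would never be sent to the spell-checking process.
    """
    return re.findall(f"[{WORD_CHARS[dictionary]}]+", text)


line = "Test тест correct очепятка incorect верно =%==!!"
print(words_for("russian", line))  # ['тест', 'очепятка', 'верно']
print(words_for("english", line))  # ['Test', 'correct', 'incorect']
```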

This means that "Test" is skipped in Russian mode (just like
=%==!!.... etc.); conversely, очепятка and верно are skipped in a
Latin-alphabet context.  And this is really convenient.  (Users of
Latin-alphabet languages, by contrast, stumble over every foreign word.)

> I don't have a Russian spell-checking engine installed, so I can't
> comment on your example directly.  Suppose I have a file with the line

> Test Test correct Duckfehler incorect richtig

> Doing M-x speck-mode here starts an Aspell process checking with my
> default language which is English, flagging the last three words as
> incorrect.  I can now set the region around the word "Duckfehler"
> and type C-2 C-? to set the speck language text property of that
> word to German, which will still flag the word as incorrect but now
> with the appropriate German suggestions how to correct it.

Some formal text formats (such as HTML or XML) allow language markup.
Something like
,----
| correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>
`----
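
A checker could use such markup to decide which dictionary each word goes to. The following is a rough Python sketch, assuming well-formed markup and English as the default for unmarked text. The `LangSplitter` class, the `VOID` set and the "en" default are hypothetical and are not part of speck.el or ispell.el.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not touch the lang stack.
VOID = {"area", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "wbr"}


class LangSplitter(HTMLParser):
    """Collect words under the nearest enclosing lang (or xml:lang) attribute."""

    def __init__(self, default_lang="en"):  # the default is an assumption
        super().__init__()
        self.stack = [default_lang]           # innermost language is last
        self.words = defaultdict(list)        # lang -> words in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        # An element inherits its parent's language unless it overrides it.
        self.stack.append(a.get("lang") or a.get("xml:lang") or self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.words[self.stack[-1]].extend(data.split())


p = LangSplitter()
p.feed('correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>')
p.close()
print(dict(p.words))
# {'en': ['correct', 'incorect'], 'de': ['Duckfehler', 'richtig']}
```

Each language's word list could then be handed to a separate checker, so the dictionary never has to be switched word by word.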

>> I think the ispell-like behavior would indeed be nice. I've
>> looked through the ispell code, and it looks like Emacs raises some
>> kind of exception if the ispell process returns an "invalid"
>> status. Do you think it is possible to fall back to another
>> dictionary in that case?

> With my Aspell engine I can write (and bind) a trivial command like

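> (defun ispell-check-word (arg)
>    "Spell-check the word at point in German with C-2, else in English."
>    (interactive "p")
>    ;; Prefix argument 2 selects the German dictionary; anything else
>    ;; (including no prefix) selects US English.
>    (if (= arg 2)
>        (ispell-change-dictionary "de_DE")
>      (ispell-change-dictionary "en_US"))
>    (ispell-word))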

> here and probably get what you want.  Note, however, that each time you
> change the language with this command, Emacs kills an old and spawns a
> new process of the Aspell engine.

Yes, because everything has to change: the character-filtering rules,
the affix grammar, and the word list.

> Changing `ispell-word' as you say seems hardly possible because in
> general there's no way to distinguish a word written incorrectly in
> language A from a word written correctly in language B.  For the
> special English/Russian case you could probably investigate the
> character properties at `point' and start the appropriate
> word-checking process.

In principle, one could build a combined grammar for Russian and
English; in fact it would be a "direct sum" of the two grammars,
since the alphabets, and hence the word spaces, are completely
disjoint.  TeX already has such a combined processor for
English-Russian hyphenation.  It would also be more efficient,
because there would be no need to spawn a new process at every
switch between Russian and English.
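
The "direct sum" idea can be sketched in a few lines of Python: because each word's alphabet identifies its language, a single pass can route every word to the right word list without switching dictionaries. The tiny `LEXICONS` sets are made-up stand-ins for real Russian and English dictionaries, the Unicode ranges are simplified, and `script_of`/`misspelled` are hypothetical names.

```python
import re

# Made-up word lists standing in for real dictionaries (hypothetical data).
LEXICONS = {
    "cyrillic": {"тест", "верно", "опечатка"},
    "latin": {"test", "correct"},
}

# A token is a maximal run of letters from ONE alphabet (simplified ranges:
# the basic Cyrillic block and unaccented ASCII letters).
TOKEN = re.compile(r"[\u0400-\u04FF]+|[A-Za-z]+")


def script_of(word):
    """Classify a single-alphabet token by its first character."""
    return "cyrillic" if "\u0400" <= word[0] <= "\u04FF" else "latin"


def misspelled(text):
    """One combined pass: each word is looked up only in its alphabet's list.

    The alphabets are disjoint, so no word can belong to both lists and no
    dictionary switch (or process restart) is ever needed.
    """
    return [w for w in TOKEN.findall(text)
            if w.lower() not in LEXICONS[script_of(w)]]


print(misspelled("Test тест correct очепятка incorect верно"))
# ['очепятка', 'incorect']
```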

For now, though, a two-pass approach would be easier:

1. check the Russian spelling (ignoring all Latin characters);
2. check the English spelling (ignoring all Cyrillic characters).
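
A minimal Python sketch of the two passes, assuming toy word lists in place of real dictionaries (`RU`, `EN` and `one_pass` are hypothetical). Each pass replaces the other alphabet's letters with spaces, so character offsets stay valid and the reported positions point straight back into the original text.

```python
import re

# Simplified character classes (regex syntax), an assumption for illustration.
CYRILLIC = r"\u0400-\u04FF"
LATIN = r"A-Za-z"

# Made-up word lists standing in for the Russian and English dictionaries.
RU = {"тест", "верно"}
EN = {"test", "correct"}


def one_pass(text, ignore, lexicon):
    """Check one language, ignoring every letter of the other alphabet.

    Ignored letters are replaced by spaces, so the string keeps its length
    and each reported offset points into the original text.
    Returns (offset, word) pairs for words missing from `lexicon`.
    """
    visible = re.sub(f"[{ignore}]", " ", text)
    return [(m.start(), m.group())
            for m in re.finditer(r"\w+", visible)
            if m.group().lower() not in lexicon]


line = "Test тест correct очепятка incorect верно"
# Pass 1: Russian (Latin ignored); pass 2: English (Cyrillic ignored).
errors = sorted(one_pass(line, LATIN, RU) + one_pass(line, CYRILLIC, EN))
print(errors)  # [(18, 'очепятка'), (27, 'incorect')]
```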

Both passes together are faster than working in a switching mode, and
no extra work is required.  You could also spell-check Russian+French
or Russian+German texts (but not, of course, Russian+French+English,
because French and English share the Latin alphabet; whereas
Russian+German+Armenian is still possible, since Armenian has its own
alphabet).

--
Sergei