unofficial mirror of emacs-devel@gnu.org 
* Changing dictionary while flyspell-buffer is running
@ 2019-02-19 22:58 Titus von der Malsburg
  2019-02-20 17:05 ` Eli Zaretskii
  2019-02-21  3:26 ` Richard Stallman
  0 siblings, 2 replies; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-19 22:58 UTC (permalink / raw)
  To: emacs-devel


Hi,

I wrote a package called guess-language that automatically detects the
language of what is being typed and then switches dictionaries for
spell-checking as needed.  The code relies on
flyspell-incorrect-hook.  Whenever this hook is triggered (i.e. when an
unknown word is detected) the language of the current paragraph is
checked and if it’s different from the currently configured language, the
dictionary is changed and the paragraph is rechecked.  This works really
nicely, especially with documents that contain paragraphs in multiple
languages.
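
For illustration, the mechanism described above could be sketched
roughly as follows.  This is not the package's actual code:
`my-guess-region-dictionary-function' and
`my-switch-dictionary-on-misspelling' are hypothetical names, and the
guesser is assumed to take a region and return an ispell dictionary
name or nil.  Only `flyspell-incorrect-hook', `ispell-change-dictionary'
and `flyspell-region' are existing Emacs facilities.

```elisp
(require 'ispell)
(require 'flyspell)

(defvar my-guess-region-dictionary-function nil
  "Hypothetical guesser: called with BEG and END.
Should return an ispell dictionary name (a string) or nil.")

(defun my-switch-dictionary-on-misspelling (beg _end _poss)
  "Re-check the paragraph around BEG if its language seems to differ.
Intended for `flyspell-incorrect-hook', whose functions receive the
bounds of the misspelled word and the checker's suggestions."
  (when (functionp my-guess-region-dictionary-function)
    (save-excursion
      (goto-char beg)
      (let* ((par-end (progn (forward-paragraph) (point)))
             (par-beg (progn (backward-paragraph) (point)))
             (dict (funcall my-guess-region-dictionary-function
                            par-beg par-end)))
        (when (and dict
                   (not (equal dict (or ispell-local-dictionary
                                        ispell-dictionary))))
          (ispell-change-dictionary dict)
          (flyspell-region par-beg par-end)
          ;; A non-nil return value tells Flyspell not to highlight the
          ;; word based on the verdict of the old dictionary; the
          ;; re-check above has already marked whatever is still wrong.
          t)))))

(add-hook 'flyspell-incorrect-hook #'my-switch-dictionary-on-misspelling)
```

When the dictionary already matches, the function returns nil and
Flyspell highlights the word as usual.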

Link to package: https://github.com/tmalsburg/guess-language.el/

My question relates to flyspell-buffer.  When flyspell-buffer detects an
incorrect word, language guessing is activated via
flyspell-incorrect-hook but it’s not clear to me how to proceed next.  I
think I’d have to kill the running ispell/aspell/hunspell process and
restart with the new dictionary from where we left off.  Is there any
infrastructure in Flyspell that makes this relatively easy and safe?  I
had a look at the code and it seems that there is quite a bit of state
in various places, so it’s probably not just a matter of killing a
process, and doing things cleanly might require better understanding of
the internals than I have.  Any advice on how to approach this would be
appreciated.

Best wishes,

  Titus






--
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Changing dictionary while flyspell-buffer is running
  2019-02-19 22:58 Changing dictionary while flyspell-buffer is running Titus von der Malsburg
@ 2019-02-20 17:05 ` Eli Zaretskii
       [not found]   ` <87y36as44p.fsf@posteo.de>
  2019-02-21  3:26 ` Richard Stallman
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-20 17:05 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Date: Tue, 19 Feb 2019 23:58:19 +0100
> 
> My question relates to flyspell-buffer.  When flyspell-buffer detects an
> incorrect word, language guessing is activated via
> flyspell-incorrect-hook but it’s not clear to me how to proceed next.  I
> think I’d have to kill the running ispell/aspell/hunspell process and
> restart with the new dictionary from where we left off.  Is there any
> infrastructure in Flyspell that makes this relatively easy and safe?  I
> had a look at the code and it seems that there is quite a bit of state
> in various places, so it’s probably not just a matter of killing a
> process, and doing things cleanly might require better understanding of
> the internals than I have.  Any advice on how to approach this would be
> appreciated.

There is already a command ispell-change-dictionary which does
everything that needs to be done when a dictionary is changed.  I
think you just need to invoke it.  If doing that during
flyspell-buffer somehow causes trouble, then do it after exiting
flyspell-buffer, then reinvoke flyspell-buffer or something.
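
A minimal sketch of that suggestion, assuming the new dictionary name
is already known.  The function name `my-recheck-buffer-with-dictionary'
is made up for illustration; `ispell-change-dictionary' and
`flyspell-buffer' are the existing commands.

```elisp
(require 'ispell)
(require 'flyspell)

(defun my-recheck-buffer-with-dictionary (dict)
  "Switch the current buffer to dictionary DICT, then re-check it.
DICT is a dictionary name known to the configured spell-checker."
  ;; Without a second argument the change applies to this buffer only.
  (ispell-change-dictionary dict)
  (flyspell-buffer))
```

This re-checks the whole buffer in a single language, so it does not
by itself handle documents whose paragraphs differ in language.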




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-19 22:58 Changing dictionary while flyspell-buffer is running Titus von der Malsburg
  2019-02-20 17:05 ` Eli Zaretskii
@ 2019-02-21  3:26 ` Richard Stallman
  2019-02-21  3:46   ` Eli Zaretskii
  2019-02-21  8:29   ` Titus von der Malsburg
  1 sibling, 2 replies; 20+ messages in thread
From: Richard Stallman @ 2019-02-21  3:26 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I wrote a package called guess-language that automatically detects the
  > language of what is being typed and then switches dictionaries for
  > spell-checking as needed.  The code relies on
  > flyspell-incorrect-hook.

I don't use flyspell, but I would appreciate this feature of
automatically selecting the proper dictionary.  Could you arrange
some other way to automatically run the program?
Perhaps a variable that would tell the user-level ispell commands
to call your package at the suitable times?

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  3:26 ` Richard Stallman
@ 2019-02-21  3:46   ` Eli Zaretskii
  2019-02-21  8:34     ` Titus von der Malsburg
  2019-02-21  8:29   ` Titus von der Malsburg
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-21  3:46 UTC (permalink / raw)
  To: rms; +Cc: malsburg, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Wed, 20 Feb 2019 22:26:32 -0500
> Cc: emacs-devel@gnu.org
> 
> I don't use flyspell, but I would appreciate this feature of
> automatically selecting the proper dictionary.

May I suggest installing Hunspell, then?  It supports loading several
dictionaries at once, and will eliminate the need to switch a
dictionary while spell-checking a multi-lingual document.  Emacs
already supports that feature of Hunspell.
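
As a rough configuration sketch, loading two Hunspell dictionaries at
once might look like this.  The dictionary names "en_US" and "de_DE"
are assumptions; they must match dictionaries actually installed for
Hunspell on the system.

```elisp
(with-eval-after-load 'ispell
  (setq ispell-program-name "hunspell")
  ;; Initialize the dictionary list for the selected program before
  ;; registering a combined dictionary.
  (ispell-set-spellchecker-params)
  ;; Register the comma-separated combination as one dictionary entry.
  (ispell-hunspell-add-multi-dic "en_US,de_DE")
  (setq ispell-dictionary "en_US,de_DE"))
```

With this in place a word is accepted if any of the loaded
dictionaries contains it.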




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  3:26 ` Richard Stallman
  2019-02-21  3:46   ` Eli Zaretskii
@ 2019-02-21  8:29   ` Titus von der Malsburg
  2019-02-21 13:12     ` Clément Pit-Claudel
  2019-02-22  2:06     ` Richard Stallman
  1 sibling, 2 replies; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-21  8:29 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 2019-02-21 Thu 04:26, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > I wrote a package called guess-language that automatically detects the
>   > language of what is being typed and then switches dictionaries for
>   > spell-checking as needed.  The code relies on
>   > flyspell-incorrect-hook.
>
> I don't use flyspell, but I would appreciate this feature of
> automatically selecting the proper dictionary.  Could you arrange
> some other way to automatically run the program?
> Perhaps a variable that would tell the user-level ispell commands
> to call your package at the suitable times?

Sure, that would work, but then we’d have to make changes in
ispell.el.  Alternatively, my code could defadvice ispell functions,
such that we guess the buffer language and set dictionaries before
spell-checking.
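
A sketch of the advice approach, using `advice-add' rather than the
older `defadvice'.  The variable `my-guess-buffer-dictionary-function'
and the advice function are hypothetical; the guesser is assumed to
take no arguments and to return a dictionary name or nil.  Advising
`ispell-region' is a guess at a suitable entry point, on the
understanding that `ispell-buffer' goes through it.

```elisp
(require 'ispell)

(defvar my-guess-buffer-dictionary-function nil
  "Hypothetical guesser: called with no arguments.
Should return an ispell dictionary name (a string) or nil.")

(defun my-set-dictionary-before-ispell (&rest _args)
  "Select a dictionary for the current buffer before spell-checking."
  (when (functionp my-guess-buffer-dictionary-function)
    (let ((dict (funcall my-guess-buffer-dictionary-function)))
      ;; Only switch when the guess differs from the dictionary in use.
      (when (and dict
                 (not (equal dict (or ispell-local-dictionary
                                      ispell-dictionary))))
        (ispell-change-dictionary dict)))))

(advice-add 'ispell-region :before #'my-set-dictionary-before-ispell)
```

The advice can be removed again with
(advice-remove 'ispell-region #'my-set-dictionary-before-ispell).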

By the way, if there is interest, I’d be happy to contribute my package
to Emacs/Elpa.  (I did the FSF copyright paperwork some years ago.)

  Titus

>
> -- 
> Dr Richard Stallman
> President, Free Software Foundation (https://gnu.org, https://fsf.org)
> Internet Hall-of-Famer (https://internethalloffame.org)


-- 
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  3:46   ` Eli Zaretskii
@ 2019-02-21  8:34     ` Titus von der Malsburg
  2019-02-21 14:53       ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-21  8:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, emacs-devel


On 2019-02-21 Thu 04:46, Eli Zaretskii wrote:
>> From: Richard Stallman <rms@gnu.org>
>> Date: Wed, 20 Feb 2019 22:26:32 -0500
>> Cc: emacs-devel@gnu.org
>> 
>> I don't use flyspell, but I would appreciate this feature of
>> automatically selecting the proper dictionary.
>
> May I suggest installing Hunspell, then?  It supports loading several
> dictionaries at once, and will eliminate the need to switch a
> dictionary while spell-checking a multi-lingual document.  Emacs
> already supports that feature of Hunspell.

Hunspell is a fantastic spell-checker.  But this approach has two
downsides: 1. Hunspell is slow compared to ispell/aspell, which can be a
problem with larger documents.  2. If you set multiple dictionaries, you
will get false negatives since a typo in one language might be a word
in another language.  Depending on the set of languages used, this could
be a real problem.

  Titus


-- 
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  8:29   ` Titus von der Malsburg
@ 2019-02-21 13:12     ` Clément Pit-Claudel
  2019-02-22  2:06     ` Richard Stallman
  1 sibling, 0 replies; 20+ messages in thread
From: Clément Pit-Claudel @ 2019-02-21 13:12 UTC (permalink / raw)
  To: emacs-devel

On 21/02/2019 03.29, Titus von der Malsburg wrote:
> By the way, if there is interest, I’d be happy to contribute my package
> to Emacs/Elpa.  (I did the FSF copyright paperwork some years ago.)

I think this package is great work, and I'd love to see it in ELPA.  Integrating it closely with ispell in addition to flyspell would be ideal.




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  8:34     ` Titus von der Malsburg
@ 2019-02-21 14:53       ` Eli Zaretskii
  2019-02-21 19:42         ` Joost Kremers
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-21 14:53 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: rms, emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Cc: rms@gnu.org, emacs-devel@gnu.org
> Date: Thu, 21 Feb 2019 09:34:09 +0100
> 
> > May I suggest installing Hunspell, then?  It supports loading several
> > dictionaries at once, and will eliminate the need to switch a
> > dictionary while spell-checking a multi-lingual document.  Emacs
> > already supports that feature of Hunspell.
> 
> Hunspell is a fantastic spell-checker.  But this approach has two
> downsides: 1. Hunspell is slow compared to ispell/aspell, which can be a
> problem with larger documents.

Hunspell is indeed about twice as slow as Aspell, but:

  . both are very fast, so a factor of 2 doesn't matter in practice
  . speed only matters if you spell-check a large document with no
    misspellings at all -- as soon as there's a single misspelled
    word, marking it and selecting the correction will render any
    speed differences irrelevant.  And my typical use cases,
    spell-checking technical text, usually show quite a few "typos",
    words that are from jargon or abbreviations, which no speller will
    know about.

> 2. If you set multiple dictionaries, you
> will get false negatives since a typo in one language might be a word
> in another language.  Depending on the set of languages used, this could
> be a real problem.

It could be.  IME, it never is, because the 3 languages I write fluently
and frequently each use a different script, so no false negative is
ever possible.  Even when using several languages that use the same
script, say, Latin, the accented letters usually prevent false
negatives.

Of course, you also get false negatives when writing in a single
language, because some typos are actually a valid word.  The larger
the dictionary used by the speller, the higher your chances of getting
such false negatives.

Finally, guessing a language is also not 100% correct, especially when
short phrases from some language are inserted into text written in
another language, something that happens a lot in email
correspondence, for example.

The advantage of using several dictionaries simultaneously is that no
guessing is involved.




* Re: Changing dictionary while flyspell-buffer is running
       [not found]       ` <87h8cxsed3.fsf@posteo.de>
@ 2019-02-21 14:59         ` Eli Zaretskii
  0 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-21 14:59 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Date: Thu, 21 Feb 2019 10:03:04 +0100
> 
> [Note: The last two messages in this branch were off list.  My fault.]
> 
> On 2019-02-21 Thu 04:36, Eli Zaretskii wrote:
> > [Why personal email?]
> >
> >> From: Titus von der Malsburg <malsburg@posteo.de>
> >> Date: Wed, 20 Feb 2019 21:14:14 +0100
> >> 
> >> > Then why do you use flyspell-buffer and not flyspell-region, each time
> >> > starting the region from the word where you decided to switch to
> >> > another language?
> >> 
> >> That’s actually what happens when people just type: my package detects a
> >> new language and then just rechecks that paragraph.  However, some users
> >> of my package are used to doing flyspell-buffer on complete files, and
> >> when they do that, they don’t get the result they expect (which is that
> >> each paragraph is checked in its own language).  Checking a whole
> >> document with multiple languages does sound like a reasonable use case
> >> to me.
> >
> > Sorry, I still don't understand.  When the user runs flyspell-buffer,
> > and you find that the language was changed, invoke
> > ispell-change-dictionary and after that invoke flyspell-region to
> > continue spell-checking from that place to the end of the document.
> > Repeat as needed.  Wouldn't this algorithm work for your use case?
> 
> This algorithm does do the job, but when the language changes a lot in a
> document it would be inefficient.  Let’s say you have 10 paragraphs, each
> in a different language, then you’d check the last paragraph 10 times
> and only in the last pass with the correct language.  Guess-language was
> written primarily to facilitate work with such multilingual documents.
> (I’m a linguist.)  So that’s not a satisfying solution.

You don't have to end the region at the end of the document, you can
end it when your language guess changes.
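
A sketch of that idea: walk the buffer paragraph by paragraph, merge
consecutive paragraphs with the same guess into one run, and check
each run exactly once with its own dictionary.  The function name and
the GUESS-FN argument are hypothetical; GUESS-FN is assumed to take a
region and return an ispell dictionary name or nil.

```elisp
(require 'ispell)
(require 'flyspell)

(defun my-flyspell-buffer-by-language (guess-fn)
  "Spell-check the buffer one language run at a time.
GUESS-FN is called with BEG and END of each paragraph and should
return a dictionary name or nil.  A nil guess keeps the paragraph in
the current run."
  (save-excursion
    (goto-char (point-min))
    (let ((run-beg (point-min))
          (run-dict nil))
      (while (< (point) (point-max))
        (let* ((par-beg (point))
               (par-end (progn (forward-paragraph) (point)))
               (dict (or (funcall guess-fn par-beg par-end) run-dict)))
          (when (and run-dict (not (equal dict run-dict)))
            ;; The guess changed: check the finished run, start a new one.
            (ispell-change-dictionary run-dict)
            (save-excursion (flyspell-region run-beg par-beg))
            (setq run-beg par-beg))
          (setq run-dict dict)))
      ;; Check the final run.  If no guess was ever made, this checks
      ;; everything with whatever dictionary is currently active.
      (when run-dict
        (ispell-change-dictionary run-dict))
      (flyspell-region run-beg (point-max)))))
```

Each paragraph is sent to the spell-checker once, so ten paragraphs in
ten languages cost ten dictionary switches rather than repeated passes
over the rest of the document.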

> I can come up with a more efficient algorithm, no problem.  It’s just
> that it would be the easiest and most efficient solution, if I could
> just abort and restart spell-checking when a change in language is
> detected.  If flyspell doesn’t support this (aborting), tough luck.

Doesn't calling ispell-change-dictionary "abort" spell-checking
anyway?  If it doesn't, can you show a recipe to see this in action?

(I never use flyspell-buffer, so changing the dictionary is trivial,
and restarts the speller as side effect.)




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21 14:53       ` Eli Zaretskii
@ 2019-02-21 19:42         ` Joost Kremers
  2019-02-21 20:09           ` Eli Zaretskii
  2019-02-23 15:24           ` Stefan Monnier
  0 siblings, 2 replies; 20+ messages in thread
From: Joost Kremers @ 2019-02-21 19:42 UTC (permalink / raw)
  To: emacs-devel; +Cc: Titus von der Malsburg, rms


On Thu, Feb 21 2019, Eli Zaretskii wrote:
>> From: Titus von der Malsburg <malsburg@posteo.de>
>> 2. If you set multiple dictionaries, you
>> will get false negatives since a typo in one language might be a word
>> in another language.  Depending on the set of languages used, this could
>> be a real problem.
>
> It could be.  IME, it never is, because the 3 languages I write fluently
> and frequently each use a different script, so no false negative is
> ever possible.  Even when using several languages that use the same
> script, say, Latin, the accented letters usually prevent false
> negatives.

That really depends on the languages involved. Dutch and English, 
for example, use very few accents, and even German, which has ä ö 
ü and ß, doesn't use these letters that often. I would suspect 
that most Western-European languages use so few accents that 
confusions of the type Titus refers to are quite possible.

> Finally, guessing a language is also not 100% correct, especially when
> short phrases from some language are inserted into text written in
> another language, something that happens a lot in email
> correspondence, for example.

The method Titus' package employs is quite reliable, IME. (I'm a 
user of his package). Sometimes, when the relevant text is very 
short (a single line), German is recognized as Dutch, but usually, 
it works extremely well. (I have it configured for Dutch, German 
and English, which, due to these languages being closely related, 
are probably not easy to tell apart.)



-- 
Joost Kremers
Life has its moments




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21 19:42         ` Joost Kremers
@ 2019-02-21 20:09           ` Eli Zaretskii
  2019-02-21 21:19             ` Titus von der Malsburg
  2019-02-23 15:24           ` Stefan Monnier
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-21 20:09 UTC (permalink / raw)
  To: Joost Kremers; +Cc: malsburg, rms, emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Date: Thu, 21 Feb 2019 20:42:21 +0100
> Cc: Titus von der Malsburg <malsburg@posteo.de>, rms@gnu.org
> 
> > Finally, guessing a language is also not 100% correct, especially
> > when short phrases from some language are inserted into text
> > written in another language, something that happens a lot in email
> > correspondence, for example.
> 
> The method Titus' package employs is quite reliable, IME.

It needs at least 30 letters to guess right, which is quite a few.




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21 20:09           ` Eli Zaretskii
@ 2019-02-21 21:19             ` Titus von der Malsburg
  2019-02-22  7:10               ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-21 21:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Joost Kremers, rms, emacs-devel


On 2019-02-21 Thu 21:09, Eli Zaretskii wrote:
>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Date: Thu, 21 Feb 2019 20:42:21 +0100
>> Cc: Titus von der Malsburg <malsburg@posteo.de>, rms@gnu.org
>>
>> > Finally, guessing a language is also not 100% correct, especially
>> > when short phrases from some language are inserted into text
>> > written in another language, something that happens a lot in email
>> > correspondence, for example.
>>
>> The method Titus' package employs is quite reliable, IME.
>
> It needs at least 30 letters to guess right, which is quite a few.

The number of letters depends on the configured languages, it could be
less than 30 when the scripts are different but for English, Dutch,
and German 30 works well in my experience and languages don’t get much
more similar than that (except if you want to distinguish between US
English and UK English).

Regarding Hunspell: I agree with Eli that we should consider alternative
solutions such as using Hunspell and multiple dictionaries.  If that
does the job, great!  However, I just tried it and noticed one downside:
Flyspell offers possible corrections for unknown words and when multiple
languages are configured, these suggestions come from all configured
dictionaries.  Many of them are of course not relevant because they are
not in the language of the paragraph.  Flyspell also has an
autocorrection feature (which I’m not using) and this feature would also
largely stop being useful with multiple dictionaries.  I think that this
makes the Hunspell solution less appealing.

  Titus



--
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21  8:29   ` Titus von der Malsburg
  2019-02-21 13:12     ` Clément Pit-Claudel
@ 2019-02-22  2:06     ` Richard Stallman
  2019-02-22  9:27       ` Titus von der Malsburg
  2019-02-28 12:36       ` Titus von der Malsburg
  1 sibling, 2 replies; 20+ messages in thread
From: Richard Stallman @ 2019-02-22  2:06 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > I don't use flyspell, but I would appreciate this feature of
  > > automatically selecting the proper dictionary.  Could you arrange
  > > some other way to automatically run the program?
  > > Perhaps a variable that would tell the user-level ispell commands
  > > to call your package at the suitable times?

  > Sure, that would work, but then we’d have to make changes in
  > ispell.el.

Would adding hooks in ispell.el be enough?

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21 21:19             ` Titus von der Malsburg
@ 2019-02-22  7:10               ` Eli Zaretskii
  2019-02-22  9:57                 ` Titus von der Malsburg
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-22  7:10 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: joostkremers, rms, emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Cc: Joost Kremers <joostkremers@fastmail.fm>, emacs-devel@gnu.org, rms@gnu.org
> Date: Thu, 21 Feb 2019 22:19:53 +0100
> 
> > It needs at least 30 letters to guess right, which is quite a few.
> 
> The number of letters depends on the configured languages, it could be
> less than 30 when the scripts are different but for English, Dutch,
> and German 30 works well in my experience and languages don’t get much
> more similar than that (except if you want to distinguish between US
> English and UK English).

The minimum number also depends on the expected reliability of
language detection, of course.

> I just tried it and noticed one downside: Flyspell offers possible
> corrections for unknown words and when multiple languages are
> configured, these suggestions come from all configured dictionaries.

Of course, but what would you expect?  And how is that a downside?
Hunspell doesn't try to guess the language at all, it just looks in
all loaded dictionaries one by one.

> Many of them are of course not relevant because they are not in the
> language of the paragraph.

There's no "language of the paragraph" in this method, you can freely
mix words from different languages in the same paragraph.  There are
important use cases for that, like editing a message translation
catalog or text that explains in-line the meaning of words in
another language.

> Flyspell also has an autocorrection feature (which I’m not using)
> and this feature would also largely stop being useful with multiple
> dictionaries.

It will only become less useful if the first correction is off in a
significant number of cases.  Which is not at all expected, certainly
not when each language uses a different script.

> I think that this makes the Hunspell solution less appealing.

I think you are slightly biased ;-).  As am I, most probably.  Both
solutions have their advantages and disadvantages, and the user should
choose which one better suits his/her needs in each case.

I mentioned Hunspell because I think few people even know about this
feature, which is quite unique among spellers supported by Emacs.




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-22  2:06     ` Richard Stallman
@ 2019-02-22  9:27       ` Titus von der Malsburg
  2019-02-28 12:36       ` Titus von der Malsburg
  1 sibling, 0 replies; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-22  9:27 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 2019-02-22 Fri 03:06, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > > I don't use flyspell, but I would appreciate this feature of
>   > > automatically selecting the proper dictionary.  Could you arrange
>   > > some other way to automatically run the program?
>   > > Perhaps a variable that would tell the user-level ispell commands
>   > > to call your package at the suitable times?
>
>   > Sure, that would work, but then we’d have to make changes in
>   > ispell.el.
>
> Would adding hooks in ispell.el be enough?

Likely yes.  I will investigate this and report back.  May take a couple
of days, though.

>
> --
> Dr Richard Stallman
> President, Free Software Foundation (https://gnu.org, https://fsf.org)
> Internet Hall-of-Famer (https://internethalloffame.org)


--
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-22  7:10               ` Eli Zaretskii
@ 2019-02-22  9:57                 ` Titus von der Malsburg
  2019-02-22 10:32                   ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-22  9:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, rms, emacs-devel


On 2019-02-22 Fri 08:10, Eli Zaretskii wrote:
>> From: Titus von der Malsburg <malsburg@posteo.de>
>> Cc: Joost Kremers <joostkremers@fastmail.fm>, emacs-devel@gnu.org, rms@gnu.org
>> Date: Thu, 21 Feb 2019 22:19:53 +0100
>>
>> > It needs at least 30 letters to guess right, which is quite a few.
>>
>> The number of letters depends on the configured languages, it could be
>> less than 30 when the scripts are different but for English, Dutch,
>> and German 30 works well in my experience and languages don’t get much
>> more similar than that (except if you want to distinguish between US
>> English and UK English).
>
> The minimum number also depends on the expected reliability of
> language detection, of course.

Of course.  I should say that I didn’t come up with the algorithm.  It’s
a standard approach to language detection used in many contexts.  Its
selling points are high accuracy, low computational complexity, and that
only a small amount of language data is required.  For most languages,
we need only 1.2Kb of data.
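
To give a feel for why so little data can suffice, here is a
simplified sketch of detection by character trigrams.  It assumes the
approach is trigram-based and scores by plain overlap counting, which
may differ from what guess-language actually does; all function names
and the tiny profiles in the usage comment are made up for
illustration.

```elisp
(require 'cl-lib)

(defun my-trigrams (text)
  "Return the list of character trigrams in TEXT, lowercased."
  (let ((s (downcase text))
        (result '()))
    (dotimes (i (max 0 (- (length s) 2)))
      (push (substring s i (+ i 3)) result))
    (nreverse result)))

(defun my-trigram-score (text profile)
  "Count the trigrams of TEXT that occur in PROFILE, a list of strings."
  (cl-count-if (lambda (tri) (member tri profile))
               (my-trigrams text)))

(defun my-guess-language (text profiles)
  "Return the language in PROFILES whose trigrams best match TEXT.
PROFILES is an alist of the form (LANGUAGE . TRIGRAM-LIST)."
  (let ((best nil)
        (best-score -1))
    (dolist (entry profiles best)
      (let ((score (my-trigram-score text (cdr entry))))
        (when (> score best-score)
          (setq best (car entry)
                best-score score))))))

;; Usage with toy profiles (real profiles would hold a few hundred of
;; the most frequent trigrams per language):
;; (my-guess-language "the weather is nice"
;;                    '((en "the" "he " "ing")
;;                      (de "der" "ie " "ein")))
```

Short inputs contain few trigrams, which is consistent with the
observation in this thread that a minimum amount of text is needed for
a reliable guess.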

[More below.]

>> I just tried it and noticed one downside: Flyspell offers possible
>> corrections for unknown words and when multiple languages are
>> configured, these suggestions come from all configured dictionaries.
>
> Of course, but what would you expect?

I would expect to get only suggestions from the language that I’m
currently typing in.

> And how is that a downside?

If I have to pick the correct word from a list that contains many
irrelevant words, it will take more time.  The suggestions are just less
relevant on average.

> Hunspell doesn't try to guess the language at all, it just looks in
> all loaded dictionaries one by one.

That’s the problem. :)

>> Many of them are of course not relevant because they are not in the
>> language of the paragraph.
>
> There's no "language of the paragraph" in this method, you can freely
> mix words from different languages in the same paragraph.  There are
> important use cases for that, like editing a message translation
> catalog or text that explains in-line the meaning of words in
> another language.

If the use case is working with paragraphs that mix languages, the user
is free to use Hunspell.  However, there is the other use case, the one
that I’m interested in, where the document contains whole paragraphs
each in its own language.  Plus the use case where the document is in
just one language and I don’t want the spell-checker to suggest words in
some other language.

Note that automatic language detection has other applications beyond
changing dictionaries for spell checkers.  For instance, it can
automatically switch the voice used by the Festival speech synthesizer,
which is useful for blind people working with text in multiple
languages.  It can also switch the typographical conventions used by
typo-mode (e.g., use quote symbols that are appropriate for the current
language).  It could also switch the language of dictionaries, thesauri,
and text completion packages such as company-ngrams.

>> Flyspell also has an autocorrection feature (which I’m not using)
>> and this feature would also largely stop being useful with multiple
>> dictionaries.
>
> It will only become less useful if the first correction is off in a
> significant number of cases.  Which is not at all expected, certainly
> not when each language uses a different script.
>
>> I think that this makes the Hunspell solution less appealing.
>
> I think you are slightly biased ;-).  As am I, most probably.  Both
> solutions have their advantages and disadvantages, and the user should
> choose which one better suits his/her needs in each case.

Exactly, and that’s why I never said that people should be prevented
from using Hunspell with multiple dictionaries if that’s the best
solution for them.  :)

> I mentioned Hunspell because I think few people even know about this
> feature, which is quite unique among spellers supported by Emacs.

That is true.  I certainly didn’t know about that feature.  Hunspell is
fairly impressive, especially for languages like German that can freely
compose new words.  Following this conversation, I might actually switch.

In sum, I don’t want to push my package to anyone.  I said I would be
happy to contribute it to Emacs/Elpa /if/ there is interest.  But I’m
perfectly happy with keeping it in Melpa where it currently lives.

Regarding my initial question: I had a closer look at how
flyspell-buffer works internally and I’m afraid there is no easy way to
make it switch languages half-way through the document.  The hook for
incorrect words is called only when the spell-checker has already
finished its work.  It will be necessary to write a new function that
processes the document paragraph by paragraph.

Thanks for all the suggestions.

  Titus




--
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-22  9:57                 ` Titus von der Malsburg
@ 2019-02-22 10:32                   ` Eli Zaretskii
  0 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-22 10:32 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: joostkremers, rms, emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Cc: joostkremers@fastmail.fm, emacs-devel@gnu.org, rms@gnu.org
> Date: Fri, 22 Feb 2019 10:57:29 +0100
> 
> Regarding my initial question: I had a closer look at how
> flyspell-buffer works internally and I’m afraid there is no easy way to
> make it switch languages half-way through the document.  The hook for
> incorrect words is called only when the spell-checker has already
> finished its work.  It will be necessary to write a new function that
> processes the document paragraph by paragraph.

That's why I mentioned flyspell-region up-thread.
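
A minimal sketch of the paragraph-by-paragraph approach, under some
assumptions: the names `my-map-paragraphs' and
`my-flyspell-buffer-by-paragraph' are hypothetical, and
CHOOSE-DICTIONARY stands for whatever language detector is used (it
gets the paragraph bounds and returns a dictionary name or nil).  Only
`flyspell-region', `ispell-change-dictionary' and `forward-paragraph'
are existing Emacs functions.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'ispell)
(require 'flyspell)

(defun my-map-paragraphs (fn)
  "Call FN with the start and end position of each paragraph.
Hypothetical helper; walks the accessible part of the buffer."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let ((beg (point)))
        (forward-paragraph)
        ;; Guard against a motion that makes no progress.
        (when (= (point) beg)
          (goto-char (point-max)))
        (funcall fn beg (point))))))

(defun my-flyspell-buffer-by-paragraph (choose-dictionary)
  "Spell-check the buffer one paragraph at a time.
CHOOSE-DICTIONARY is called with the bounds of each paragraph and
should return a dictionary name, or nil to keep the current one."
  (my-map-paragraphs
   (lambda (beg end)
     (let ((dict (funcall choose-dictionary beg end)))
       ;; Switch dictionaries before checking this paragraph.
       (when dict
         (ispell-change-dictionary dict))
       (flyspell-region beg end)))))
```

This sketch leaves the last selected dictionary in effect when it
finishes; a real implementation would probably want to restore the
original one.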




* Re: Changing dictionary while flyspell-buffer is running
  2019-02-21 19:42         ` Joost Kremers
  2019-02-21 20:09           ` Eli Zaretskii
@ 2019-02-23 15:24           ` Stefan Monnier
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2019-02-23 15:24 UTC (permalink / raw)
  To: emacs-devel

> That really depends on the languages involved.

Agreed.  As a regular user of English, French, and Spanish, I find that
a combined dictionary leads to a fair number of missed typos.

Another kind of mistake I noticed is that it fails to warn me when I'm
using a word from the wrong language (it can be difficult to keep such
closely related languages separated in one's head).


        Stefan





* Re: Changing dictionary while flyspell-buffer is running
  2019-02-22  2:06     ` Richard Stallman
  2019-02-22  9:27       ` Titus von der Malsburg
@ 2019-02-28 12:36       ` Titus von der Malsburg
  2019-02-28 17:50         ` Eli Zaretskii
  1 sibling, 1 reply; 20+ messages in thread
From: Titus von der Malsburg @ 2019-02-28 12:36 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 2019-02-22 Fri 03:06, Richard Stallman wrote:
>   > > I don't use flyspell, but I would appreciate this feature of
>   > > automatically selecting the proper dictionary.  Could you arrange
>   > > some other way to automatically run the program?
>   > > Perhaps a variable that would tell the user-level ispell commands
>   > > to call your package at the suitable times?
>
>   > Sure, that would work, but then we’d have to make changes in
>   > ispell.el.
>
> Would adding hooks in ispell.el be enough?

Below is a patch for ispell.el that adds a hook that is run before
`ispell-set-spellchecker-params'.  Functions on this hook of course need
to be careful not to trigger an infinite recursion when they call Ispell
functions like `ispell-change-dictionary'.  There is a note on that in
the documentation of the hook.

To make automatic language detection work with Ispell, the following
function could be added to this hook:

(defun guess-language-ispell-set-buffer-language ()
  "Detect the buffer language and switch the Ispell dictionary if needed.
Intended for `ispell-before-setting-spellchecker-params-hook'."
  (let* ((lang (guess-language-buffer))
         (new-dictionary (cadr (assq lang guess-language-langcodes)))
         ;; Rebind the hook without this function so that the call to
         ;; `ispell-change-dictionary' below does not recurse into it.
         (ispell-before-setting-spellchecker-params-hook
          (remq #'guess-language-ispell-set-buffer-language
                ispell-before-setting-spellchecker-params-hook)))
    ;; Leave the dictionary alone when the language has no known mapping.
    (when new-dictionary
      (ispell-change-dictionary new-dictionary))))

(add-hook 'ispell-before-setting-spellchecker-params-hook
          #'guess-language-ispell-set-buffer-language)
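
The recursion guard in the function above works because the hook
variable is special (defined with defvar), so a let-binding of it is
seen by `run-hooks' for the duration of the nested call.  A
self-contained sketch of that mechanism follows.  All `my-demo-*' names
are hypothetical stand-ins and are not part of Ispell or of the patch.

```elisp
;;; -*- lexical-binding: t; -*-
(defvar my-demo-hook nil
  "Hypothetical hook standing in for the new Ispell hook.")

(defvar my-demo-calls 0
  "Number of times `my-demo-hook-function' has run.")

(defun my-demo-entry-point ()
  "Stand-in for an Ispell command that runs the hook first."
  (run-hooks 'my-demo-hook))

(defun my-demo-hook-function ()
  "Call back into the entry point without recursing forever."
  (setq my-demo-calls (1+ my-demo-calls))
  ;; Dynamically rebind the hook to a copy that omits this function.
  ;; The nested `run-hooks' sees the shortened list; the global value
  ;; is untouched once the `let' exits.
  (let ((my-demo-hook (remq #'my-demo-hook-function my-demo-hook)))
    (my-demo-entry-point)))

(add-hook 'my-demo-hook #'my-demo-hook-function)
(my-demo-entry-point)
;; `my-demo-calls' is now 1, and `my-demo-hook' still contains
;; `my-demo-hook-function'.
```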

  Titus


diff --git a/lisp/textmodes/ispell.el b/lisp/textmodes/ispell.el
index 237997d41d..87e369ebd7 100644
--- a/lisp/textmodes/ispell.el
+++ b/lisp/textmodes/ispell.el
@@ -211,6 +211,7 @@ ispell-program-name
   :set (lambda (symbol value)
          (set-default symbol value)
-         (if (featurep 'ispell)
-             (ispell-set-spellchecker-params)))
+         (when (featurep 'ispell)
+           (run-hooks 'ispell-before-setting-spellchecker-params-hook)
+           (ispell-set-spellchecker-params)))
   :group 'ispell)
 
@@ -1259,6 +1260,13 @@ ispell-last-program-name
 ;; advertised in the doc string of ispell-initialize-spellchecker-hook.
 (defvar ispell-base-dicts-override-alist)
 
+(defvar ispell-before-setting-spellchecker-params-hook nil
+  "Normal hook run before setting spellchecker parameters.
+This hook can be used for automatic spellchecker configuration,
+e.g. automatic detection of the buffer language.  Hooked
+functions need to be careful when calling Ispell functions as
+this could cause infinite recursion.")
+
 (defvar ispell-initialize-spellchecker-hook nil
   "Normal hook run on spellchecker initialization.
 This hook is run when a spellchecker is used for the first
@@ -1394,8 +1402,9 @@ ispell-valid-dictionary-list
 The variable `ispell-library-directory' defines their location."
   ;; Initialize variables and dictionaries alists for desired spellchecker.
   ;; Make sure ispell.el is loaded to avoid some autoload loops.
-  (if (featurep 'ispell)
-      (ispell-set-spellchecker-params))
+  (when (featurep 'ispell)
+    (run-hooks 'ispell-before-setting-spellchecker-params-hook)
+    (ispell-set-spellchecker-params))
 
   (let ((dicts (append ispell-local-dictionary-alist ispell-dictionary-alist))
 	(dict-list (cons "default" nil))
@@ -1935,6 +1944,7 @@ ispell-word
     (ispell-region (region-beginning) (region-end)))
    (continue (ispell-continue))
    (t
+    (run-hooks 'ispell-before-setting-spellchecker-params-hook)
     (ispell-set-spellchecker-params)    ; Initialize variables and dicts alists
     (ispell-accept-buffer-local-defs)	; use the correct dictionary
     (let ((cursor-location (point))	; retain cursor location
@@ -2035,6 +2045,7 @@ ispell-get-word
 
 Word syntax is controlled by the definition of the chosen dictionary,
 which is in `ispell-local-dictionary-alist' or `ispell-dictionary-alist'."
+  (run-hooks 'ispell-before-setting-spellchecker-params-hook)
   (ispell-set-spellchecker-params)    ; Initialize variables and dicts alists
   (let* ((ispell-casechars (ispell-get-casechars))
 	 (ispell-not-casechars (ispell-get-not-casechars))
@@ -2989,6 +3000,7 @@ ispell-change-dictionary
 	       (mapcar #'list (ispell-valid-dictionary-list)))
 	  nil t)
 	 current-prefix-arg))
+  (run-hooks 'ispell-before-setting-spellchecker-params-hook)
   (ispell-set-spellchecker-params) ; Initialize variables and dicts alists
   (unless arg (ispell-buffer-local-dict 'no-reload))
   (if (equal dict "default") (setq dict nil))
@@ -3047,6 +3059,7 @@ ispell-region
 Return nil if spell session was terminated, otherwise returns shift offset
 amount for last line processed."
   (interactive "r")			; Don't flag errors on read-only bufs.
+  (run-hooks 'ispell-before-setting-spellchecker-params-hook)
   (ispell-set-spellchecker-params)      ; Initialize variables and dicts alists
   (if (not recheckp)
       (ispell-accept-buffer-local-defs)) ; set up dictionary, local words, etc.





* Re: Changing dictionary while flyspell-buffer is running
  2019-02-28 12:36       ` Titus von der Malsburg
@ 2019-02-28 17:50         ` Eli Zaretskii
  0 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2019-02-28 17:50 UTC (permalink / raw)
  To: Titus von der Malsburg; +Cc: rms, emacs-devel

> From: Titus von der Malsburg <malsburg@posteo.de>
> Date: Thu, 28 Feb 2019 13:36:03 +0100
> Cc: emacs-devel@gnu.org
> 
> Below is a patch for ispell.el that adds a hook that is run before
> `ispell-set-spellchecker-params'.  Functions on this hook of course need
> to be careful not to trigger an infinite recursion when they call Ispell
> functions like `ispell-change-dictionary'.  There is a note on that in
> the documentation of the hook.

Thanks, please accompany this with a suitable change in NEWS which
announces this new feature.

> +(defvar ispell-before-setting-spellchecker-params-hook nil
> +  "Normal hook run before setting spellchecker parameters.

This single line doesn't say in enough detail when the hook is run.
"Before setting spellchecker parameters" is not self-explanatory,
because the reader doesn't necessarily know where in the sequence of
spell-checking actions these parameters are set.  Please add more
explanation about that.  In particular, I think it's important to
understand whether this is called before each invocation of the
spellchecker, or just some of them.
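
For illustration only, a doc string along the following lines might
cover that.  It is a sketch, not the committed text: the list of call
sites is read off the hunk headers of the patch as posted and is an
assumption that would need checking against the final code.

```elisp
(defvar ispell-before-setting-spellchecker-params-hook nil
  "Normal hook run just before `ispell-set-spellchecker-params' is called.
With the patch as posted, that is at the start of `ispell-word',
`ispell-get-word', `ispell-change-dictionary', `ispell-region' and
`ispell-valid-dictionary-list', and when `ispell-program-name' is
set through Customize.  The hook can therefore run more than once
during a single spell-checking command.

This hook can be used for automatic spellchecker configuration,
e.g. automatic detection of the buffer language.  A function on
this hook that itself calls one of the commands listed above must
first let-bind this variable to a value that omits that function;
otherwise it will recurse without end.")
```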



