unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#52179: Highlighting a word in `ispell' using `enchant'
@ 2021-11-29 14:44 Tor Kringeland
  2021-11-29 14:51 ` Eli Zaretskii
  2022-05-24 15:49 ` Tor Kringeland
  0 siblings, 2 replies; 14+ messages in thread
From: Tor Kringeland @ 2021-11-29 14:44 UTC (permalink / raw)
  To: 52179

Using `ispell' with `enchant' on macOS yields the following problem.  If
a word contains some non-ASCII character, said character will not be
considered part of the word and will split it (like a digit would).  For
example, in "naïve" the fragments "na" and "ve" are treated as two separate words.  This
does not happen if I use `aspell' instead of `enchant', and if I run

  echo -n "naïve" | enchant-2 -a

it registers that this is one word, and that it is valid (using an
English dictionary).

I'm using Enchant version 2.3.1 and an Emacs 29 build from 24 November
on macOS Catalina.






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2021-11-29 14:44 bug#52179: Highlighting a word in `ispell' using `enchant' Tor Kringeland
@ 2021-11-29 14:51 ` Eli Zaretskii
  2021-11-29 20:46   ` Tor Kringeland
  2022-05-24 15:49 ` Tor Kringeland
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2021-11-29 14:51 UTC (permalink / raw)
  To: Tor Kringeland; +Cc: 52179

> From: Tor Kringeland <tor.a.s.kringeland@ntnu.no>
> Date: Mon, 29 Nov 2021 15:44:39 +0100
> 
> Using `ispell' with `enchant' on macOS yields the following problem.  If
> a word contains some non-ASCII character, said character will not be
> considered part of the word and will split it (like a digit would).  For
> example, in "naïve" the fragments "na" and "ve" are treated as two separate words.  This
> does not happen if I use `aspell' instead of `enchant', and if I run
> 
>   echo -n "naïve" | enchant-2 -a
> 
> it registers that this is one word, and that it is valid (using an
> English dictionary).
> 
> I'm using Enchant version 2.3.1 and an Emacs 29 build from 24 November
> on macOS Catalina.

Which dictionary do you use, and what encoding does that dictionary
require?






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2021-11-29 14:51 ` Eli Zaretskii
@ 2021-11-29 20:46   ` Tor Kringeland
  2021-11-30  3:22     ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Tor Kringeland @ 2021-11-29 20:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52179

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Tor Kringeland <tor.a.s.kringeland@ntnu.no>
>> Date: Mon, 29 Nov 2021 15:44:39 +0100
>> 
>> Using `ispell' with `enchant' on macOS yields the following problem.  If
>> a word contains some non-ASCII character, said character will not be
>> considered part of the word and will split it (like a digit would).  For
>> example, in "naïve" the fragments "na" and "ve" are treated as two separate words.  This
>> does not happen if I use `aspell' instead of `enchant', and if I run
>> 
>>   echo -n "naïve" | enchant-2 -a
>> 
>> it registers that this is one word, and that it is valid (using an
>> English dictionary).
>> 
>> I'm using Enchant version 2.3.1 and an Emacs 29 build from 24 November
>> on macOS Catalina.
>
> Which dictionary do you use, and what encoding does that dictionary
> require?

In Emacs, the relevant entry in `ispell-dictionary-alist' is

  ("en" "[[:alpha:]]" "[^[:alpha:]]" "" t nil nil utf-8)

I installed `aspell' and `enchant' from Homebrew.  The installation of
`aspell' included a bunch of dictionaries downloaded from gnu.org.  In
particular, the "en" dictionary is downloaded from [1].  It is in some
kind of binary format after installation (see [2] for details).

The weird part is that it works fine in a command line, and switching
`ispell-program-name' to use `aspell' fixes the issue, so the problem
seems to be somehow in how Emacs interacts with the `enchant-2' binary.
It's doing the same for non-ASCII characters as one would expect from
numbers: the string "one0two" is valid, as "one" and "two" are treated
as separate words and "0" is ignored.

- [1] https://ftp.gnu.org/gnu/aspell/dict/en/aspell6-en-2018.04.16-0.tar.bz2

- [2] https://github.com/Homebrew/homebrew-core/blob/master/Formula/aspell.rb






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2021-11-29 20:46   ` Tor Kringeland
@ 2021-11-30  3:22     ` Eli Zaretskii
  0 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2021-11-30  3:22 UTC (permalink / raw)
  To: Tor Kringeland; +Cc: 52179

> From: Tor Kringeland <tor.a.s.kringeland@ntnu.no>
> Cc: 52179@debbugs.gnu.org
> Date: Mon, 29 Nov 2021 21:46:05 +0100
> 
> >> I'm using Enchant version 2.3.1 and an Emacs 29 build from 24 November
> >> on macOS Catalina.
> >
> > Which dictionary do you use, and what encoding does that dictionary
> > require?
> 
> In Emacs, the relevant entry in `ispell-dictionary-alist' is
> 
>   ("en" "[[:alpha:]]" "[^[:alpha:]]" "" t nil nil utf-8)
> 
> I installed `aspell' and `enchant' from Homebrew.  The installation of
> `aspell' included a bunch of dictionaries downloaded from gnu.org.  In
> particular, the "en" dictionary is downloaded from [1].  It is in some
> kind of binary format after installation (see [2] for details).
> 
> The weird part is that it works fine in a command line, and switching
> `ispell-program-name' to use `aspell' fixes the issue, so the problem
> seems to be somehow in how Emacs interacts with the `enchant-2' binary.
> It's doing the same for non-ASCII characters as one would expect from
> numbers: the string "one0two" is valid, as "one" and "two" are treated
> as separate words and "0" is ignored.

I suspect that the problem is with the LC_* and LANG environment
variables in the problematic case, which confuse the speller with
regards to the language and encoding used.






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2021-11-29 14:44 bug#52179: Highlighting a word in `ispell' using `enchant' Tor Kringeland
  2021-11-29 14:51 ` Eli Zaretskii
@ 2022-05-24 15:49 ` Tor Kringeland
  2022-05-24 19:11   ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Tor Kringeland @ 2022-05-24 15:49 UTC (permalink / raw)
  To: 52179@debbugs.gnu.org


I had another look at it and managed to solve the problem. By default `ispell-dictionary-base-alist' is conservative in setting the regexes for what counts as a word in a given language. I had set `ispell-current-dictionary' to nil so enchant would choose the dictionary on its own. The nil entry in the aforementioned alist says a word is a string with the letters A-Z or a-z and possibly containing '. Changing it to [[:alpha:]] solved the issue.

The restrictive regexes are set in `ispell-dictionary-base-alist' but in `ispell-dictionary-alist' (at least for me) most of the regexes are changed to [[:alpha:]]. I'm not sure how that is done but probably there's some good logic behind it. "british" and "american" (as well as the nil entry) still have the restrictive regexes, though ...

Could we change the nil entry to have the [[:alpha:]] regex? So, if the user doesn't explicitly set their dictionary, it works more as expected?
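
To illustrate the difference between the two word regexes, here is a minimal sketch in Emacs Lisp.  The helper `demo-words' is hypothetical and is not part of ispell.el; it only approximates word delimiting by collecting maximal runs of characters that match a CASECHARS-style regex, and it ignores OTHERCHARS entirely.

```elisp
;; Hypothetical helper, not part of ispell.el: collect the maximal runs
;; of characters in TEXT that match the CASECHARS-style regex.
(defun demo-words (casechars text)
  "Return the maximal runs in TEXT whose characters all match CASECHARS."
  (let ((case-fold-search nil)
        (pos 0)
        (words '()))
    ;; The "+" guarantees each match is non-empty, so POS always advances.
    (while (string-match (concat "\\(?:" casechars "\\)+") text pos)
      (push (match-string 0 text) words)
      (setq pos (match-end 0)))
    (nreverse words)))

(demo-words "[A-Za-z]" "naïve")     ; => ("na" "ve")
(demo-words "[[:alpha:]]" "naïve")  ; => ("naïve")
(demo-words "[A-Za-z]" "one0two")   ; => ("one" "two")
```

With the restrictive class, "ï" acts as a separator in the same way "0" does, which matches the behavior reported at the start of this thread.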



* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-24 15:49 ` Tor Kringeland
@ 2022-05-24 19:11   ` Eli Zaretskii
  2022-05-24 19:27     ` Tor Kringeland
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-24 19:11 UTC (permalink / raw)
  To: Tor Kringeland; +Cc: 52179

> From: Tor Kringeland <tor.kringeland@ntnu.no>
> Date: Tue, 24 May 2022 15:49:45 +0000
> 
> Could we change the nil entry to have the [[:alpha:]] regex? So, if the user doesn't explicitly set their dictionary,
> it works more as expected?

Aren't we already doing that?  See ispell-find-enchant-dictionaries
and similar functions for Aspell and Hunspell.  I don't think we can
make that the default because Ispell doesn't support the full
[:alpha:] character class, it only supports a single unibyte encoding.

Or am I missing something?





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-24 19:11   ` Eli Zaretskii
@ 2022-05-24 19:27     ` Tor Kringeland
  2022-05-24 19:36       ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Tor Kringeland @ 2022-05-24 19:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52179@debbugs.gnu.org

Aspell works for me, but enchant doesn't. When I change `ispell-program-name' to "aspell" ispell starts recognizing full words.

Looking into the functions you mentioned, it seems like `ispell-find-aspell-dictionaries' explicitly adds an entry for nil, while `ispell-find-enchant-dictionaries' doesn't. This is what causes the bug for me since I have set `ispell-dictionary' to nil/haven't changed it so the resulting regex recognizing words is too strict for my use.

Might there be a reason why `ispell-find-enchant-dictionaries' doesn't set a nil entry? For sure I would think it could handle whatever input aspell can.





* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-24 19:27     ` Tor Kringeland
@ 2022-05-24 19:36       ` Eli Zaretskii
  2022-05-24 21:34         ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-24 19:36 UTC (permalink / raw)
  To: Tor Kringeland, Reuben Thomas; +Cc: 52179

> From: Tor Kringeland <tor.kringeland@ntnu.no>
> CC: "52179@debbugs.gnu.org" <52179@debbugs.gnu.org>
> Date: Tue, 24 May 2022 19:27:58 +0000
> 
> Aspell works for me, but enchant doesn't. When I change `ispell-program-name' to "aspell" ispell starts recognizing full words.
> 
> Looking into the functions you mentioned, it seems like `ispell-find-aspell-dictionaries' explicitly adds an entry for nil, while `ispell-find-enchant-dictionaries' doesn't. This is what causes the bug for me since I have set `ispell-dictionary' to nil/haven't changed it so the resulting regex recognizing words is too strict for my use.

Then maybe ispell-find-enchant-dictionaries should be improved?

> Might there be a reason why `ispell-find-enchant-dictionaries' doesn't set a nil entry? For sure I would think it could handle whatever input aspell can.

I don't know.  CC'ing Reuben, who might know better.






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-24 19:36       ` Eli Zaretskii
@ 2022-05-24 21:34         ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-25  2:28           ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-05-24 21:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Tor Kringeland, 52179


On Tue, 24 May 2022 at 20:36, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Tor Kringeland <tor.kringeland@ntnu.no>
> > CC: "52179@debbugs.gnu.org" <52179@debbugs.gnu.org>
> > Date: Tue, 24 May 2022 19:27:58 +0000
> >
> > Aspell works for me, but enchant doesn't. When I change
> > `ispell-program-name' to "aspell" ispell starts recognizing full words.
> >
> > Looking into the functions you mentioned, it seems like
> > `ispell-find-aspell-dictionaries' explicitly adds an entry for nil, while
> > `ispell-find-enchant-dictionaries' doesn't. This is what causes the bug for
> > me since I have set `ispell-dictionary' to nil/haven't changed it so the
> > resulting regex recognizing words is too strict for my use.
>
> Then maybe ispell-find-enchant-dictionaries should be improved?
>
> > Might there be a reason why `ispell-find-enchant-dictionaries' doesn't
> > set a nil entry? For sure I would think it could handle whatever input
> > aspell can.
>
> I don't know.  CC'ing Reuben, who might know better.
>

I had a look at the code. I think what is happening is that the default
dictionary in ispell-dictionary-base-alist uses only [A-Za-z] for word
chars and [^A-Za-z] for non-word chars. The assumption is that this works
for ispell (the program). Then, [[:alpha:]] and [^[:alpha:]] are used in
the default 'nil'-keyed entry for aspell. As far as I can tell from the
rather hairy hunspell code, it too does not set a default entry.

Since ispell is the only spellchecker Emacs supports that can't cope with
[[:alpha:]], it would seem more sensible to have a default (nil-keyed)
setting in ispell-dictionary-base-alist, and to overwrite the default with
[A-Za-z] only if the spellchecker is really ispell.

This way, duplicate code can be removed and future spellcheckers will not
need to rediscover this problem.

(I never came across this problem because I have customized
ispell-local-dictionary-alist with my own nil entry!)
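
A user-level workaround along those lines might look like the following.  This is only a sketch: the exact values in the entry (in particular "[']" for OTHERCHARS and `utf-8' for the character set) are assumptions modelled on the entries quoted earlier in this thread, not a copy of anyone's actual configuration.

```elisp
(require 'ispell)

;; Assumed entry layout, following `ispell-dictionary-alist':
;; (NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
;;  ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
;; A nil NAME makes this the entry used when no dictionary is selected.
(add-to-list 'ispell-local-dictionary-alist
             '(nil "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil utf-8))
```

A running spell-checker process may need to be restarted (or Emacs itself) before the new entry takes effect.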

-- 
https://rrt.sc3d.org



* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-24 21:34         ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-05-25  2:28           ` Eli Zaretskii
  2022-05-25  7:39             ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-25  2:28 UTC (permalink / raw)
  To: Reuben Thomas; +Cc: tor.kringeland, 52179

> From: Reuben Thomas <rrt@sc3d.org>
> Date: Tue, 24 May 2022 22:34:14 +0100
> Cc: Tor Kringeland <tor.kringeland@ntnu.no>, 52179@debbugs.gnu.org
> 
> Since ispell is the only spellchecker Emacs supports that can't cope with [[:alpha:]], it would seem more
> sensible to have a default (nil-keyed) setting in ispell-dictionary-base-alist, and to overwrite the default with
> [A-Za-z] only if the spellchecker is really ispell.
> 
> This way, duplicate code can be removed and future spellcheckers will not need to rediscover this problem.

That might be okay, but how does it help us get the Enchant support in
ispell.el DTRT?  We'd still need to detect when Ispell is used as the
back-end speller, no?

The issue here is that the current code already "patches" the nil case
to use [:alpha:] when Aspell or Hunspell are the spell-checker, and we
need to do that for Enchant.  If you are saying that Enchant doesn't
need to worry about the Ispell case, we could just modify
ispell-find-enchant-dictionaries to always "patch" the nil entry to
use [:alpha:], and leave the rest of the code, which already works,
intact.  If Enchant does need to cater to Ispell, we need a more
complex change, but again it's local to the Enchant support code.






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-25  2:28           ` Eli Zaretskii
@ 2022-05-25  7:39             ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-25 13:23               ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-05-25  7:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tor.kringeland, 52179


On Wed, 25 May 2022 at 03:28, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Reuben Thomas <rrt@sc3d.org>
> > Date: Tue, 24 May 2022 22:34:14 +0100
> > Cc: Tor Kringeland <tor.kringeland@ntnu.no>, 52179@debbugs.gnu.org
> >
> > Since ispell is the only spellchecker Emacs supports that can't cope
> > with [[:alpha:]], it would seem more
> > sensible to have a default (nil-keyed) setting in
> > ispell-dictionary-base-alist, and to overwrite the default with
> > [A-Za-z] only if the spellchecker is really ispell.
> >
> > This way, duplicate code can be removed and future spellcheckers will
> > not need to rediscover this problem.
>
> That might be okay, but how does it help us get the Enchant support in
> ispell.el DTRT?  We'd still need to detect when Ispell is used as the
> back-end speller, no?
>

Versions of Enchant compatible with Emacs do not support Ispell as the
back-end speller (I removed this support in version 2 of Enchant).

-- 
https://rrt.sc3d.org



* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-25  7:39             ` Reuben Thomas via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-05-25 13:23               ` Eli Zaretskii
  2022-05-27 13:45                 ` Tor Kringeland
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-25 13:23 UTC (permalink / raw)
  To: Reuben Thomas; +Cc: tor.kringeland, 52179

> From: Reuben Thomas <rrt@sc3d.org>
> Date: Wed, 25 May 2022 08:39:49 +0100
> Cc: tor.kringeland@ntnu.no, 52179@debbugs.gnu.org
> 
> 
> On Wed, 25 May 2022 at 03:28, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>  > From: Reuben Thomas <rrt@sc3d.org>
>  > Date: Tue, 24 May 2022 22:34:14 +0100
>  > Cc: Tor Kringeland <tor.kringeland@ntnu.no>, 52179@debbugs.gnu.org
>  > 
>  > Since ispell is the only spellchecker Emacs supports that can't cope with [[:alpha:]], it would seem more
>  > sensible to have a default (nil-keyed) setting in ispell-dictionary-base-alist, and to overwrite the default with
>  > [A-Za-z] only if the spellchecker is really ispell.
>  > 
>  > This way, duplicate code can be removed and future spellcheckers will not need to rediscover this problem.
> 
>  That might be okay, but how does it help us get the Enchant support in
>  ispell.el DTRT?  We'd still need to detect when Ispell is used as the
>  back-end speller, no?
> 
> Versions of Enchant compatible with Emacs do not support Ispell as the back-end speller (I removed this
> support in version 2 of Enchant).

So we could just modify ispell-find-enchant-dictionaries to always
patch the nil entry, like the other two backends do.  Right?
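
The idea of patching the nil entry can be sketched as a small pure function.  This is not the actual change to `ispell-find-enchant-dictionaries'; the helper name `demo-patch-nil-entry' and the sample alist below are hypothetical and only show the shape of the operation.

```elisp
;; Hypothetical helper, not the real ispell.el code: return a copy of
;; ALIST in which any nil-keyed entry is replaced by one using [[:alpha:]].
(defun demo-patch-nil-entry (alist)
  "Return a copy of ALIST whose nil-keyed entry uses [[:alpha:]]."
  (cons '(nil "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil utf-8)
        ;; `assq-delete-all' is destructive, so work on a copy.
        (assq-delete-all nil (copy-sequence alist))))

;; Sample data for illustration only.
(defvar demo-alist
  '((nil "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B") nil iso-8859-1)
    ("en" "[[:alpha:]]" "[^[:alpha:]]" "" t nil nil utf-8)))

(cadr (assq nil (demo-patch-nil-entry demo-alist)))  ; => "[[:alpha:]]"
```

Named dictionaries such as "en" are left untouched; only the fallback entry used when no dictionary is selected changes.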






* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-25 13:23               ` Eli Zaretskii
@ 2022-05-27 13:45                 ` Tor Kringeland
  2022-05-27 14:22                   ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Tor Kringeland @ 2022-05-27 13:45 UTC (permalink / raw)
  To: Eli Zaretskii, Reuben Thomas; +Cc: 52179@debbugs.gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

> So we could just modify ispell-find-enchant-dictionaries to always
> patch the nil entry, like the other two backends do.  Right?

Yes, it's the easiest way to fix it.  But maybe we should document this
behavior explicitly?  So if support for some other spell-checker is
added it's clear that the nil entry requires special attention.





* bug#52179: Highlighting a word in `ispell' using `enchant'
  2022-05-27 13:45                 ` Tor Kringeland
@ 2022-05-27 14:22                   ` Eli Zaretskii
  0 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-27 14:22 UTC (permalink / raw)
  To: Tor Kringeland; +Cc: 52179, rrt

> From: Tor Kringeland <tor.kringeland@ntnu.no>
> CC: "52179@debbugs.gnu.org" <52179@debbugs.gnu.org>
> Date: Fri, 27 May 2022 13:45:14 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So we could just modify ispell-find-enchant-dictionaries to always
> > patch the nil entry, like the other two backends do.  Right?
> 
> Yes, it's the easiest way to fix it.  But maybe we should document this
> behavior explicitly?  So if support for some other spell-checker is
> added it's clear that the nil entry requires special attention.

Sure, adding comments for this is fine.




