* 27.0.60; ispell ignores syntax/category tables word boundaries
@ 2020-02-07 6:16 Paul W. Rankin
2020-02-07 8:57 ` Eli Zaretskii
2020-02-07 15:43 ` Paul W. Rankin
0 siblings, 2 replies; 3+ messages in thread
From: Paul W. Rankin @ 2020-02-07 6:16 UTC
To: help-gnu-emacs
Hello,
It appears that the function `ispell-get-word' makes its own judgements
on word boundaries, ignoring the buffer's syntax tables and character
categories. This becomes a problem with using `electric-quote-mode' and
ispell, because contractions are parsed as separate words. e.g. Calling
ispell word for "doesn’t" returns:
T is correct
To reproduce:
1. emacs -Q
2. (in *scratch*) M-x text-mode RET
3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
4. M-: (modify-syntax-entry ?’ "w")
5. M-: (modify-category-entry ?’ ?^)
6. M-$ | ispell-word
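The non-interactive part of the steps above can be sketched in Emacs Lisp as follows. This is only an illustration: it uses a temporary buffer and private copies of the syntax and category tables (an assumption made here so the shared `text-mode' tables are left untouched), and it checks `forward-word' rather than ispell itself.

```elisp
;; Sketch of steps 2-5: set up the buffer and check how `forward-word'
;; treats the contraction.  Does not call ispell.
(with-temp-buffer
  (text-mode)
  ;; Work on private copies so the global tables are not modified.
  (set-syntax-table (make-syntax-table (syntax-table)))
  (set-category-table (copy-category-table))
  (insert "doesn’t")
  (modify-syntax-entry ?’ "w")
  (modify-category-entry ?’ ?^)
  (goto-char (point-min))
  (forward-word)
  ;; Non-nil here means `forward-word' crossed the whole contraction.
  (eobp))
```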
Expected results:
Given the above syntax and category tables, M-f | forward-word and M-b |
backward-word now consider "doesn’t" as a single word, and so the whole
word should be passed to the `ispell-program-name' program and produce the same
result as when checked on the command line:
% echo "doesn’t" | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
*
% echo "doesn’t" | enchant-2 -a
@(#) International Ispell Version 3.1.20 (but really Enchant 2.2.7)
*
Actual results:
The word "doesn’t" is parsed as "t":
T is correct
Attempts at workarounds:
I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
entries from "[']" to "['’]" to no avail.
Setup:
GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
Version 10.15.3 (Build 19D76)) of 2020-02-05
--
Paul W. Rankin
https://www.paulwrankin.com
* Re: 27.0.60; ispell ignores syntax/category tables word boundaries
From: Eli Zaretskii @ 2020-02-07 8:57 UTC
To: help-gnu-emacs
> From: "Paul W. Rankin" <hello@paulwrankin.com>
> Date: Fri, 07 Feb 2020 16:16:54 +1000
>
> It appears that the function `ispell-get-word' makes its own judgements
> on word boundaries, ignoring the buffer's syntax tables and character
> categories.
That is true. And I don't really see how it can be any different,
since ispell.el must have the same notion of a word as the underlying
dictionary, otherwise you will have false positives and/or false
negatives, right?
ispell.el looks up the word characters and non-word characters in
its database, and the doc string of ispell-dictionary-base-alist
explains how.
> This becomes a problem with using `electric-quote-mode' and
> ispell, because contractions are parsed as separate words. e.g. Calling
> ispell word for "doesn’t" returns:
>
> T is correct
>
> To reproduce:
>
> 1. emacs -Q
> 2. (in *scratch*) M-x text-mode RET
> 3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
> 4. M-: (modify-syntax-entry ?’ "w")
> 5. M-: (modify-category-entry ?’ ?^)
> 6. M-$ | ispell-word
The buffer syntax table has no effect on ispell.el, and shouldn't have
any effect on it.
> Attempts at workarounds:
>
> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> entries from "[']" to "['’]" to no avail.
That's the right direction, but you didn't follow it far enough.
First, ispell-dictionary-base-alist is the default value, and is used
to produce ispell-dictionary-alist, which is the one you should change
(alternatively, customize ispell-local-dictionary-alist). More
importantly, the definitions of each dictionary include more than just
one character set: there are 3 character sets there and one parameter
for encoding the string passed to the spell-checker, and you should be
sure to set them all as appropriate for the dictionary you use.
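As a rough sketch of what such a definition might look like, the form below adds one entry to ispell-local-dictionary-alist. The dictionary name "en_US", the "-d" argument, and the choice of character classes are assumptions for illustration only; they must match the dictionary and spell-checker actually installed, and this is not confirmed to resolve the problem reported in this thread.

```elisp
;; Entry layout, per the doc string of `ispell-dictionary-base-alist':
;; (NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
;;  ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
(setq ispell-local-dictionary-alist
      '(("en_US"            ; dictionary name (assumed)
         "[[:alpha:]]"      ; CASECHARS: characters that make up words
         "[^[:alpha:]]"     ; NOT-CASECHARS: the complement of the above
         "['’]"             ; OTHERCHARS: allowed inside a word
         nil                ; MANY-OTHERCHARS-P
         ("-d" "en_US")     ; arguments passed to the spell-checker (assumed)
         nil                ; EXTENDED-CHARACTER-MODE
         utf-8)))           ; encoding of strings sent to the spell-checker
(setq ispell-local-dictionary "en_US")
```

The three character sets and the encoding mentioned above correspond to the CASECHARS, NOT-CASECHARS, OTHERCHARS, and CHARACTER-SET slots.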
My suggestion is to step with Edebug through ispell-get-word and see
why it doesn't consider "doesn’t" as a single word in your case.
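One possible way to start that investigation is sketched below. It assumes the ispell.el source file is installed (Edebug needs it to instrument the function) and that ispell has already been run once in the buffer so that a dictionary is selected; otherwise the accessor calls may signal an error.

```elisp
(require 'ispell)
(require 'edebug)

;; Instrument `ispell-get-word' so that the next M-$ stops in Edebug.
(edebug-instrument-function 'ispell-get-word)

;; Show the character sets and coding system ispell.el is using for the
;; current dictionary.  Evaluate after at least one M-$ in the buffer.
(list :casechars     (ispell-get-casechars)
      :not-casechars (ispell-get-not-casechars)
      :otherchars    (ispell-get-otherchars)
      :coding-system (ispell-get-coding-system))
```

If the OTHERCHARS value printed here does not contain "’", the edited alist entry is probably not the one in effect.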
> Setup:
>
> GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
> Version 10.15.3 (Build 19D76)) of 2020-02-05
This omits crucial information, like the dictionary in use and the
locale-dependent settings that affect encoding. (In any case, I don't
think this list is the right place for discussing this issue.)
* Re: 27.0.60; ispell ignores syntax/category tables word boundaries
From: Paul W. Rankin @ 2020-02-07 15:43 UTC
To: help-gnu-emacs
> On 7 Feb 2020, at 4:16 pm, Paul W. Rankin <hello@paulwrankin.com> wrote:
>
> Hello,
>
> It appears that the function `ispell-get-word' makes its own judgements
> on word boundaries...
I'm sorry for the noise; this was supposed to go to the bugs list.