unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
@ 2020-02-07 15:44 Paul W. Rankin
  2020-02-07 18:23 ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Paul W. Rankin @ 2020-02-07 15:44 UTC (permalink / raw)
  To: 39483

Hello,

It appears that the function `ispell-get-word' makes its own judgements
on word boundaries, ignoring the buffer's syntax tables and character
categories. This becomes a problem when using `electric-quote-mode' with
ispell, because contractions are parsed as separate words. For example,
calling `ispell-word' on "doesn’t" returns:

    T is correct

To reproduce:

1. emacs -Q
2. (in *scratch*) M-x text-mode RET
3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
4. M-: (modify-syntax-entry ?’ "w")
5. M-: (modify-category-entry ?’ ?^)
6. M-$ | ispell-word

Expected results:

Given the above syntax and category tables, M-f | forward-word and M-b |
backward-word now treat "doesn’t" as a single word, so the whole word
should be passed to the `ispell-program-name' and produce the same
result as when checked on the command line:

    % echo "doesn’t" | aspell -a
    @(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
    *
    % echo "doesn’t" | enchant-2 -a
    @(#) International Ispell Version 3.1.20 (but really Enchant 2.2.7)
    *

Actual results:

The word "doesn’t" is parsed as "t":
    T is correct

Attempts at workarounds:

I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
entries from "[']" to "['’]" to no avail.

Setup:

GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
Version 10.15.3 (Build 19D76)) of 2020-02-05

-- 
Paul W. Rankin
https://www.paulwrankin.com






* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
  2020-02-07 15:44 bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries Paul W. Rankin
@ 2020-02-07 18:23 ` Eli Zaretskii
  2020-02-08  5:47   ` Paul W. Rankin
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2020-02-07 18:23 UTC (permalink / raw)
  To: Paul W. Rankin; +Cc: 39483

> From: "Paul W. Rankin" <hello@paulwrankin.com>
> Date: Sat, 08 Feb 2020 01:44:52 +1000
> 
> It appears that the function `ispell-get-word' makes its own judgements
> on word boundaries, ignoring the buffer's syntax tables and character
> categories.

That is true.  And I don't really see how it can be any different,
since ispell.el must have the same notion of a word as the underlying
dictionary, otherwise you will have false positives and/or false
negatives, right?

ispell.el looks up the word characters and non-word characters in
its database, and the doc string of ispell-dictionary-base-alist
explains how.

> This becomes a problem with using `electric-quote-mode' and
> ispell, because contractions are parsed as separate words. e.g. Calling
> `ispell-word' for "doesn’t" returns:
> 
>     T is correct
> 
> To reproduce:
> 
> 1. emacs -Q
> 2. (in *scratch*) M-x text-mode RET
> 3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
> 4. M-: (modify-syntax-entry ?’ "w")
> 5. M-: (modify-category-entry ?’ ?^)
> 6. M-$ | ispell-word

The buffer syntax table has no effect on ispell.el, and shouldn't have
any effect on it.

> Attempts at workarounds:
> 
> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> entries from "[']" to "['’]" to no avail.

That's the right direction, but you didn't follow it far enough.
First, ispell-dictionary-base-alist is the default value, and is used
to produce ispell-dictionary-alist, which is the one you should change
(alternatively, customize ispell-local-dictionary-alist).  More
importantly, the definitions of each dictionary include more than just
one character set: there are 3 character sets there and one parameter
for encoding the string passed to the spell-checker, and you should be
sure to set them all as appropriate for the dictionary you use.

My suggestion is to step with Edebug through ispell-get-word and see
why it doesn't consider "doesn’t" as a single word in your case.
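
As a quicker complement to stepping through the function, the values
ispell.el is actually using can be inspected directly.  The following is
only a sketch: `my/ispell-show-word-parsing' is a hypothetical helper
name, not part of ispell.el, and it assumes M-$ has already been run
once in the session so that the dictionary tables are initialised.

```elisp
(require 'ispell)

(defun my/ispell-show-word-parsing ()
  "Report the character sets ispell.el uses and the word it finds at point.
Hypothetical helper for debugging; not part of ispell.el."
  (interactive)
  ;; These accessors read the entry for the current dictionary from
  ;; `ispell-local-dictionary-alist' or `ispell-dictionary-alist'.
  (message "casechars=%S otherchars=%S many-otherchars=%S word=%S"
           (ispell-get-casechars)
           (ispell-get-otherchars)
           (ispell-get-many-otherchars-p)
           ;; Returns a list of the word and its start and end positions.
           (ispell-get-word nil)))
```

With point on "doesn’t", M-x my/ispell-show-word-parsing shows whether
OTHERCHARS includes the curly apostrophe and which span ispell.el
treats as the word.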

> Setup:
> 
> GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
> Version 10.15.3 (Build 19D76)) of 2020-02-05

This omits crucial information, like the dictionary in use and the
locale-dependent settings that affect encoding.  (In any case, I don't
think this list is the right place for discussing this issue.)






* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
  2020-02-07 18:23 ` Eli Zaretskii
@ 2020-02-08  5:47   ` Paul W. Rankin
  2020-02-08  8:18     ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Paul W. Rankin @ 2020-02-08  5:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 39483


On Sat, Feb 08 2020, Eli Zaretskii wrote:

>> From: "Paul W. Rankin" <hello@paulwrankin.com>
>> Attempts at workarounds:
>> 
>> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> entries from "[']" to "['’]" to no avail.
>
> That's the right direction, but you didn't follow it far enough.
> First, ispell-dictionary-base-alist is the default value, and is used
> to produce ispell-dictionary-alist, which is the one you should change
> (alternatively, customize ispell-local-dictionary-alist).

Thanks, that got it.

I'd discussed this on #emacs IRC and the consensus was to report it. Led
astray!!






* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
  2020-02-08  5:47   ` Paul W. Rankin
@ 2020-02-08  8:18     ` Eli Zaretskii
  2020-02-08  9:28       ` Paul W. Rankin
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2020-02-08  8:18 UTC (permalink / raw)
  To: Paul W. Rankin; +Cc: 39483

> From: "Paul W. Rankin" <hello@paulwrankin.com>
> Cc: 39483@debbugs.gnu.org
> Date: Sat, 08 Feb 2020 15:47:27 +1000
> 
> On Sat, Feb 08 2020, Eli Zaretskii wrote:
> 
> >> From: "Paul W. Rankin" <hello@paulwrankin.com>
> >> Attempts at workarounds:
> >> 
> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> >> entries from "[']" to "['’]" to no avail.
> >
> > That's the right direction, but you didn't follow it far enough.
> > First, ispell-dictionary-base-alist is the default value, and is used
> > to produce ispell-dictionary-alist, which is the one you should change
> > (alternatively, customize ispell-local-dictionary-alist).
> 
> Thanks, that got it.

I'd be interested to see your solution in full, for the record.

Thanks.






* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
  2020-02-08  8:18     ` Eli Zaretskii
@ 2020-02-08  9:28       ` Paul W. Rankin
  2020-02-08 10:06         ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Paul W. Rankin @ 2020-02-08  9:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 39483


On Sat, Feb 08 2020, Eli Zaretskii wrote:
>> >> From: "Paul W. Rankin" <hello@paulwrankin.com>
>> >> Attempts at workarounds:
>> >>
>> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> >> entries from "[']" to "['’]" to no avail.
>> >
>> > That's the right direction, but you didn't follow it far enough.
>> > First, ispell-dictionary-base-alist is the default value, and is used
>> > to produce ispell-dictionary-alist, which is the one you should change
>> > (alternatively, customize ispell-local-dictionary-alist).
>>
>> Thanks, that got it.
>
> I'd be interested to see your solution in full, for the record.

I went down the wrong path with syntax tables when I saw that M-f/M-b
stepped through the word like doesn|’|t|, so I figured it was about word
boundaries. Searching through the manual, I couldn't find anything in
"(emacs) Quotation Marks" or "(emacs) Spelling" but found the references
to syntax tables regarding word boundaries in "(elisp) Word Motion".

As it turns out it was just a case of customising
ispell-local-dictionary-alist and adding both a default and "en_US"
entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
docstring on ispell-dictionary-alist says.
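
For the record, a minimal sketch of what such a customisation might
look like.  Only the OTHERCHARS value "['’]" and the two entry names
(default and "en_US") come from the description above; the CASECHARS
and NOT-CASECHARS regexps, the "-d" argument and the utf-8 coding
system are assumptions and should be adjusted to the dictionary and
locale in use.

```elisp
(require 'ispell)

;; Slots, per the docstring of `ispell-dictionary-alist':
;; (DICTIONARY-NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
;;  ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
(setq ispell-local-dictionary-alist
      '(;; Default entry (DICTIONARY-NAME is nil).
        (nil     "[A-Za-z]" "[^A-Za-z]" "['’]" nil nil            nil utf-8)
        ;; Named entry; the "-d" argument is an assumption.
        ("en_US" "[A-Za-z]" "[^A-Za-z]" "['’]" nil ("-d" "en_US") nil utf-8)))
```

The CHARACTER-SET slot matters here because the curly apostrophe has to
be encoded in a form the spell-checker understands.  If a spell-checker
process is already running, it may need to be restarted (for example
with M-x ispell-kill-ispell) before the new entries take effect.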






* bug#39483: 27.0.60; ispell ignores syntax/category tables word boundaries
  2020-02-08  9:28       ` Paul W. Rankin
@ 2020-02-08 10:06         ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2020-02-08 10:06 UTC (permalink / raw)
  To: Paul W. Rankin; +Cc: 39483-done

> From: "Paul W. Rankin" <hello@paulwrankin.com>
> Cc: 39483@debbugs.gnu.org
> Date: Sat, 08 Feb 2020 19:28:40 +1000
> 
> As it turns out it was just a case of customising
> ispell-local-dictionary-alist and adding both a default and "en_US"
> entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
> docstring on ispell-dictionary-alist says.

OK, thanks.  With that, I'm closing the bug report.





