unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#17437: 24.3; ispell uses typographically correct apostrophe as word boundary
@ 2014-05-08 12:15 Tobias Getzner
  2014-05-08 17:38 ` Agustin Martin
  2014-07-22  9:42 ` bug#17437: 24.3; ispell uses typographically correct apostrophe Tobias Getzner
  0 siblings, 2 replies; 5+ messages in thread
From: Tobias Getzner @ 2014-05-08 12:15 UTC
  To: 17437


When using the typographically correct apostrophe (“right single
quotation mark” U+2019), ispell will mark up parts of words as typos.
E.g., in “doesn’t”, the part before the apostrophe will be highlighted
as a typo even if the spell-checker supports the apostrophe.

This bug occurs irrespective of the spell-checker, so I suppose that
ispell does its own tokenization and uses the apostrophe as a word
boundary. Instead, the apostrophe should correctly be treated as
word-internal punctuation and handed on to the actual spell-checker
program.
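
The symptom can be reproduced outside Emacs with a toy tokenizer. This
is a minimal sketch in Python, not ispell.el's actual code: the pattern
NAIVE_WORD and the function naive_words are hypothetical names, and the
assumption is simply that only ASCII letters and U+0027 count as word
characters, so U+2019 ends a word.

```python
import re

# Hypothetical tokenizer: only ASCII letters and the ASCII apostrophe
# (U+0027) are word characters, so U+2019 acts as a word boundary.
NAIVE_WORD = re.compile(r"[A-Za-z']+")

def naive_words(text):
    """Return the fragments that would be handed to the spell checker."""
    return NAIVE_WORD.findall(text)

print(naive_words("doesn't"))       # ASCII apostrophe: one word
print(naive_words("doesn\u2019t"))  # U+2019: split into two fragments
```

With U+2019 the checker would only ever see "doesn" and "t", which is
why the part before the apostrophe is reported as a typo.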

Best regards,
Tobias
 






* bug#17437: 24.3; ispell uses typographically correct apostrophe as word boundary
From: Agustin Martin @ 2014-05-08 17:38 UTC
  To: Tobias Getzner, 17437

On Thu, May 08, 2014 at 02:15:17PM +0200, Tobias Getzner wrote:
> 
> When using the typographically correct apostrophe (“right single
> quotation mark” U+2019), ispell will mark up parts of words as typos.
> E.g., in “doesn’t”, the part before the apostrophe will be highlighted
> as a typo even if the spell-checker supports the apostrophe.
> 
> This bug occurs irrespective of the spell-checker, so I suppose that
> ispell does its own tokenization and uses the apostrophe as a word
> boundary. Instead, the apostrophe should correctly be treated as
> word-internal punctuation and handed on to the actual spell-checker
> program.

Which language are you using? Whether or not the apostrophe is a
wordchar depends on the language. By the way, "doesn't" is working well
here with aspell+american.

-- 
Agustin






* bug#17437: 24.3; ispell uses typographically correct apostrophe as word boundary
From: Tobias Getzner @ 2014-05-09  5:07 UTC
  To: Agustin Martin; +Cc: 17437

Hello Agustin,

> Which language are you using? Whether or not the apostrophe is a
> wordchar depends on the language. By the way, "doesn't" is working
> well here with aspell+american.

Please note that the bug is not about the single quote apostrophe,
U+0027, but concerns the typographically correct apostrophe, U+2019.

Both hunspell and aspell support it in recent versions, but Emacs
fails to correctly hand over words containing the typographical
apostrophe.

Regards,
Tobias






* bug#17437: 24.3; ispell uses typographically correct apostrophe
From: Tobias Getzner @ 2014-07-22  9:42 UTC
  To: 17437

> More details are needed here, e.g., what language is the reporter using?

I suppose you are referring to the selected ispell dictionary? I am not
explicitly setting the ispell dictionary, so (I presume) ispell.el will
not pass a dictionary to hunspell, which will accordingly use the
default one for my locale, i.e., en_US. While some dictionaries have
issues with U+2019 (and in fact most are still encoded in latin-1 :-/),
I have added this character to WORDCHARS in my hunspell en_US
dictionary; hunspell now correctly recognizes words using this
character when invoked in a terminal. Sadly, ispell.el still doesn’t
seem to handle these.

The problem seems to be that ispell.el still thinks that U+2019 is a
word boundary and doesn’t pass the whole word on to the spell checker.
Is this likely? Looking at ispell.el, it looks like it is doing word
boundary parsing on its own‽ If so, U+2019 should be treated as a
word character when it appears between two alphabetic characters (at
least for most Western languages).
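
The rule proposed above can be sketched as follows. This is an
illustration in Python rather than Emacs Lisp, and it is not how
ispell.el is implemented: the names LETTER, APOSTROPHES, WORD and words
are all hypothetical. An apostrophe (U+0027 or U+2019) is kept inside a
word only when a letter stands on both sides of it.

```python
import re

# [^\W\d_] matches any Unicode letter in Python's re module.
LETTER = r"[^\W\d_]"
# Characters allowed inside a word, but only between two letters.
APOSTROPHES = "'\u2019"

# One or more letters, optionally continued by an apostrophe that is
# immediately followed by more letters.
WORD = re.compile(f"{LETTER}+(?:[{APOSTROPHES}]{LETTER}+)*")

def words(text):
    """Return whole words, keeping word-internal apostrophes."""
    return WORD.findall(text)

# U+2019 inside "doesn\u2019t" is kept; the same character used as a
# closing quotation mark after "work" is not part of the word.
print(words("It doesn\u2019t \u2018work\u2019 yet"))
```

Requiring a letter on both sides matters because U+2019 doubles as the
closing single quotation mark, which should not be glued to the word
it follows.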

Best regards,
Tobias








* bug#17437: 24.3; ispell uses typographically correct apostrophe
From: Alan Third @ 2019-12-06 10:59 UTC
  To: Tobias Getzner; +Cc: 17437-done

Tobias Getzner <tobias.getzner@gmx.de> writes:

> I have added this character to WORDCHARS in my hunspell en_US
> dictionary; hunspell now correctly recognizes words using this
> character when invoked in a terminal. Sadly, ispell.el still doesn’t
> seem to handle these.

I believe this has been fixed since this bug report was raised. Emacs is
now able to scan hunspell's dictionary files and make use of WORDCHARS.
I can't remember if it's fixed in Emacs 25, but it's definitely fixed in
26.
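
As a rough illustration of what "making use of WORDCHARS" involves,
the sketch below reads that directive from the text of a hunspell
affix (.aff) file. It is written in Python and is not Emacs's actual
code; the function parse_wordchars and the sample_aff contents are
made up for the example.

```python
def parse_wordchars(aff_text):
    """Return the characters of the first WORDCHARS directive, or ""."""
    for line in aff_text.splitlines():
        fields = line.split()
        # A directive line looks like: WORDCHARS <characters>
        if len(fields) >= 2 and fields[0] == "WORDCHARS":
            return fields[1]
    return ""

# Made-up affix file contents with U+2019 added to WORDCHARS.
sample_aff = "SET UTF-8\nWORDCHARS 0123456789\u2019\n"

wordchars = parse_wordchars(sample_aff)
print("\u2019" in wordchars)  # whether U+2019 counts as a word character
```

A front end that reads this list can treat the same characters as
word-internal as hunspell does, instead of hard-coding its own set.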

I'll close this bug report, but if you're still experiencing the problem
please reply and we can reopen it.
-- 
Alan Third





