unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
@ 2015-01-22 14:40 Tobias Getzner
  2015-01-22 17:41 ` Eli Zaretskii
  2015-12-26 16:53 ` Lars Ingebrigtsen
  0 siblings, 2 replies; 7+ messages in thread
From: Tobias Getzner @ 2015-01-22 14:40 UTC (permalink / raw)
  To: 19653

Hello,

I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
1.3.3) to spell-check a buffer containing the typographically correct
apostrophe («’»; U+2019), ispell will error out with the message
«ispell misalignment».

The problem can be reproduced by setting ispell-program-name to
«hunspell», and spell-checking a buffer containing the string «abc’s
zzz». This yields the following error:

> ispell-process-line: Ispell misalignment: word `zzz' point 9; probably incompatible versions

This seems to be a regression from 24.3, where hunspell support was
working (with the caveat that the apostrophe had to be manually added
to the dictionary’s «OTHERCHARS»).

Best regards,
Tobias

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
  2015-01-22 14:40 bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used Tobias Getzner
@ 2015-01-22 17:41 ` Eli Zaretskii
  2015-12-26 16:53 ` Lars Ingebrigtsen
  1 sibling, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2015-01-22 17:41 UTC (permalink / raw)
  To: Tobias Getzner; +Cc: 19653

> From: Tobias Getzner <tobias.getzner@gmx.de>
> Date: Thu, 22 Jan 2015 15:40:05 +0100
> 
> I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
> 1.3.3) to spell-check a buffer containing the typographically correct
> apostrophe («’»; U+2019), ispell will error out with the message
> «ispell misalignment».
> 
> The problem can be reproduced by setting ispell-program-name to
> «hunspell», and spell-checking a buffer containing the string «abc’s
> zzz». This yields the following error:
> 
> > ispell-process-line: Ispell misalignment: word `zzz' point 9; probably incompatible versions

I cannot reproduce this with Emacs 24.4 and Hunspell 1.3.2 (heavily
patched to fix known problems in Hunspell).  You didn't provide enough
information for me to be sure I did the same as you, so here are the
possible explanations for the different experience:

 . I use a different version of Hunspell, and yours has a bug.
   Hunspell is known to have a problem with reporting mis-spelled
   words with byte offsets, whereas Emacs expects character offsets,
   so dictionaries encoded in UTF-8 cause symptoms similar to those
   you report.  My Hunspell is patched to avoid this problem.

 . I didn't change OTHERCHARS.  Frankly, I think doing this is asking
   for trouble, since the speller still uses the characters recorded
   in the .aff file.

 . You didn't say which dictionary you used.  I tried en_US and
   de_DE, and neither of them produced these problems.  Maybe this is
   specific to the dictionary you used.  In particular, what matters
   is whether the encoding of that dictionary matches the encoding
   you tell ispell.el to use (if you customized that part).

> This seems to be a regression from 24.3, where hunspell support was
> working (with the caveat that the apostrophe had to be manually added
> to the dictionary’s «OTHERCHARS»).

Are you saying that the same version of Hunspell with the same
dictionary worked in Emacs 24.3, but fails in Emacs 24.4?  If so,
please try to eliminate or at least minimize your ispell-related
customizations, and try again.  If the problem persists, please show
the minimal set of customizations to reproduce the problem.
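
To illustrate the byte-versus-character discrepancy mentioned in the
first explanation, here is a minimal Python sketch using the string
from the report.  It assumes the text reaches Hunspell encoded as
UTF-8; the helper name `offsets` is hypothetical and is not part of
Emacs or Hunspell.

```python
from typing import Tuple


def offsets(line: str, word: str) -> Tuple[int, int]:
    """Return (character offset, UTF-8 byte offset) of word in line.

    Both offsets are 0-based and refer to the first occurrence.
    """
    char_off = line.index(word)
    # Bytes needed to encode everything that precedes the word.
    byte_off = len(line[:char_off].encode("utf-8"))
    return char_off, byte_off


# U+2019 (right single quotation mark) takes three bytes in UTF-8,
# so the two offsets of "zzz" differ by two.
line = "abc\u2019s zzz"
print(offsets(line, "zzz"))  # (6, 8)
```

A speller that counts bytes would place «zzz» two positions further
right than Emacs expects.  That appears consistent with the reported
misalignment, but it is an inference, not something confirmed for
this particular setup.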

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
  2015-01-22 14:40 bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used Tobias Getzner
  2015-01-22 17:41 ` Eli Zaretskii
@ 2015-12-26 16:53 ` Lars Ingebrigtsen
  1 sibling, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-26 16:53 UTC (permalink / raw)
  To: Tobias Getzner; +Cc: 19653

Tobias Getzner <tobias.getzner@gmx.de> writes:

> I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
> 1.3.3) to spell-check a buffer containing the typographically correct
> apostrophe («’»; U+2019), ispell will error out with the message
> «ispell misalignment».

There was an earlier similar report where the conclusion was that
hunspell was buggy, but a new version of hunspell fixed the problem...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
       [not found] <8660om1en7.fsf@phe.ftfl.ca>
@ 2016-10-21  5:04 ` Joseph Mingrone
  2016-10-21  7:33   ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Mingrone @ 2016-10-21  5:04 UTC (permalink / raw)
  To: 19653

This still seems to be a problem with hunspell version 1.3.3.

The problem can be reproduced by spell checking a file with this one line.

alsdk ✅ sdfkjdsf sldksdfkjsfd

During spell checking, the process list shows:

ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8

The error Emacs (version 25.1.1) reports is:

ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Hunspell skips over the special character when it is run at a terminal prompt.  This is the initial output.

### begin hunspell output ###
        alsdk           File: test.txt

alsdk \~E sdfkjdsf sldksdfkjsfd

 0: Alaska

[SPACE] R)epl A)ccept I)nsert U)ncap S)tem Q)uit e(X)it or ? for help
### end hunspell output ###

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
  2016-10-21  5:04 ` Joseph Mingrone
@ 2016-10-21  7:33   ` Eli Zaretskii
  2016-10-21 12:59     ` Joseph Mingrone
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2016-10-21  7:33 UTC (permalink / raw)
  To: Joseph Mingrone; +Cc: 19653

> From: Joseph Mingrone <jrm@ftfl.ca>
> Date: Fri, 21 Oct 2016 02:04:58 -0300
> 
> This still seems to be a problem with hunspell version 1.3.3.
> 
> The problem can be reproduced by spell checking a file with this one line.
> 
> alsdk ✅ sdfkjdsf sldksdfkjsfd
> 
> During spell checking, the process list shows:
> 
> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
> 
> The error Emacs (version 25.1.1) reports is:
> 
> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Did Hunspell ever fix the problem whereby it reported byte offsets of
the misspelled words, as opposed to character offsets?  If not, that
is your problem, and Hunspell should finally get its act together.

To see whether this is the problem, invoke Hunspell like this:

  /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

and see what Hunspell emits.  It should emit something like this (the
below is taken from my system, and I don't have the en_CA dictionary,
so your output might be slightly different):

  @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
  & alsdk 3 0: Alaska, elastic, Alston
  & sdfkjdsf 2 8: artefact's, postfix
  & sldksdfkjsfd 2 17: justification, staphylococcus

The second number after each misspelled word is the offset of that
word's beginning, measured in characters, from the start of the line.
Hunspell used to report this in bytes instead of characters; if it
still does, you will have to patch it to fix that bug.  AFAIR, the
Hunspell issue tracker includes several patches for this bug.  Or
maybe the latest Hunspell 1.4.1 already fixes this, in which case
please upgrade.
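
As a rough sketch of the output format described above, the
following Python reads one `&` line and converts a byte offset back
to a character offset.  The names `MISS_RE`, `parse_miss` and
`byte_to_char_offset` are hypothetical, the regular expression covers
only the `&` lines shown in this message, and the input line is
assumed to be UTF-8.

```python
import re

# "& <word> <suggestion count> <offset>: <suggestions>"
MISS_RE = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")


def parse_miss(output_line):
    """Return (word, offset, suggestions) for a '&' line, else None."""
    m = MISS_RE.match(output_line)
    if m is None:
        return None
    word, _count, offset, rest = m.groups()
    return word, int(offset), rest.split(", ")


def byte_to_char_offset(line, byte_offset):
    """Convert a 0-based UTF-8 byte offset in line to a character offset.

    Raises UnicodeDecodeError if the offset falls inside a multi-byte
    sequence.
    """
    prefix = line.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


print(parse_miss("& sdfkjdsf 2 8: artefact's, postfix"))
# ('sdfkjdsf', 8, ["artefact's", 'postfix'])

line = "alsdk \u2705 sdfkjdsf sldksdfkjsfd"
# U+2705 is three bytes in UTF-8, so "sdfkjdsf" starts at byte 10.
byte_off = len("alsdk \u2705 ".encode("utf-8"))
print(byte_off, byte_to_char_offset(line, byte_off))  # 10 8
```

The character offset 8 matches the output shown above; a Hunspell
that counts bytes would be expected to report 10 for the same word.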

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
  2016-10-21  7:33   ` Eli Zaretskii
@ 2016-10-21 12:59     ` Joseph Mingrone
  2016-10-21 14:52       ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Mingrone @ 2016-10-21 12:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19653

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Joseph Mingrone <jrm@ftfl.ca>
>> Date: Fri, 21 Oct 2016 02:04:58 -0300

>> This still seems to be a problem with hunspell version 1.3.3.

>> The problem can be reproduced by spell checking a file with this one line.

>> alsdk ✅ sdfkjdsf sldksdfkjsfd

>> During spell checking, the process list shows:

>> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8

>> The error Emacs (version 25.1.1) reports is:

>> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

> Did Hunspell ever fix the problem whereby it reported byte offsets of
> the misspelled words, as opposed to character offsets?  If not, that
> is your problem, and Hunspell should finally get its act together.

> To see whether this is the problem, invoke Hunspell like this:

>   /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

> and see what Hunspell emits.  It should emit something like this (the
> below is taken from my system, and I don't have the en_CA dictionary,
> so your output might be slightly different):

>   @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
>   & alsdk 3 0: Alaska, elastic, Alston
>   & sdfkjdsf 2 8: artefact's, postfix
>   & sldksdfkjsfd 2 17: justification, staphylococcus

> The second number after each misspelled word is the offset of that
> word's beginning, measured in characters, from the start of the line.
> Hunspell used to report this in bytes instead of characters; if it
> still does, you will have to patch it to fix that bug.  AFAIR, the
> Hunspell issue tracker includes several patches for this bug.  Or
> maybe the latest Hunspell 1.4.1 already fixes this, in which case
> please upgrade.

It's still a problem with hunspell.

% echo "é startingCharTwo" | hunspell -a -d en_CA -i UTF-8
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
& é 15 0: e, s, i, a, n, r, t, o, l, c, d, u, g, m, p
& startingCharTwo 1 3: nonparticipating

https://github.com/hunspell/hunspell/issues/418
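
To spell out why the output above points at byte offsets, here is a
small Python check.  It assumes the «é» in the test line is the
precomposed U+00E9 (two bytes in UTF-8) and that offsets are 0-based;
the function name `offset_unit` is hypothetical.

```python
def offset_unit(line, word, reported):
    """Guess whether a reported 0-based offset counts characters or bytes."""
    char_off = line.find(word)
    if char_off < 0:
        return "word not found"
    byte_off = len(line[:char_off].encode("utf-8"))
    if char_off == byte_off == reported:
        return "ambiguous"  # only ASCII precedes the word
    if reported == char_off:
        return "characters"
    if reported == byte_off:
        return "bytes"
    return "neither"


# "startingCharTwo" is the third character but starts at byte 3.
print(offset_unit("\u00e9 startingCharTwo", "startingCharTwo", 3))  # bytes
```

With a decomposed «é» (e followed by U+0301) the same offset of 3
would count characters instead, so the assumption about the input
matters.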

* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
  2016-10-21 12:59     ` Joseph Mingrone
@ 2016-10-21 14:52       ` Eli Zaretskii
  0 siblings, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2016-10-21 14:52 UTC (permalink / raw)
  To: Joseph Mingrone; +Cc: 19653

> From: Joseph Mingrone <jrm@ftfl.ca>
> Cc: 19653@debbugs.gnu.org
> Date: Fri, 21 Oct 2016 09:59:57 -0300
> 
> >   @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
> >   & alsdk 3 0: Alaska, elastic, Alston
> >   & sdfkjdsf 2 8: artefact's, postfix
> >   & sldksdfkjsfd 2 17: justification, staphylococcus
> 
> > The second number after each misspelled word is the offset of that
> > word's beginning, measured in characters, from the start of the line.
> > Hunspell used to report this in bytes instead of characters; if it
> > still does, you will have to patch it to fix that bug.  AFAIR, the
> > Hunspell issue tracker includes several patches for this bug.  Or
> > maybe the latest Hunspell 1.4.1 already fixes this, in which case
> > please upgrade.
> 
> It's still a problem with hunspell.
> 
> % echo "é startingCharTwo" | hunspell -a -d en_CA -i UTF-8
> @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
> & é 15 0: e, s, i, a, n, r, t, o, l, c, d, u, g, m, p
> & startingCharTwo 1 3: nonparticipating
> 
> https://github.com/hunspell/hunspell/issues/418

Thanks for checking.