* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
@ 2015-01-22 14:40 Tobias Getzner
2015-01-22 17:41 ` Eli Zaretskii
2015-12-26 16:53 ` Lars Ingebrigtsen
0 siblings, 2 replies; 7+ messages in thread
From: Tobias Getzner @ 2015-01-22 14:40 UTC (permalink / raw)
To: 19653
Hello,

I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
1.3.3) to spell-check a buffer containing the typographically correct
apostrophe («’»; U+2019), ispell will error out with the message
«ispell misalignment».

The problem can be reproduced by setting ispell-program-name to
«hunspell», and spell-checking a buffer containing the string «abc’s
zzz». This yields the following error:

> ispell-process-line: Ispell misalignment: word `zzz' point 9; probably incompatible versions

This seems to be a regression from 24.3, where hunspell support was
working (with the caveat that the apostrophe had to be manually added
to the dictionary’s «OTHERCHARS»).

Best regards,
Tobias
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
2015-01-22 14:40 bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used Tobias Getzner
@ 2015-01-22 17:41 ` Eli Zaretskii
2015-12-26 16:53 ` Lars Ingebrigtsen
1 sibling, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2015-01-22 17:41 UTC (permalink / raw)
To: Tobias Getzner; +Cc: 19653

> From: Tobias Getzner <tobias.getzner@gmx.de>
> Date: Thu, 22 Jan 2015 15:40:05 +0100
>
> I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
> 1.3.3) to spell-check a buffer containing the typographically correct
> apostrophe («’»; U+2019), ispell will error out with the message
> «ispell misalignment».
>
> The problem can be reproduced by setting ispell-program-name to
> «hunspell», and spell-checking a buffer containing the string «abc’s
> zzz». This yields the following error:
>
> > ispell-process-line: Ispell misalignment: word `zzz' point 9; probably incompatible versions

I cannot reproduce this with Emacs 24.4 and Hunspell 1.3.2 (heavily
patched to fix known problems in Hunspell). You didn't provide enough
information for me to be sure I did the same as you, so here are the
possible explanations for the different experience:

. I use a different version of Hunspell, and yours has a bug.
  Hunspell is known to have a problem with reporting mis-spelled
  words with byte offsets, whereas Emacs expects character offsets,
  so dictionaries encoded in UTF-8 cause symptoms similar to those
  you report. My Hunspell is patched to avoid this problem.

. I didn't change OTHERCHARS. Frankly, I think doing this asks for
  trouble, since the speller still uses the characters recorded in
  the .aff file.

. You didn't tell which dictionary you used. I tried en_US and
  de_DE, and none of them produced these problems. Maybe this is
  specific to some dictionary you used. In particular, the encoding
  of that dictionary is important vs the encoding you tell ispell.el
  to use (if you customized that part).

> This seems to be a regression from 24.3, where hunspell support was
> working (with the caveat that the apostrophe had to be manually added
> to the dictionary’s «OTHERCHARS»).

Are you saying that the same version of Hunspell with the same
dictionary worked in Emacs 24.3, where Emacs 24.4 doesn't? If so,
please try to eliminate or at least minimize your ispell-related
customizations, and try again. If the problem persists, please show
the minimal set of customizations to reproduce the problem.

^ permalink raw reply [flat|nested] 7+ messages in thread
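
To make the byte-versus-character distinction above concrete, here is a
minimal Python sketch. It is not code from ispell.el or Hunspell, and the
helper name `offsets` is hypothetical. It computes where «zzz» starts in
the reported test string, once counted in characters and once in UTF-8
bytes, assuming the line is handed to the speller encoded as UTF-8.

```python
def offsets(line: str, word: str) -> tuple:
    """Return (character offset, UTF-8 byte offset) of the first match of word.

    Illustration only: `offsets` is a made-up helper, not an ispell.el
    or Hunspell function.
    """
    # Python strings index by code point, so this is the character offset.
    char_offset = line.index(word)
    # Encoding everything before the word gives the byte offset in UTF-8.
    byte_offset = len(line[:char_offset].encode("utf-8"))
    return char_offset, byte_offset


# "abc’s zzz" from the report, with the apostrophe written as U+2019.
print(offsets("abc\u2019s zzz", "zzz"))  # (6, 8)
```

The two numbers differ by 2 because U+2019 occupies three bytes in UTF-8.
A speller that reports the byte-based figure to a client expecting the
character-based one would point past the real start of the word, which is
consistent with (though not proof of) the first explanation above.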
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
2015-01-22 14:40 bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used Tobias Getzner
2015-01-22 17:41 ` Eli Zaretskii
@ 2015-12-26 16:53 ` Lars Ingebrigtsen
1 sibling, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2015-12-26 16:53 UTC (permalink / raw)
To: Tobias Getzner; +Cc: 19653

Tobias Getzner <tobias.getzner@gmx.de> writes:

> I’ve noticed that when ispell.el (Emacs 24.4.1) is using hunspell (v.
> 1.3.3) to spell-check a buffer containing the typographically correct
> apostrophe («’»; U+2019), ispell will error out with the message
> «ispell misalignment».

There was an earlier similar report where the conclusion was that
hunspell was buggy, but a new version of hunspell fixed the problem...

--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <8660om1en7.fsf@phe.ftfl.ca>]
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
[not found] <8660om1en7.fsf@phe.ftfl.ca>
@ 2016-10-21 5:04 ` Joseph Mingrone
2016-10-21 7:33 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: Joseph Mingrone @ 2016-10-21 5:04 UTC (permalink / raw)
To: 19653

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]

This still seems to be a problem with hunspell version 1.3.3.

The problem can be reproduced by spell checking a file with this one line.

alsdk ✅ sdfkjdsf sldksdfkjsfd

During spell checking, the process list shows:

ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8

The error Emacs (version 25.1.1) reports is:

ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Hunspell skips over the special character when it is run at a terminal
prompt. This is the initial output.

### begin hunspell output ###
alsdk File: test.txt
alsdk \~E sdfkjdsf sldksdfkjsfd
0: Alaska
[SPACE] R)epl A)ccept I)nsert U)ncap S)tem Q)uit e(X)it or ? for help
### end hunspell output ###

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 930 bytes --]

^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
2016-10-21 5:04 ` Joseph Mingrone
@ 2016-10-21 7:33 ` Eli Zaretskii
2016-10-21 12:59 ` Joseph Mingrone
0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2016-10-21 7:33 UTC (permalink / raw)
To: Joseph Mingrone; +Cc: 19653

> From: Joseph Mingrone <jrm@ftfl.ca>
> Date: Fri, 21 Oct 2016 02:04:58 -0300
>
> This still seems to be a problem with hunspell version 1.3.3.
>
> The problem can be reproduced by spell checking a file with this one line.
>
> alsdk ✅ sdfkjdsf sldksdfkjsfd
>
> During spell checking, the process list shows:
>
> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
>
> The error Emacs (version 25.1.1) reports is:
>
> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Did Hunspell ever fix the problem whereby it reported byte offsets of
the misspelled words, as opposed to character offsets? If not, that
is your problem, and Hunspell should finally get its act together.

To see whether this is the problem, invoke Hunspell like this:

  /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

and see what Hunspell emits. It should emit something like this (the
below is taken from my system, and I don't have the en_CA dictionary,
so your output might be slightly different):

  @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
  & alsdk 3 0: Alaska, elastic, Alston
  & sdfkjdsf 2 8: artefact's, postfix
  & sldksdfkjsfd 2 17: justification, staphylococcus

The second number after each misspelled word is the offset of that
word's beginning, measured in characters, from the start of the line.
Hunspell used to report this in bytes instead of characters; if it
still does, you will have to patch it to fix that bug. AFAIR, the
Hunspell issue tracker includes several patches for this bug. Or
maybe the latest Hunspell 1.4.1 already fixes this, in which case
please upgrade.
^ permalink raw reply [flat|nested] 7+ messages in thread
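
The output format described above can be read mechanically. The following
Python sketch is an illustration only: `Miss` and `parse_miss` are made-up
names, and it handles just the `&` lines shown in the message, not the
full ispell pipe-mode protocol. It splits one such line into the word, the
suggestion count, the reported offset, and the suggestions.

```python
import re
from typing import List, NamedTuple, Optional


class Miss(NamedTuple):
    """One '&' line from the speller: a misspelled word with suggestions."""
    word: str
    count: int        # first number: how many suggestions follow
    offset: int       # second number: reported start of the word in the line
    suggestions: List[str]


# Shape taken from the sample output above: "& WORD COUNT OFFSET: a, b, c"
_AMP_LINE = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")


def parse_miss(line: str) -> Optional[Miss]:
    """Parse a '&' line; return None for any other kind of line."""
    match = _AMP_LINE.match(line)
    if match is None:
        return None
    word, count, offset, rest = match.groups()
    return Miss(word, int(count), int(offset), rest.split(", "))


miss = parse_miss("& sdfkjdsf 2 8: artefact's, postfix")
print(miss.word, miss.offset)  # sdfkjdsf 8
```

Whether the parsed `offset` counts characters or bytes cannot be told from
the line itself; it has to be compared against the input line, which is
the check proposed in the message.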
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
2016-10-21 7:33 ` Eli Zaretskii
@ 2016-10-21 12:59 ` Joseph Mingrone
2016-10-21 14:52 ` Eli Zaretskii
0 siblings, 1 reply; 7+ messages in thread
From: Joseph Mingrone @ 2016-10-21 12:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 19653

[-- Attachment #1: Type: text/plain, Size: 2111 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Joseph Mingrone <jrm@ftfl.ca>
>> Date: Fri, 21 Oct 2016 02:04:58 -0300

>> This still seems to be a problem with hunspell version 1.3.3.

>> The problem can be reproduced by spell checking a file with this one line.

>> alsdk ✅ sdfkjdsf sldksdfkjsfd

>> During spell checking, the process list shows:

>> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8

>> The error Emacs (version 25.1.1) reports is:

>> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

> Did Hunspell ever fix the problem whereby it reported byte offsets of
> the misspelled words, as opposed to character offsets? If not, that
> is your problem, and Hunspell should finally get its act together.

> To see whether this is the problem, invoke Hunspell like this:

> /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

> and see what Hunspell emits. It should emit something like this (the
> below is taken from my system, and I don't have the en_CA dictionary,
> so your output might be slightly different):

> @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
> & alsdk 3 0: Alaska, elastic, Alston
> & sdfkjdsf 2 8: artefact's, postfix
> & sldksdfkjsfd 2 17: justification, staphylococcus

> The second number after each misspelled word is the offset of that
> word's beginning, measured in characters, from the start of the line.
> Hunspell used to report this in bytes instead of characters; if it
> still does, you will have to patch it to fix that bug. AFAIR, the
> Hunspell issue tracker includes several patches for this bug. Or
> maybe the latest Hunspell 1.4.1 already fixes this, in which case
> please upgrade.

It's still a problem with hunspell.

% echo "é startingCharTwo" | hunspell -a -d en_CA -i UTF-8
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
& é 15 0: e, s, i, a, n, r, t, o, l, c, d, u, g, m, p
& startingCharTwo 1 3: nonparticipating

https://github.com/hunspell/hunspell/issues/418

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 930 bytes --]

^ permalink raw reply [flat|nested] 7+ messages in thread
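
The output above reports offset 3 for «startingCharTwo», although the word
begins at character 2 of the input. As a rough check that the reported
value is a UTF-8 byte offset, this Python sketch maps a byte offset back
to a character offset. The helper `byte_to_char_offset` is hypothetical
and is not something Emacs or Hunspell is known to do; it only assumes
the line was sent as UTF-8.

```python
def byte_to_char_offset(line: str, byte_offset: int) -> int:
    """Convert an offset counted in UTF-8 bytes to one counted in characters.

    Illustration only. Raises UnicodeDecodeError if byte_offset falls in
    the middle of a multi-byte character.
    """
    raw = line.encode("utf-8")
    # Decode just the bytes before the offset and count the characters.
    return len(raw[:byte_offset].decode("utf-8"))


# "é startingCharTwo", with é written as U+00E9 (two bytes in UTF-8).
print(byte_to_char_offset("\u00e9 startingCharTwo", 3))  # 2
```

Under that assumption, byte offset 3 lands exactly on character 2, where
the word really starts, which matches the byte-offset reading of the
output above.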
* bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
2016-10-21 12:59 ` Joseph Mingrone
@ 2016-10-21 14:52 ` Eli Zaretskii
0 siblings, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2016-10-21 14:52 UTC (permalink / raw)
To: Joseph Mingrone; +Cc: 19653

> From: Joseph Mingrone <jrm@ftfl.ca>
> Cc: 19653@debbugs.gnu.org
> Date: Fri, 21 Oct 2016 09:59:57 -0300
>
> > @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
> > & alsdk 3 0: Alaska, elastic, Alston
> > & sdfkjdsf 2 8: artefact's, postfix
> > & sldksdfkjsfd 2 17: justification, staphylococcus
>
> > The second number after each misspelled word is the offset of that
> > word's beginning, measured in characters, from the start of the line.
> > Hunspell used to report this in bytes instead of characters; if it
> > still does, you will have to patch it to fix that bug. AFAIR, the
> > Hunspell issue tracker includes several patches for this bug. Or
> > maybe the latest Hunspell 1.4.1 already fixes this, in which case
> > please upgrade.
>
> It's still a problem with hunspell.
>
> % echo "é startingCharTwo" | hunspell -a -d en_CA -i UTF-8
> @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
> & é 15 0: e, s, i, a, n, r, t, o, l, c, d, u, g, m, p
> & startingCharTwo 1 3: nonparticipating
>
> https://github.com/hunspell/hunspell/issues/418

Thanks for checking.

^ permalink raw reply [flat|nested] 7+ messages in thread