unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Joseph Mingrone <jrm@ftfl.ca>
Cc: 19653@debbugs.gnu.org
Subject: bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
Date: Fri, 21 Oct 2016 10:33:10 +0300
Message-ID: <83mvhyrwax.fsf@gnu.org>
In-Reply-To: <86wph2z405.fsf@phe.ftfl.ca> (message from Joseph Mingrone on Fri, 21 Oct 2016 02:04:58 -0300)

> From: Joseph Mingrone <jrm@ftfl.ca>
> Date: Fri, 21 Oct 2016 02:04:58 -0300
> 
> This still seems to be a problem with hunspell version 1.3.3.
> 
> The problem can be reproduced by spell checking a file with this one line.
> 
> alsdk ✅ sdfkjdsf sldksdfkjsfd
> 
> During spell checking, the process list shows:
> 
> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
> 
> The error Emacs (version 25.1.1) reports is:
> 
> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Did Hunspell ever fix the problem whereby it reported byte offsets of
the misspelled words, as opposed to character offsets?  If not, that
is your problem, and Hunspell should finally get its act together.

To see whether this is the problem, invoke Hunspell like this:

  /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

and see what Hunspell emits.  It should emit something like this (the
output below is from my system, and I don't have the en_CA dictionary,
so your output might be slightly different):

  @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
  & alsdk 3 0: Alaska, elastic, Alston
  & sdfkjdsf 2 8: artefact's, postfix
  & sldksdfkjsfd 2 17: justification, staphylococcus

The second number after each misspelled word is the offset of that
word's beginning, measured in characters, from the start of the line.
Hunspell used to report this in bytes instead of characters; if it
still does, you will have to patch it to fix that bug.  AFAIR, the
Hunspell issue tracker includes several patches for this bug.  Or
maybe the latest Hunspell 1.4.1 already fixes this, in which case
please upgrade.
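
To check which kind of offset a given Hunspell build reports, the
numbers can be compared against both possibilities for the test line.
The following is only a rough sketch in Python, not part of Emacs or
Hunspell: the helper names parse_miss and classify_offset are made up
for illustration, and it assumes the input is UTF-8 and that each
misspelled word occurs only once in the line.

```python
import re

# Miss lines in ispell "-a" output look like
#   "& word count offset: suggestions"  (also "?" for guesses), or
#   "# word offset"                     (no suggestions).
MISS_RE = re.compile(r"^(?:[&?] (\S+) \d+ (\d+):|# (\S+) (\d+))")


def parse_miss(line):
    """Return (word, offset) for a miss line, or None for any other line."""
    m = MISS_RE.match(line)
    if m is None:
        return None
    word = m.group(1) or m.group(3)
    offset = int(m.group(2) or m.group(4))
    return word, offset


def classify_offset(text, word, offset):
    """Say whether OFFSET locates WORD in TEXT by characters or by bytes.

    Assumption: WORD occurs once in TEXT, so str.find() locates it.
    """
    char_off = text.find(word)
    if char_off < 0:
        return "unknown"
    # Byte offset = length of the UTF-8 encoding of everything before the word.
    byte_off = len(text[:char_off].encode("utf-8"))
    if char_off == byte_off:
        return "ambiguous"  # only ASCII precedes the word, so both agree
    if offset == char_off:
        return "characters"
    if offset == byte_off:
        return "bytes"
    return "unknown"


if __name__ == "__main__":
    text = "alsdk ✅ sdfkjdsf sldksdfkjsfd"
    reported = "& sdfkjdsf 2 8: artefact's, postfix"
    word, offset = parse_miss(reported)
    print(word, offset, classify_offset(text, word, offset))
```

For the test line above, U+2705 takes 3 bytes in UTF-8, so a build that
counts characters reports 8 for 'sdfkjdsf' and a build that counts
bytes reports 10.  The first word is preceded only by ASCII, so its
offset cannot distinguish the two cases.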




