From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
Date: Fri, 21 Oct 2016 10:33:10 +0300
Message-ID: <83mvhyrwax.fsf@gnu.org>
References: <8660om1en7.fsf@phe.ftfl.ca> <86wph2z405.fsf@phe.ftfl.ca>
To: Joseph Mingrone
Cc: 19653@debbugs.gnu.org

> From: Joseph Mingrone
> Date: Fri, 21 Oct 2016 02:04:58 -0300
>
> This still seems to be a problem with hunspell version 1.3.3.
>
> The problem can be reproduced by spell checking a file with this one line.
>
> alsdk ✅ sdfkjdsf sldksdfkjsfd
>
> During spell checking, the process list shows:
>
> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
>
> The error Emacs (version 25.1.1) reports is:
>
> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Did Hunspell ever fix the problem whereby it reported byte offsets of the misspelled words, as opposed to character offsets? If not, that is your problem, and Hunspell should finally get its act together.

To see whether this is the problem, invoke Hunspell like this:

  /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

and see what Hunspell emits. It should emit something like this (the below is taken from my system, and I don't have the en_CA dictionary, so your output might be slightly different):

  @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
  & alsdk 3 0: Alaska, elastic, Alston
  & sdfkjdsf 2 8: artefact's, postfix
  & sldksdfkjsfd 2 17: justification, staphylococcus

The second number after each misspelled word is the offset of that word's beginning, measured in characters, from the start of the line. Hunspell used to report this in bytes instead of characters; if it still does, you will have to patch it to fix that bug. AFAIR, the Hunspell issue tracker includes several patches for this bug.
Or maybe the latest Hunspell 1.4.1 already fixes this, in which case please upgrade.
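To make the character-versus-byte check concrete, here is a minimal Python sketch. It is not part of ispell.el or Hunspell, and the names Miss, parse_miss and offset_kind are hypothetical. It parses one "&" or "#" reply line in the ispell -a format and tests whether the reported offset lines up with the word when counted in characters or in UTF-8 bytes. It assumes the file is UTF-8 and that offsets are 0-based from the start of the line, as in the sample output. In the test line, ✅ (U+2705) is one character but three UTF-8 bytes, so sdfkjdsf starts at character 8 but at byte 10.

```python
import re
from typing import List, NamedTuple, Optional


class Miss(NamedTuple):
    """One misspelling reported by an ispell-compatible speller (hypothetical type)."""
    word: str
    offset: int              # offset exactly as the speller reported it
    suggestions: List[str]


# ispell -a replies: "& word count offset: sugg, sugg" when there are
# suggestions, "# word offset" when there are none.
_AMP = re.compile(r"^& (\S+) (\d+) (\d+): (.*)$")
_HASH = re.compile(r"^# (\S+) (\d+)$")


def parse_miss(reply: str) -> Optional[Miss]:
    """Parse a '&' or '#' reply line; return None for anything else."""
    m = _AMP.match(reply)
    if m:
        return Miss(m.group(1), int(m.group(3)), m.group(4).split(", "))
    m = _HASH.match(reply)
    if m:
        return Miss(m.group(1), int(m.group(2)), [])
    return None  # banner, '*', '+', '-' replies, blank lines


def offset_kind(line: str, miss: Miss) -> str:
    """Say whether miss.offset locates the word in characters or in UTF-8 bytes."""
    by_chars = line[miss.offset:].startswith(miss.word)
    by_bytes = line.encode("utf-8")[miss.offset:].startswith(
        miss.word.encode("utf-8"))
    if by_chars and by_bytes:
        return "both"     # typically only ASCII precedes the word
    if by_chars:
        return "chars"
    if by_bytes:
        return "bytes"
    return "neither"


if __name__ == "__main__":
    line = "alsdk \u2705 sdfkjdsf sldksdfkjsfd"
    # The first reply is from the sample output above; the second is a
    # hypothetical reply from a Hunspell that counts bytes instead.
    for reply in ["& sdfkjdsf 2 8: artefact's, postfix",
                  "& sdfkjdsf 2 10: artefact's, postfix"]:
        miss = parse_miss(reply)
        print(miss.word, miss.offset, offset_kind(line, miss))
    # prints:
    # sdfkjdsf 8 chars
    # sdfkjdsf 10 bytes
```

Feeding it the "&" lines your Hunspell prints for test.txt should show whether a given Hunspell version is affected. A word preceded only by ASCII, like alsdk at offset 0, reports "both" and tells you nothing either way.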