From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joseph Mingrone
Newsgroups: gmane.emacs.bugs
Subject: bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
Date: Fri, 21 Oct 2016 09:59:57 -0300
Message-ID: <86shrpzwky.fsf@phe.ftfl.ca>
References: <8660om1en7.fsf@phe.ftfl.ca> <86wph2z405.fsf@phe.ftfl.ca> <83mvhyrwax.fsf@gnu.org>
In-Reply-To: <83mvhyrwax.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 21 Oct 2016 10:33:10 +0300")
To: Eli Zaretskii
Cc: 19653@debbugs.gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (berkeley-unix)
X-GNU-PR-Message: followup 19653
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: moreinfo
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:124769
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain; charset=utf-8

Eli Zaretskii writes:

>> From: Joseph Mingrone
>> Date: Fri, 21 Oct 2016 02:04:58 -0300
>>
>> This still seems to be a problem with hunspell version 1.3.3.
>>
>> The problem can be reproduced by spell checking a file with this one line.
>>
>> alsdk ✅ sdfkjdsf sldksdfkjsfd
>>
>> During spell checking, the process list shows:
>>
>> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
>>
>> The error Emacs (version 25.1.1) reports is:
>>
>> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions
>
> Did Hunspell ever fix the problem whereby it reported byte offsets of
> the misspelled words, as opposed to character offsets? If not, that
> is your problem, and Hunspell should finally get its act together.
>
> To see whether this is the problem, invoke Hunspell like this:
>
>   /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt
>
> and see what Hunspell emits. It should emit something like this (the
> below is taken from my system, and I don't have the en_CA dictionary,
> so your output might be slightly different):
>
>   @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
>   & alsdk 3 0: Alaska, elastic, Alston
>   & sdfkjdsf 2 8: artefact's, postfix
>   & sldksdfkjsfd 2 17: justification, staphylococcus
>
> The second number after each misspelled word is the offset of that
> word's beginning, measured in characters, from the start of the line.
> Hunspell used to report this in bytes instead of characters; if it
> still does, you will have to patch it to fix that bug. AFAIR, the
> Hunspell issue tracker includes several patches for this bug. Or
> maybe the latest Hunspell 1.4.1 already fixes this, in which case
> please upgrade.

It's still a problem with hunspell.

% echo "é startingCharTwo" | hunspell -a -d en_CA -i UTF-8
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
& é 15 0: e, s, i, a, n, r, t, o, l, c, d, u, g, m, p
& startingCharTwo 1 3: nonparticipating

https://github.com/hunspell/hunspell/issues/418

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQJ8BAEBCgBmBQJYChFNXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ1NUIwOTNBNzI2QzM4ODU1NzEyMkJBRDUz
NkE0MEM4M0IwRDZFRjlFAAoJEDakDIOw1u+eQmAP/3nNLN/6v7lbh3dz9onnYmRl
W3RZnc8hOVwNcy5ja7crpw/c3FldB3I+zfa5Vo7Go3SDJc2V1NJ55pcpHsUtMw3H
94Zqe2WsEyeqlCiAyQTXb1JghEyBWvd7pccnQYn42qxHFy6zFC0lybadzJKXLCYy
WpXqewPja8mOQqOBTzmespYooufez2aV2Y7Frk+LGVGltb/KylkNBpBwqGod90H/
ufFUpVR3CQaTmd27b94234KomFR4EFQHOCLfI1ENE9RiEdKKf+e5E1KM5oUr9JUP
KxuOeF3eD5uhljRjtOYMQZT/lkCSP6UEfb0la+v683J0RPABvoAg6t32+f3Ol0BW
RNOl8H7SWZHnnj/VcXfHh5m58WZ9dwRbxxEuyPe3f/D1QX/JlRe5ss8mtWYeyk3Q
/fFD0B10vguCSPRGpgY7TlsG/2xKxXSGRAYsZ0tgE9helYsKasFkO96qcsb86VUX
96MaRgmVrUorflBM5aD6nbEwevjhLN1F7MLIV1BTbGZR/lqrlSVzTqupk5lS2K0t
lnzvhUTH565a4SbAVv/yuNIwGJuwfbsLvxqyyMDudIhyc2+HS4R46eQDS8UYOF8J
b1pWPdE3edRsqE5/+duWJaPRecoP8cIDjI4+2HrSyL0M8mem4nWWRzUT5wvbFbfj
AkzK2M/1Hw4voqxDOp9K
=wyZO
-----END PGP SIGNATURE-----
--=-=-=--
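
For anyone wanting to check the arithmetic behind the offsets in the hunspell output above, here is a minimal Python sketch. It is only an illustration: the helper name byte_to_char_offset is hypothetical and is not part of Emacs, ispell.el, or Hunspell. It assumes the line is UTF-8 encoded and that offsets are zero-based, as the first word's offset of 0 in the output suggests.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset within the encoded line to a character offset.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    if byte_offset falls in the middle of a multibyte character.
    """
    raw = line.encode(encoding)
    # Decode just the prefix before the offset and count its characters.
    return len(raw[:byte_offset].decode(encoding))


# The test line from the reply; "\u00e9" is the two-byte character e-acute.
line = "\u00e9 startingCharTwo"
word = "startingCharTwo"

# Offset measured in characters (what Emacs is described as expecting).
char_offset = line.index(word)
# Offset measured in UTF-8 bytes (what Hunspell 1.3.3 reported: 3).
byte_offset = line.encode("utf-8").index(word.encode("utf-8"))

print("character offset:", char_offset)
print("byte offset:", byte_offset)
print("byte offset converted back:", byte_to_char_offset(line, byte_offset))
```

The byte offset matches the 3 that Hunspell printed for startingCharTwo, while the character offset is one less, because the leading two-byte character is counted twice when measuring in bytes. The same helper applied to the original one-line test file (with the three-byte check mark) maps byte offset 10 back to the character offset 8 shown in the quoted expected output.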