From: Dmitry Alexandrov
Newsgroups: gmane.emacs.help
Subject: Re: Hunspell and contractions with apostrophes
Date: Wed, 27 May 2020 03:23:16 +0300
To: Eric Abrahamsen
Cc: help-gnu-emacs@gnu.org

Eric Abrahamsen wrote:

> I've battled with this for years now: Hunspell marks any contraction with an apostrophe (eg the "I've" that starts this sentence) as misspelled.

Iʼd say that this is an obvious bug in the dictionary that should be reported.
FWIW, it is present in Debian 10 as well:

  $ HOME=/tmp DICPATH='' hunspell -d en_US
  Hunspell 1.7.0
  I've
  *
  & ve 15 2: be, v, e, eve, vie, ave, vet, veg, Eve, Ave, vs, vi, re, me, he

and Iʼve never noticed it only because I use the en_GB dictionary, which is fine:

  $ HOME=/tmp DICPATH='' hunspell -d en_GB
  Hunspell 1.7.0
  I've
  *

> It used to be that I could edit /usr/share/hunspell/en_US.aff and add the apostrophe to WORDCHARS (and also "ICONV ’ '")

First and foremost, your ’ is *not* an apostrophe; itʼs a right quote. The apostrophe is ʼ.

This does matter: just check how word-moving commands act on the weird “I’ve” vs. the proper “Iʼve” and the ASCII (but no less proper) “I've”.

> and that would do it. Until the next time the hunspell package updated, and over-wrote its config files (I'm running Arch linux), and I would have to do it again.

Sure. You are not supposed to tamper with files under package management. Put your customized dictionaries somewhere else (in /etc or in your home directory). I do not remember whether hunspell(1) has any non-/usr paths hardcoded, but these lines in my ~/.profile suggest that it does not:

  if [ -z "$DICPATH" ]; then
      if [ -d '/usr/share/hunspell' ]; then
          DICPATH='/usr/share/hunspell'
      fi
  fi

  if [ -d "$HOME/.share/hunspell" ]; then
      DICPATH="$HOME/.share/hunspell:$DICPATH"
  fi

  if [ -d "$HOME/.local/share/hunspell" ]; then
      DICPATH="$HOME/.local/share/hunspell:$DICPATH"
  fi

  export DICPATH

> As of six months or a year or so ago, that trick no longer works.

It seems that the question has changed in the meantime. Whether a right single quote is recognized as an apostrophe is orthogonal to whether “I've” is recognized as a correct English word.
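Because ʼ, ’ and ' look nearly identical in most fonts, it can help to check which one a word actually contains. Below is a minimal POSIX sh sketch that looks at the UTF-8 bytes through od(1). The function names hex_bytes and apostrophe_kinds are made up for illustration.

The byte patterns it checks for are:

  27        ASCII apostrophe (')
  ca bc     U+02BC MODIFIER LETTER APOSTROPHE (ʼ)
  e2 80 99  U+2019 RIGHT SINGLE QUOTATION MARK (’)

It assumes UTF-8 input and only inspects bytes. It says nothing about how hunspell(1) tokenizes the word.

```shell
#!/bin/sh
# Hypothetical helpers; they need only POSIX sh, od(1) and tr(1).

# Print the bytes of $1 as lowercase hex, space-separated and space-padded,
# so that multi-byte sequences can be matched on byte boundaries.
hex_bytes() {
    printf ' %s ' "$(printf '%s' "$1" | od -An -v -tx1 | tr '\n' ' ' | tr -s ' ')"
}

# Report which apostrophe-like characters occur in $1 (input assumed UTF-8),
# one line per kind found, in a fixed order.
apostrophe_kinds() {
    ak_hex=$(hex_bytes "$1")
    case $ak_hex in *' 27 '*) echo "U+0027 APOSTROPHE (ASCII)" ;; esac
    case $ak_hex in *' ca bc '*) echo "U+02BC MODIFIER LETTER APOSTROPHE" ;; esac
    case $ak_hex in *' e2 80 99 '*) echo "U+2019 RIGHT SINGLE QUOTATION MARK" ;; esac
}
```

For example:

  apostrophe_kinds "I’ve"    # prints: U+2019 RIGHT SINGLE QUOTATION MARK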
Things like “I've” or “I'm” are normally listed explicitly in the dictionary (since something like “pointʼve” is not entirely okay, afaiu).

If they are there, then double-check that the affix file in use has the apostrophe among its WORDCHARS:

  WORDCHARS 0123456789'

(Thatʼs what is wrong with en_US.aff in Debian.)

If “I've” is still flagged as a mistake after that, that would be pretty odd.

Now to Unicode. Make sure that the affix file correctly declares its encoding:

  SET UTF-8

Make the Unicode apostrophe recognized as an apostrophe:

  ICONV 1
  ICONV ʼ '

Optionally, make the Unicode one preferred over the ASCII one in suggestions:

  OCONV 1
  OCONV ' ʼ

Now to Emacs. If all of the above works with hunspell(1) itself, no configuration besides

  (setq ispell-program-name "hunspell")

should be required.

If you insist on using the right single quote as an apostrophe, though, I have no idea how to make ispell.el pass it to hunspell(1) as part of a word. Nor do I see why anyone would ever do that.
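The affix-file changes above can be scripted together with the earlier advice to keep customized dictionaries out of /usr. Below is a minimal POSIX sh sketch. The function name localize_hunspell_dict is hypothetical.

It makes the following assumptions:

  - The system dictionary lives in a directory such as /usr/share/hunspell.
  - The user copy goes to a directory listed first on DICPATH, as in the ~/.profile lines above.
  - hunspell(1) takes the first match along DICPATH.

The sketch copies fresh from the system files on every run. Rerunning it after a package update therefore reapplies the patch instead of losing it. It adds the ICONV mapping only to a UTF-8 affix file that has no ICONV table yet. Merging into an existing table is left out.

```shell
#!/bin/sh
# Sketch: copy a system Hunspell dictionary into a user directory and patch
# only the copy. The name and paths are illustrative assumptions.
localize_hunspell_dict() {
    name=$1      # dictionary name, e.g. en_US
    src_dir=$2   # system directory, e.g. /usr/share/hunspell
    dst_dir=$3   # user directory listed first in DICPATH

    mkdir -p "$dst_dir" || return 1
    cp "$src_dir/$name.dic" "$dst_dir/$name.dic" || return 1
    cp "$src_dir/$name.aff" "$dst_dir/$name.aff" || return 1
    aff=$dst_dir/$name.aff

    # Make sure the ASCII apostrophe is a word character.
    if ! grep -q '^WORDCHARS' "$aff"; then
        printf '\n%s\n' "WORDCHARS '" >> "$aff"
    elif ! grep -q "^WORDCHARS.*'" "$aff"; then
        # Append ' to the existing WORDCHARS line (dropping trailing blanks/CR).
        sed "/^WORDCHARS/s/[[:space:]]*\$/'/" "$aff" > "$aff.tmp" &&
            mv "$aff.tmp" "$aff" || return 1
    fi

    # Map the Unicode apostrophe (U+02BC) to ASCII on input, but only when the
    # affix file declares UTF-8 and has no ICONV table of its own.
    if grep -q '^SET UTF-8' "$aff" && ! grep -q '^ICONV' "$aff"; then
        printf '\n%s\n%s\n' 'ICONV 1' "ICONV ʼ '" >> "$aff"
    fi
}
```

For example:

  localize_hunspell_dict en_US /usr/share/hunspell "$HOME/.local/share/hunspell"

Running `hunspell -D` afterwards should list the search path and the dictionaries found. That is a quick way to confirm that the local copy is the one being picked up.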