From mboxrd@z Thu Jan 1 00:00:00 1970
From: Reuben Thomas
Newsgroups: gmane.emacs.bugs
Subject: bug#17742: Acknowledgement (Support for enchant?)
Date: Tue, 20 Dec 2016 21:43:32 +0000
References: <834m2hjbmr.fsf@gnu.org> <83bmwfbxaf.fsf@gnu.org> <837f73bqwv.fsf@gnu.org> <838trb6h7s.fsf@gnu.org> <834m1y4nj7.fsf@gnu.org>
In-Reply-To: <834m1y4nj7.fsf@gnu.org>
To: Eli Zaretskii
Cc: 17742@debbugs.gnu.org

On 20 December 2016 at 15:40, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Reuben Thomas <rrt@sc3d.org>
> > Date: Mon, 19 Dec 2016 21:47:42 +0000
> > Cc: 17742@debbugs.gnu.org
> >
> >     neither GNU Aspell nor hunspell offer any way to get this
> >     information (about character classes of dictionaries) via their
> >     APIs.
> >
> >     They provide this information in the dictionaries, and we glean it
> >     from there.  See ispell-parse-hunspell-affix-file and
> >     ispell-aspell-find-dictionary.
> >
> > The dictionaries are not part of the API (even where the format is
> > documented, the location may not be fixed), so it's not a good idea to
> > rely on them.
>
> If there's no better way, then I see no problem in relying on the
> dictionaries, and de-facto the results are satisfactory.

Agreed.

> > Having discovered that Aspell does not provide this information (I
> > checked again, and ispell-aspell-find-dictionary does not find this
> > information in the dictionaries, except for limited information about
> > otherchars; for casechars and not-casechars it defaults to [:alpha:]), I
> > shall investigate with the hunspell maintainers.
>
> Aspell provides some of that, and there's no reason to ignore what it
> does provide.

Agreed.

> Whether it's good enough depends on the dictionary and on what "(XP)"
> means.  It could be that "(XP)", including the parentheses, is a word
> the dictionary recognizes, something akin to "(C)", i.e. copyright
> sign.

Thanks, that's a good example.

So, if "(C)" is in the dictionary, then with [:graph:] as casechars, if I
run ispell-word with point anywhere in "(C)", Emacs will send "(C)", and it
will come back as correct. If casechars were only [:alpha:], then Emacs
would send "C", and it would come back as wrong.

Conversely, if "C" is in the dictionary, then if I run ispell-word with
casechars set to [:graph:] then Emacs will send "(C)" and it will come back
as correct (because Hunspell will ignore the non-wordchars characters). It
would also work with casechars set to [:alpha:].

So with casechars set to [:graph:], there's no false positive or false
negative.

> I don't see why it would be fragile with Enchant when it isn't with
> its back-ends.

Because there's no guarantee that Enchant will continue to use backends in
the same way as at present.

> And avoiding even fragile methods is worse than using
> them, when there's no better way of gleaning the same information, and
> the information is important (as it is in this case).

Agreed.

> I think you are drawing too radical conclusions from trying that with
> a single word and a single dictionary.  Which string was sent to the
> speller in this case,

"(XP)"

> and is that the string you expected to be sent?

I don't have strong feelings about that.

> >     Moreover, even when we send entire lines to the speller, we want to
> >     skip lines that include only non-word characters.
> >
> > Why?
>
> To avoid false positives and false negatives, as explained above.

But such characters will be ignored by the spellchecker (unless perhaps
they occur in the personal word list). So I'm not sure how they would
generate false positives or negatives.

> First, Enchant could be using Hunspell as its engine, right?

Sure.

> And second, AFAIU this discussion started by you proposing to get rid
> of CASECHARS etc., for all spellers, not just for Enchant, something
> that will definitely cause degradation.

I didn't mean to propose that. I'm sorry if I gave that impression. I'm
just saying I don't want to put in the work now to add that support for
Enchant. I have not changed (and do not propose to change) the support for
Hunspell.

> It sounds like the important part of our disagreement is in the last
> sentence.  If so, I hope I've succeeded in changing your mind.  Failing
> that, all I can suggest is to study the spelling rules of a modern
> speller, such as Hunspell, and see how this information is used there.

As I already said, Hunspell does not provide this information to
applications. So consumers of Hunspell have two choices:

1. Use side channels (as Emacs does).

2. Have some arbitrary idea of what constitutes a word.
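
To make the "(C)" reasoning in this message concrete, here is a minimal
Python sketch. It is a toy model under stated assumptions, not how
ispell.el or Hunspell is implemented: the names word_at, is_alpha,
is_graph and toy_check are hypothetical, the two predicates only
approximate [:alpha:] and [:graph:], and the checker merely imitates a
speller that ignores characters outside its own wordchars.

```python
def word_at(text, point, is_word_char):
    """Return the maximal run of word characters around index `point`.

    This imitates, very roughly, how an editor picks the string to send
    to the speller from the casechars class (an assumption, not ispell.el).
    """
    if not is_word_char(text[point]):
        return ""
    start = point
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = point + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[start:end]


def is_alpha(ch):
    # Approximation of [:alpha:]
    return ch.isalpha()


def is_graph(ch):
    # Approximation of [:graph:]: printable and not whitespace
    return ch.isprintable() and not ch.isspace()


def toy_check(sent, dictionary, wordchars=""):
    """Hypothetical speller: split `sent` into runs of letters plus the
    speller's own extra wordchars, ignore everything else, and accept the
    string only if every run is in the dictionary."""
    tokens, current = [], ""
    for ch in sent:
        if ch.isalpha() or ch in wordchars:
            current += ch
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return all(tok in dictionary for tok in tokens)


text = "Copyright (C) 2016"
point = text.index("C)")  # point is on the "C" inside "(C)"

sent_alpha = word_at(text, point, is_alpha)  # the string sent with [:alpha:]
sent_graph = word_at(text, point, is_graph)  # the string sent with [:graph:]

# Case 1: the dictionary contains "(C)" and treats parentheses as wordchars.
case1 = (toy_check(sent_graph, {"(C)"}, "()"), toy_check(sent_alpha, {"(C)"}, "()"))

# Case 2: the dictionary contains plain "C" and has no extra wordchars.
case2 = (toy_check(sent_graph, {"C"}), toy_check(sent_alpha, {"C"}))
```

In this toy model the [:graph:] setting is accepted in both cases, while
the [:alpha:] setting is rejected only when the dictionary entry itself
includes the parentheses, which is the asymmetry described above.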

The fact that an API to get the wordchars from hunspell is only now being
considered for addition suggests to me that neither the maintainers of
hunspell nor the developers of hunspell-using programs have thought this
was particularly important.

> I tried to explain that above: you will get false results and/or
> irrelevant or missing corrections from the speller.  For example, if
> you send "foo.bar", and the speller doesn't support '.' as a
> word-constituent character, you will get separate suggestions for
> "foo" and "bar", and won't get "foobar".

What happens at the moment (with my Enchant patch) is I get the error
"Ispell and its process have different character maps". I wouldn't expect
"foobar" in any case, if "." is not a constituent character, though I might
be surprised to get a correction for a word I thought I wasn't pointing at
(but I could be surprised in this way in any case, if the dictionary has a
surprising set of wordchars).

> I also don't understand why you want to remove this information, that
> is already there, is not harder to get with Enchant than it is without
> it, and the code which supports it is already there?

I'm not proposing to remove this information. I am proposing not to add it
for Enchant yet (because that will require extra work and code), and I am
hoping to end up with a simpler way to get it, via the API.

-- 
http://rrt.sc3d.org