From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Reuben Thomas
Newsgroups: gmane.emacs.bugs
Subject: bug#17742: Acknowledgement (Support for enchant?)
Date: Mon, 19 Dec 2016 21:47:42 +0000
References: <834m2hjbmr.fsf@gnu.org> <83bmwfbxaf.fsf@gnu.org> <837f73bqwv.fsf@gnu.org> <838trb6h7s.fsf@gnu.org>
In-Reply-To: <838trb6h7s.fsf@gnu.org>
To: Eli Zaretskii
Cc: 17742@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113e46fe9bcc12054409de2a
Xref: news.gmane.org gmane.emacs.bugs:127221

--001a113e46fe9bcc12054409de2a
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 19 December 2016 at 16:01, Eli Zaretskii wrote:

> > From: Reuben Thomas
> > Date: Sun, 18 Dec 2016 23:39:54 +0000
> > Cc: 17742@debbugs.gnu.org
> >
> > I have not had any response to my enquiries yet, but I did some
> > research, and neither GNU Aspell nor hunspell offer any way to get this
> > information (about character classes of dictionaries) via their APIs.
>
> They provide this information in the dictionaries, and we glean it
> from there.  See ispell-parse-hunspell-affix-file and
> ispell-aspell-find-dictionary.

The dictionaries are not part of the API (even where the format is
documented, the location may not be fixed), so it's not a good idea to rely
on them.

Having discovered that Aspell does not provide this information (I
checked again, and ispell-aspell-find-dictionary does not find this
information in the dictionaries, except for limited information about
otherchars; for casechars and not-casechars it defaults to [:alpha:]), I
shall investigate with the hunspell maintainers.

> Maybe there's a misunderstanding: I'm talking about the CASECHARS,
> NOT-CASECHARS, and OTHERCHARS parts of the dictionary data in
> ispell-dictionary-alist.

There's no misunderstanding here; that's what I'm talking about.

> Each dictionary can (and many do) use some of the punctuation
> characters in the words it can handle.  A notable example is the
> apostrophe ' in English, used for the various suffixes that spellers
> support; similar features exist in other languages, but with possibly
> different punctuation characters.  Ispell.el must match that by using
> the speller's notion of a word, which must be independent of the
> current major mode's idea of what a word is.  This is where these
> character sets come into play, and I really cannot see how can
> ispell.el work well without using them as it does now.

Currently, using casechars = [[:graph:]], if I put point over part of the
string " (XP) ", and run M-x ispell-word, it says "(XP) is correct". That's
good enough for me!

Note that merely using the characters declared in the dictionary may not be
enough: I have words like SC³D (I spell my company's name that way) in my
personal word lists. Other users might be more imaginative, and for example
have sequences of emoji. The list of characters in the dictionary is only a
minimum.

> So we do need this information.  If Enchant doesn't provide it, we
> could still use the same technique as with Aspell and Hunspell,
> provided that we can figure out which back end(s) is/are used by
> Enchant.  Is that doable?

Yes, that can be done, but it's fragile; that's why I'm trying to avoid it.

> Ispell.el also supports spell-checking by words, in which case the
> above is not useful, because we need to figure out what is a word.

See above. It's not clear to me that we need a very precise idea of what
constitutes a word.

> Moreover, even when we send entire lines to the speller, we want to
> skip lines that include only non-word characters.

Why?

> Just look at the
> callers of the above-mentioned accessor functions, and you will see
> how we use them.

I have read this code. I see how we use them; it's just not clear to me
that it's necessary to use them thus.

> Hunspell is the most modern and sophisticated speller, we certainly
> don't want to degrade it.

No chance of that; this patch is only about Enchant.

> Also, Aspell uses the dictionaries at least
> for some of this info, see the function I pointed to above.

Only for otherchars, not casechars/not-casechars.

> Bottom line, this information cannot be thrown away or ignored.  It is
> important for correctly interfacing with a dictionary and for doing
> TRT as the users expect.  Any modern speller program would benefit
> from it, and therefore we should strive to provide such information to
> ispell.el whenever we possibly can.

It is not a question of throwing away or ignoring information: the
information is simply not available through documented channels (at least
for Enchant). Yes, one can find the underlying engine and then use that
information to (try to) find the dictionaries, but one is then making a
number of brittle assumptions. And it's not clear that the information is
actually necessary to have.

It would be helpful if you could show a situation in which using [:graph:]
for Enchant dictionaries actually misbehaves in some way.

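To make the " (XP) " and SC³D cases above concrete, here is a rough sketch of
how the choice of casechars changes the word found around point. It is written
in Python rather than Emacs Lisp and is not what ispell.el actually does:
word_at, is_alpha and is_graph are made-up names, and the two predicates only
approximate the [:alpha:] and [:graph:] classes.

```python
import unicodedata


def is_alpha(ch: str) -> bool:
    """Approximation of [:alpha:]: Unicode letters only (assumption)."""
    return ch.isalpha()


def is_graph(ch: str) -> bool:
    """Approximation of [:graph:]: anything that is neither whitespace
    nor a control character (assumption; the real class excludes more)."""
    return not ch.isspace() and unicodedata.category(ch) != "Cc"


def word_at(text: str, pos: int, is_word_char) -> str:
    """Return the maximal run of word characters around index pos,
    or "" if the character at pos is not a word character."""
    if not (0 <= pos < len(text)) or not is_word_char(text[pos]):
        return ""
    start = pos
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = pos + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[start:end]


print(word_at(" (XP) ", 2, is_graph))  # (XP)
print(word_at(" (XP) ", 2, is_alpha))  # XP
print(word_at("SC³D", 0, is_graph))    # SC³D
print(word_at("SC³D", 0, is_alpha))    # SC
```

With the letters-only predicate the personal-dictionary word SC³D is cut at
the superscript, whereas the [:graph:]-style predicate hands the whole token
to the speller and leaves it to decide.
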
In fact, reading Enchant's source code, I see that it uses a fixed set of
Unicode classes for its own internal equivalent of casechars. Using that
would make sense (for Enchant! again, I'm not suggesting changing how we use
hunspell).

One other data point: a senior LyX maintainer, Jean-Marc Lasgouttes, agrees
with you:
https://github.com/AbiWord/enchant/issues/17#issuecomment-267924304
He says that LyX has a "bug open somewhere" that suggests using this
information (but he didn't know it was available!).

-- 
http://rrt.sc3d.org

--001a113e46fe9bcc12054409de2a--