From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#17742: Acknowledgement (Support for enchant?)
Date: Tue, 20 Dec 2016 17:40:12 +0200
Message-ID: <834m1y4nj7.fsf@gnu.org>
References: <834m2hjbmr.fsf@gnu.org> <83bmwfbxaf.fsf@gnu.org> <837f73bqwv.fsf@gnu.org> <838trb6h7s.fsf@gnu.org>
To: Reuben Thomas
Cc: 17742@debbugs.gnu.org
In-reply-to: (message from Reuben Thomas on Mon, 19 Dec 2016 21:47:42 +0000)

> From: Reuben Thomas
> Date: Mon, 19 Dec 2016 21:47:42 +0000
> Cc: 17742@debbugs.gnu.org
>
> neither GNU Aspell nor hunspell offer any way to get this information (about character classes of dictionaries) via their APIs.
>
> They provide this information in the dictionaries, and we glean it
> from there. See ispell-parse-hunspell-affix-file and
> ispell-aspell-find-dictionary.
>
> The dictionaries are not part of the API (even where the format is documented, the location may not be fixed), so it's not a good idea to rely on them.

If there's no better way, then I see no problem in relying on the dictionaries, and de-facto the results are satisfactory.

> Having discovered that Aspell does not provide this information (I checked again, and ispell-aspell-find-dictionary does not find this information in the dictionaries, except for limited information about otherchars; for casechars and not-casechars it defaults to [:alpha:]), I shall investigate with the hunspell maintainers.

Aspell provides some of that, and there's no reason to ignore what it does provide.

> Currently, using casechars = [[:graph:]], if I put point over part of the string " (XP) ", and run M-x ispell-word, it says "(XP) is correct". That's good enough for me!

Whether it's good enough depends on the dictionary and on what "(XP)" means. It could be that "(XP)", including the parentheses, is a word the dictionary recognizes, something akin to "(C)", i.e. copyright sign. And it could be that the correct word is "XP", with the parentheses acting as punctuation. And there could be additional alternatives.
Only the dictionary "knows" what is the right alternative, and ispell.el should abide by the dictionary's rules, or else it will not do what the user wants. E.g., "XP" might not be in the dictionary (as in fact I get when I try that with Hunspell), while "(XP)" is. So CASECHARS should be set up according to what the dictionary expects, or you will have false positives and false negatives.

> Note that merely using the characters declared in the dictionary may not be enough: I have words like SC³D (I spell my company that way) in my personal word lists. Other users might be more imaginative, and for example have sequences of emoji. The list of characters in the dictionary is only a minimum.

That's why personal word lists go together with dictionaries: they both must use the same affix rules, so if you change to another dictionary for the same language, your personal word list should also change, or else you will get false negatives.

> So we do need this information. If Enchant doesn't provide it, we
> could still use the same technique as with Aspell and Hunspell,
> provided that we can figure out which back end(s) is/are used by
> Enchant. Is that doable?
>
> Yes, that can be done, but it's fragile; that's why I'm trying to avoid it.

I don't see why it would be fragile with Enchant when it isn't with its back-ends. And avoiding even fragile methods is worse than using them, when there's no better way of gleaning the same information, and the information is important (as it is in this case).

> Ispell.el also supports spell-checking by words, in which case the
> above is not useful, because we need to figure out what is a word.
>
> See above. It's not clear to me that we need a very precise idea of what constitutes a word.

I think you are drawing too radical conclusions from trying that with a single word and a single dictionary. Which string was sent to the speller in this case, and is that the string you expected to be sent?

> Moreover, even when we send entire lines to the speller, we want to
> skip lines that include only non-word characters.
>
> Why?

To avoid false positives and false negatives, as explained above.

> Hunspell is the most modern and sophisticated speller, we certainly
> don't want to degrade it.
>
> No chance of that, this patch is only about Enchant.

First, Enchant could be using Hunspell as its engine, right? And second, AFAIU this discussion started with you proposing to get rid of CASECHARS etc., for all spellers, not just for Enchant, something that will definitely cause degradation.

> Also, Aspell uses the dictionaries at least
> for some of this info, see the function I pointed to above.
>
> Only for otherchars, not casechars/not-casechars.

Partial information is better than no information, IMO.

> Bottom line, this information cannot be thrown away or ignored. It is
> important for correctly interfacing with a dictionary and for doing
> TRT as the users expect. Any modern speller program would benefit
> from it, and therefore we should strive to provide such information to
> ispell.el whenever we possibly can.
>
> It is not a question of throwing away or ignoring information: the information is simply not available through documented channels (at least for Enchant). Yes, one can find the underlying engine and then use that information to (try to) find the dictionaries, but one is then making a number of brittle assumptions. And it's not clear that the information is actually necessary to have.

It sounds like the important part of our disagreement is in the last sentence. If so, I hope I've succeeded in changing your mind. Failing that, all I can suggest is to study the spelling rules of a modern speller, such as Hunspell, and see how this information is used there.

> It would be helpful if you could show a situation in which using [:graph:] for enchant dictionaries actually misbehaves in some way.

I tried to explain that above: you will get false positives or false negatives, and/or irrelevant or missing corrections from the speller. For example, if you send "foo.bar", and the speller doesn't support '.' as a word-constituent character, you will get separate suggestions for "foo" and "bar", and won't get "foobar".

I also don't understand why you want to remove this information: it is already there, it is no harder to get with Enchant than without it, and the code which supports it already exists.
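
To make the CASECHARS point concrete, here is a rough Python sketch of how the character class decides which string around point gets sent to the speller. It is not ispell.el code: `word_at` is a hypothetical helper, and the three regex classes are only ASCII approximations of [[:alpha:]], [[:graph:]], and "alpha plus '.' as a word constituent".

```python
import re

def word_at(text, pos, casechars):
    """Return the maximal run of characters matching `casechars` around `pos`.

    `casechars` is a regex matching a single word-constituent character.
    This only mimics the idea of word delimiting; it is not how
    ispell.el is implemented.
    """
    char = re.compile(casechars)
    if not (0 <= pos < len(text)) or not char.fullmatch(text[pos]):
        return ""
    start = pos
    while start > 0 and char.fullmatch(text[start - 1]):
        start -= 1
    end = pos + 1
    while end < len(text) and char.fullmatch(text[end]):
        end += 1
    return text[start:end]

ALPHA = r"[A-Za-z]"       # stand-in for [[:alpha:]] (ASCII only)
GRAPH = r"\S"             # stand-in for [[:graph:]]: any non-whitespace
ALPHA_DOT = r"[A-Za-z.]"  # alpha plus '.' treated as a word constituent

print(word_at(" (XP) ", 2, ALPHA))       # XP
print(word_at(" (XP) ", 2, GRAPH))       # (XP)
print(word_at("foo.bar", 1, ALPHA))      # foo
print(word_at("foo.bar", 1, ALPHA_DOT))  # foo.bar
```

With point on the same character, the speller is asked about "XP" in one case and "(XP)" in the other, so whether the answer is right depends on which of those strings the dictionary actually knows.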