From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#23097: 24.5; ispell.el: lines with both CASECHARS and NOT-CASECHARS get sent to the spell checker
Date: Mon, 17 Aug 2020 19:40:58 +0300
Message-ID: <83eeo5gr8l.fsf@gnu.org>
References: <56F2DC47.2090600@gmail.com> <83fuvh2gwd.fsf@gnu.org> <83bljbkhrh.fsf@gnu.org> <5fe8e18f-efb4-4f9b-fd85-0cb4eccc58b4@gmail.com>
In-Reply-To: <5fe8e18f-efb4-4f9b-fd85-0cb4eccc58b4@gmail.com> (message from Nikolay Kudryavtsev on Mon, 17 Aug 2020 12:20:08 +0300)
To: Nikolay Kudryavtsev
Cc: stefan@marxist.se, 23097@debbugs.gnu.org

> From: Nikolay Kudryavtsev
> Cc: 23097@debbugs.gnu.org
> Date: Mon, 17 Aug 2020 12:20:08 +0300
>
> This is not an external software bug, but very much an Emacs bug.
>
> I'm not sure what was the initial design idea for CASECHARS and
> NOT-CASECHARS, but whatever it was, it would not work effectively due to
> feeding the entire line. The most obvious practical use for them(being
> able to spellcheck languages with completely different alphabets without
> the spellchecker misfiring on either pass) would not work either.

The original design was that a spell-checker supports a single
language, and any text in other languages is a spelling mistake.  This
is still true for Ispell and for Aspell; only Hunspell (and Enchant,
when it uses Hunspell as its back-end) supports multiple languages.

With Hunspell, ispell.el effectively ignores CASECHARS and
NOT-CASECHARS, and instead uses the character set specified by the
dictionary file itself.  This is the only multi-dictionary
spell-checking configuration that ispell.el currently supports.

Which is why, when you first reported this, I asked you why you
couldn't use Hunspell; your answer, which described some kind of
failure related to encoding, I couldn't understand then and I don't
understand now (primarily because that feature works for me).

Instead, you seem to insist on using Aspell in a way that to me sounds
like a kludge: spell-check the region with one dictionary, then
restart ispell.el with another dictionary and spell-check the same
region again.  AFAIU, you'd like ispell.el to support this kind of
workaround OOTB.  Is that correct, or did I miss something?

If my understanding is correct, then, apart from being a kludgey
solution for a problem that has a much cleaner one, I don't think I
understand how this could work well in general.  Suppose you have in
your buffer a mis-spelled word such as this:

  fooЫbar

with the Cyrillic letter being there by accident: perhaps you
unintentionally pressed a key when you shouldn't have.
Or imagine the following typo:

  fooбар

which could happen if you forgot to switch the input method.  With
your proposed mode of operation, the spell-checker will check partial
words and decide that in both cases there are no spelling mistakes
here, because each partial word is spelled correctly in its language.
But clearly these are typos that need to be flagged.  Thus, just using
2 sets of characters is not enough to handle these typos
intelligently, as you'd get a lot of false negatives.

So even if we consider your report as a feature request, it is not
entirely clear to me how to implement such a feature.  And frankly,
since at least one spell-checker exists which supports multiple
dictionaries, it is not clear to me why we should try so hard to force
Aspell to look as if it did, too.

> The ideal pratical fix for this should spellcheck such lines word by word.

I think I showed above why such a simplistic strategy will backfire by
leaving some typos undetected.
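
The false-negative problem described above can be illustrated with a
minimal Python sketch.  This is not how ispell.el, Aspell or Hunspell
work internally; the two toy word lists and the helper names
(script_of, split_by_script, per_script_check, whole_word_check) are
hypothetical, made up only for this illustration.  It also assumes
that the toy Russian list accepts the single letter "ы", which a real
dictionary may or may not do.

```python
import unicodedata

# Toy word lists standing in for real dictionaries (hypothetical data).
ENGLISH = {"foo", "bar"}
RUSSIAN = {"бар", "ы"}  # assumption: a lone letter is accepted


def script_of(ch):
    """Classify a character as 'latin', 'cyrillic' or 'other' by its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def split_by_script(word):
    """Split a word into maximal runs of characters from the same script."""
    runs = []  # each entry is [script, text]
    for ch in word:
        script = script_of(ch)
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, text) for script, text in runs]


def per_script_check(word):
    """Accept the word if every single-script fragment is in that script's list."""
    dictionaries = {"latin": ENGLISH, "cyrillic": RUSSIAN}
    return all(
        text.lower() in dictionaries.get(script, set())
        for script, text in split_by_script(word)
    )


def whole_word_check(word):
    """Accept the word only if it is, as a whole, in one of the lists."""
    lowered = word.lower()
    return lowered in ENGLISH or lowered in RUSSIAN


for typo in ("fooЫbar", "fooбар"):
    print(typo, split_by_script(typo),
          "per-script:", per_script_check(typo),
          "whole-word:", whole_word_check(typo))
```

In this sketch both typos pass the per-script check, because each
fragment is a valid entry in its own list, while the whole-word check
rejects them.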