* bug#23097: 24.5; ispell.el: lines with both CASECHARS and NOT-CASECHARS get sent to the spell checker
From: Nikolay Kudryavtsev @ 2016-03-23 18:11 UTC
To: 23097

Each entry in ispell-dictionary-alist has elements called CASECHARS and
NOT-CASECHARS. They are used for defining what gets sent to the spell
checker and what does not.

One use case for them is that, if you have two dictionaries for
languages with totally different alphabets, you can spellcheck a file
where both languages are mixed together. In theory.

Here's what happens in practice:
If a line contains only CASECHARS, it gets sent to the spell checker.
If a line contains only NOT-CASECHARS, it does not get sent to the spell checker.
If a line contains both CASECHARS and NOT-CASECHARS, the whole line gets sent to the spell checker.

Sending the whole line makes NOT-CASECHARS pretty useless. I think the
reasonable behavior in this case would be to send the line word by word.

Here's how to reproduce this with aspell.

1. Starting from emacs -Q, evaluate this:

    (setq ispell-program-name "aspell")
    (defun ispell-set-my-dictionaries ()
      ;; Drop the default "english" entry so ours replaces it.
      (setq ispell-dictionary-alist
            (delq (assoc "english" ispell-dictionary-alist)
                  ispell-dictionary-alist))
      ;; Fields: name, CASECHARS, NOT-CASECHARS, OTHERCHARS,
      ;; MANY-OTHERCHARS-P, ISPELL-ARGS, EXTENDED-CHARACTER-MODE,
      ;; CHARACTER-SET.
      (add-to-list 'ispell-dictionary-alist
                   '("english" "[kcat]" "[dogh]" "[']" nil ("-B") nil iso-8859-1)))
    (advice-add 'ispell-set-spellchecker-params :after #'ispell-set-my-dictionaries)

2. Run M-x ispell-change-dictionary and choose english.
3. Run M-x ispell-buffer in a buffer containing this:

    kat
    doh
    kat doh

"Kat" on the first line would get sent to aspell, since it passes
CASECHARS. This is fine. "Doh" on the second line would be ignored,
since it's not in CASECHARS. This is fine too. On the line with both
words, not only "kat" would get sent, but also "doh", and that's what we
don't want to happen.

--
Best Regards,
Nikolay Kudryavtsev
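To make the reported difference concrete, here is a hypothetical model of the two strategies, reusing the recipe's CASECHARS regexp "[kcat]". It is not ispell.el's actual code; the my- names are illustrative only, and it assumes Emacs 25 or later for seq-filter.

```elisp
;; Hypothetical model of the two filtering strategies (not ispell.el code).
(require 'seq)

(defconst my-casechars "[kcat]"
  "CASECHARS regexp taken from the reproduction recipe.")

(defun my-line-sent-p (line)
  "Reported behavior: non-nil if LINE would be sent to the checker whole.
That happens as soon as any character in LINE matches CASECHARS."
  (and (string-match-p my-casechars line) t))

(defun my-word-sent-p (word)
  "Proposed behavior: non-nil if WORD consists only of CASECHARS."
  (and (string-match-p (concat "\\`" my-casechars "+\\'") word) t))

(defun my-words-to-send (line)
  "Return the whitespace-separated words of LINE that would be sent
under the proposed word-by-word strategy."
  (seq-filter #'my-word-sent-p (split-string line)))
```

In this model, (my-line-sent-p "kat doh") returns t, so "doh" reaches the checker along with "kat", whereas (my-words-to-send "kat doh") returns ("kat").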
* From: Eli Zaretskii @ 2016-03-23 18:22 UTC
To: Nikolay Kudryavtsev; +Cc: 23097

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Wed, 23 Mar 2016 21:11:19 +0300
>
> Each entry in ispell-dictionary-alist has elements called CASECHARS and
> NOT-CASECHARS. They are used for defining what gets sent to the spell
> checker and what does not.
>
> One use case for them is that, if you have two dictionaries for
> languages with totally different alphabets, you can spellcheck a file
> where both languages are mixed together. In theory.

Don't you need to restart the spell-checker each time you switch the
dictionaries? AFAIK, only Hunspell supports such mixed
spell-checking, and with Hunspell you don't need to break the line
into separate words in that case. With any other spell-checker, you
need to restart it whenever you switch languages.
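For illustration, a minimal sketch of a Hunspell multi-dictionary setup in ispell.el. It assumes Emacs 25.1 or later, where ispell-hunspell-add-multi-dic is available, and that Hunspell dictionaries named en_US and ru_RU are installed; those names are placeholders for whatever dictionaries exist on your system.

```elisp
;; Assumes Emacs 25.1+ and installed Hunspell dictionaries named
;; en_US and ru_RU (placeholders; substitute your own).
(with-eval-after-load "ispell"
  (setq ispell-program-name "hunspell")
  ;; A comma-separated name asks Hunspell to load both dictionaries.
  (setq ispell-dictionary "en_US,ru_RU")
  ;; Initialize ispell-dictionary-alist for Hunspell first...
  (ispell-set-spellchecker-params)
  ;; ...then register the combined dictionary with ispell.el.
  (ispell-hunspell-add-multi-dic "en_US,ru_RU"))
```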
* From: Nikolay Kudryavtsev @ 2016-03-23 20:12 UTC
To: Eli Zaretskii; +Cc: 23097

Yes, you do need to restart the spell checker when you switch
dictionaries, but it's not too inconvenient in practice.

As you know, I've run into issues with hunspell, which I described in
this thread
<http://lists.gnu.org/archive/html/help-gnu-emacs/2016-03/msg00107.html>.

--
Best Regards,
Nikolay Kudryavtsev
* From: Stefan Kangas @ 2020-08-15 4:22 UTC
To: Eli Zaretskii; +Cc: 23097, Nikolay Kudryavtsev

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
>> Date: Wed, 23 Mar 2016 21:11:19 +0300
>>
>> Each entry in ispell-dictionary-alist has elements called CASECHARS and
>> NOT-CASECHARS. They are used for defining what gets sent to the spell
>> checker and what does not.
>>
>> One use case for them is that, if you have two dictionaries for
>> languages with totally different alphabets, you can spellcheck a file
>> where both languages are mixed together. In theory.
>
> Don't you need to restart the spell-checker each time you switch the
> dictionaries? AFAIK, only Hunspell supports such mixed
> spell-checking, and with Hunspell you don't need to break the line
> into separate words in that case. With any other spell-checker, you
> need to restart it whenever you switch languages.

It seems like this is a limitation of external software then, and not in
Emacs? Should this therefore be closed, or is there anything more to do
here?

Best regards,
Stefan Kangas
* From: Eli Zaretskii @ 2020-08-15 16:15 UTC
To: Stefan Kangas; +Cc: 23097, nikolay.kudryavtsev

> From: Stefan Kangas <stefan@marxist.se>
> Date: Fri, 14 Aug 2020 21:22:24 -0700
> Cc: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>, 23097@debbugs.gnu.org
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> >> Date: Wed, 23 Mar 2016 21:11:19 +0300
> >>
> >> Each entry in ispell-dictionary-alist has elements called CASECHARS and
> >> NOT-CASECHARS. They are used for defining what gets sent to the spell
> >> checker and what does not.
> >>
> >> One use case for them is that, if you have two dictionaries for
> >> languages with totally different alphabets, you can spellcheck a file
> >> where both languages are mixed together. In theory.
> >
> > Don't you need to restart the spell-checker each time you switch the
> > dictionaries? AFAIK, only Hunspell supports such mixed
> > spell-checking, and with Hunspell you don't need to break the line
> > into separate words in that case. With any other spell-checker, you
> > need to restart it whenever you switch languages.
>
> It seems like this is a limitation of external software then, and not in
> Emacs? Should this therefore be closed, or is there anything more to do
> here?

Yes, I think we should close this issue.
* From: Stefan Kangas @ 2020-08-15 16:40 UTC
To: Eli Zaretskii; +Cc: 23097, nikolay.kudryavtsev

tags 23097 + notabug
close 23097
thanks

Eli Zaretskii <eliz@gnu.org> writes:

> Yes, I think we should close this issue.

Thanks, done.
* From: Nikolay Kudryavtsev @ 2020-08-17 9:20 UTC
To: Stefan Kangas, Eli Zaretskii; +Cc: 23097

This is not an external software bug, but very much an Emacs bug.

I'm not sure what was the initial design idea for CASECHARS and
NOT-CASECHARS, but whatever it was, it would not work effectively due to
feeding the entire line. The most obvious practical use for them (being
able to spellcheck languages with completely different alphabets without
the spellchecker misfiring on either pass) would not work either.

The ideal practical fix for this should spellcheck such lines word by word.

--
Best Regards,
Nikolay Kudryavtsev
* From: Stefan Kangas @ 2020-08-17 12:48 UTC
To: Nikolay Kudryavtsev, Eli Zaretskii; +Cc: 23097

Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com> writes:

> This is not an external software bug, but very much an Emacs bug.
>
> I'm not sure what was the initial design idea for CASECHARS and
> NOT-CASECHARS, but whatever it was, it would not work effectively due to
> feeding the entire line. The most obvious practical use for them (being
> able to spellcheck languages with completely different alphabets without
> the spellchecker misfiring on either pass) would not work either.
>
> The ideal practical fix for this should spellcheck such lines word by word.

Okay, but that's not a documented use case, so I'm not sure that it's a
bug.

The thing you suggest may be possible, but would require developing a
new feature, for example to run two instances of the same spell checker
at once.

AFAIU, the best solution is to use an external spell checker that has
support for using two languages at once. Why not use that?

Best regards,
Stefan Kangas
* From: Eli Zaretskii @ 2020-08-17 16:40 UTC
To: Nikolay Kudryavtsev; +Cc: stefan, 23097

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Cc: 23097@debbugs.gnu.org
> Date: Mon, 17 Aug 2020 12:20:08 +0300
>
> This is not an external software bug, but very much an Emacs bug.
>
> I'm not sure what was the initial design idea for CASECHARS and
> NOT-CASECHARS, but whatever it was, it would not work effectively due to
> feeding the entire line. The most obvious practical use for them (being
> able to spellcheck languages with completely different alphabets without
> the spellchecker misfiring on either pass) would not work either.

The original design was that a spell-checker supports a single
language, and any text in other languages is a spelling mistake. This
is still true for Ispell and for Aspell; only Hunspell (and Enchant,
when it uses Hunspell as its back-end) supports multiple languages.
With Hunspell, ispell.el effectively ignores CASECHARS and
NOT-CASECHARS, and instead uses the character set specified by the
dictionary file itself. This is the only multi-dictionary
spell-checking configuration that ispell.el currently supports.

Which is why, when you first reported this, I asked you why you
couldn't use Hunspell; your answer, which described some kind of
failure related to encoding, I couldn't understand then and I don't
understand now (primarily because that feature works for me).

Instead, you seem to insist on using Aspell in a way that to me sounds
like a kludge: spell-check the region with one dictionary, then restart
ispell.el with another dictionary and spell-check the same region
again.
AFAIU, you'd like ispell.el to support this kind of workaround OOTB. Is
that correct, or did I miss something?

If my understanding is correct, then, apart from being a kludgey
solution for a problem that has a much cleaner one, I don't think I
understand how this could work well in general. Suppose you have in
your buffer a mis-spelled word such as this:

  fooЫbar

with the Cyrillic letter being there by accident: perhaps you
unintentionally pressed a key when you shouldn't have. Or imagine the
following typo:

  fooбар

which could happen if you forgot to switch the input method. With your
proposed mode of operation, the spell-checker will check partial words
and decide that in both cases there are no spelling mistakes, because
each partial word is spelled correctly in its language. But clearly
these are typos that need to be flagged. Thus, just using 2 sets of
characters is not enough to handle these typos intelligently, as you'd
get a lot of false negatives.

So even if we consider your report as a feature request, it is not
entirely clear to me how to implement such a feature. And frankly,
since at least one spell-checker exists which supports multiple
dictionaries, it is not clear to me why we should try so hard to force
Aspell to look as if it did, too.

> The ideal practical fix for this should spellcheck such lines word by word.

I think I've shown above why such a simplistic strategy will backfire
by leaving some typos undetected.
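The false-negative problem can be illustrated with two hypothetical helpers (my-script-runs and my-mixed-script-p are illustrative names, not part of ispell.el). The first splits a word into same-script runs using Emacs's char-script-table, roughly what a naive per-alphabet dispatcher would do; the second flags any token that mixes scripts.

```elisp
;; Hypothetical helpers (not part of ispell.el).
(defun my-script-runs (word)
  "Split WORD into maximal runs of characters from the same script.
Scripts are looked up in `char-script-table'."
  (let ((runs nil)
        (current "")
        (prev nil))
    (dolist (ch (string-to-list word))
      (let ((script (aref char-script-table ch)))
        (if (or (string= current "") (eq script prev))
            ;; First character, or same script as before: extend the run.
            (setq current (concat current (string ch)))
          ;; Script changed: close the current run and start a new one.
          (push current runs)
          (setq current (string ch)))
        (setq prev script)))
    (unless (string= current "")
      (push current runs))
    (nreverse runs)))

(defun my-mixed-script-p (word)
  "Non-nil if WORD mixes characters from more than one script."
  (> (length (my-script-runs word)) 1))
```

Here (my-script-runs "fooбар") returns ("foo" "бар"). Assuming, as in the examples above, that each fragment is a valid word in its own language, dispatching the runs to separate dictionaries would accept both typos. Treating any mixed-script token as suspect, as my-mixed-script-p does, would flag them, but it is only a heuristic and would also flag legitimate mixed-script tokens.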
* From: Eli Zaretskii @ 2020-10-13 17:00 UTC
To: Nikolay Kudryavtsev; +Cc: stefan, 23097

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Cc: stefan@marxist.se, 23097@debbugs.gnu.org
> Date: Tue, 13 Oct 2020 16:19:10 +0300
>
> Anyway, Hunspell IMHO is sort of beside the point for this discussion.
> This bug is about ispell.el not performing in a way a user would
> realistically expect from its public facing API.

Which expectations from what public API are being violated here?
* From: Nikolay Kudryavtsev @ 2020-10-14 19:20 UTC
To: Eli Zaretskii; +Cc: stefan, 23097

The whole ispell-dictionary-alist structure implies that matching would
be done word by word. And looking into the dictionary setup is the first
thing an ispell.el user would do. Apart from NOT-CASECHARS, it also has
this element:

> OTHERCHARS is a regexp of characters in the NOT-CASECHARS set but which can be
> used to construct words in some special way.  If OTHERCHARS characters follow
> and precede characters from CASECHARS, they are parsed as part of a word,
> otherwise they become word-breaks...

Basically, the presence of both NOT-CASECHARS and OTHERCHARS implies
that ispell.el does strict word-by-word matching. If we're just sending
any line that contains a CASECHARS match, we don't really need either of
them, since we can just match by CASECHARS alone and then send the line.

Oh, and there's another thing. ispell.el actually does a word-by-word
search, but only on resume. Try my recipe again, but make the last line
of the spellchecked buffer look like "doh kat". Then suspend the
spellcheck after the first line and resume it with C-u M-$. You'll see
that in this scenario it correctly skips "doh" on the last line. But
then it suffers from the word-mix problem described by Eli:
spellchecking "dohkat" and "katdoh" results in "kat" alone being sent.

Thinking a bit more about this word-mix problem, it seems it's not as
simple to fix as I thought in my previous letter, since we need some
list of legitimate word separators for each language.

--
Best Regards,
Nikolay Kudryavtsev
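The OTHERCHARS rule quoted above amounts to a word tokenizer. Below is a minimal sketch of that rule as a hypothetical helper (my-ispell-words is illustrative, not an ispell.el function). It assumes CASECHARS and OTHERCHARS are single-character regexps, treats everything else as a word break, and models MANY-OTHERCHARS-P from the same dictionary entry.

```elisp
;; Hypothetical tokenizer (not ispell.el code).
(defun my-ispell-words (text casechars otherchars many-otherchars-p)
  "Return the words in TEXT per the CASECHARS/OTHERCHARS rule.
CASECHARS and OTHERCHARS are single-character regexps such as
\"[A-Za-z]\" and \"[']\".  An OTHERCHARS character is kept only when
it is both preceded and followed by CASECHARS; otherwise it acts as
a word break.  If MANY-OTHERCHARS-P is nil, at most one OTHERCHARS
character may appear in a word."
  (let ((word-re (concat casechars "+"
                         "\\(?:" otherchars casechars "+\\)"
                         (if many-otherchars-p "*" "?")))
        (pos 0)
        (words nil))
    ;; Each match consumes at least one CASECHARS character, so POS
    ;; always advances and the loop terminates.
    (while (string-match word-re text pos)
      (push (match-string 0 text) words)
      (setq pos (match-end 0)))
    (nreverse words)))
```

With the recipe's "[kcat]" and "[']", (my-ispell-words "kat doh" "[kcat]" "[']" nil) returns ("kat"), so "doh" is skipped; but "katdoh" also yields just ("kat"), which is the word-mix problem described above.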