From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist
Date: Wed, 29 Jul 2020 21:43:22 +0300
Message-ID: <83h7tqf9h1.fsf@gnu.org>
References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com>
In-Reply-To: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> (message from Sebastian Urban on Wed, 29 Jul 2020 18:12:02 +0200)
To: Sebastian Urban
Cc: 42602@debbugs.gnu.org

> From: Sebastian Urban
> Date: Wed, 29 Jul 2020 18:12:02 +0200
>
> for words like:
>     męski
>     miód
>     klątwa
>     ślad
>     łuk
>     żaba
>     źrebak
>     grzać
>     bańka
> ispell.el sends to Aspell only part of the word, e.g. "lad" instead of
> "ślad", or "kl"/"twa" (depending on the cursor position) instead of
> "klątwa".
>
> I think this is because wrong value of (NOT-)CASECHARS, which is ASCII
> A-z letters and a few chars of which only ó/Ó is valid for Polish.
>
> Although, for some reason, it doesn't recognize "ó" in word "miód",
> sending "mi" or "d".  It is on the list of CASECHARS under \363, so it
> should work.  Moreover, if I type in regexp-builder "[\363\323]" it
> won't recognize ó/Ó, but it doesn't have a problem with other Polish
> chars, like "ł" ("[\502]") or "ż" ("[\574]").
>
> If I put in my init.el:
> --8<---------------cut here---------------start------------->8---
> (setq ispell-program-name "C:/cygwin64/bin/aspell")
> (add-hook 'ispell-initialize-spellchecker-hook
>           (lambda ()
>             (add-to-list 'ispell-local-dictionary-alist
>                          '("pl"
>                            ;; "[[:alpha:]]"
>                            ;; "[^[:alpha:]]"
>                            ;; ęóąśłżźćńĘÓĄŚŁŻŹĆŃ
>                            "[A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
>                            "[^A-Za-z\431\363\405\533\502\574\572\407\504\430\323\404\532\501\573\571\406\503]"
>                            "[.]" nil nil nil iso-8859-2))))
> (setq ispell-dictionary "pl")
> --8<---------------cut here---------------start------------->8---
>
> everything seems to work, even ó/Ó are recognised.

I don't understand this change.  Values above octal 377 cannot be
right in the above regexps, because they are supposed to be in Latin-2
encoding, which is a single-byte encoding, and so can only handle
values below octal 400.  How did you come up with those values?

Anyway, I'm quite sure some other factor is at work here.

> Tested on:
> - GNU Emacs 26.3 (build 1, x86_64-w64-mingw32) of 2019-08-29,
> - GNU Emacs 28.0.50 (build 1, x86_64-w64-mingw32) of 2020-07-05,
> with Aspell from Cygwin installation.

Your Emacs is a native MinGW build, whereas Aspell seems to be a
Cygwin build?  If so, you could have incompatibility in character
encoding.  What is your Windows locale?
And what does M-: (getenv "LANG") RET yield inside Emacs?
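
To make the point about octal 377 concrete, here is a minimal sketch that compares, for each Polish letter, its Unicode code point with its single-byte Latin-2 (ISO 8859-2) value, both in octal.  It is only an illustration run outside Emacs; the names POLISH and octal_forms are made up for this example and are not part of ispell.el.

```python
# Compare Unicode code points with ISO 8859-2 (Latin-2) byte values.
# POLISH and octal_forms are hypothetical names used only for this sketch.
POLISH = "ęóąśłżźćńĘÓĄŚŁŻŹĆŃ"


def octal_forms(ch):
    """Return (code point, Latin-2 byte) for one character, as octal strings."""
    code_point = format(ord(ch), "o")
    # Latin-2 is a single-byte encoding, so the encoded form has one byte.
    latin2_byte = format(ch.encode("iso-8859-2")[0], "o")
    return code_point, latin2_byte


if __name__ == "__main__":
    for ch in POLISH:
        cp, byte = octal_forms(ch)
        print(f"{ch}  code point \\{cp}  Latin-2 byte \\{byte}")
```

For "ę" this gives code point \431 but Latin-2 byte \352, so the values above octal 377 in the quoted regexps look like code points, not Latin-2 bytes.  The two columns coincide only for ó/Ó (\363 and \323).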