From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist
Date: Thu, 30 Jul 2020 16:26:07 +0300
Message-ID: <831rktf828.fsf@gnu.org>
References: <2f58556a-8f0f-f923-2716-5366d66fa44d@gmail.com> <83h7tqf9h1.fsf@gnu.org>
To: Sebastian Urban
Cc: 42602@debbugs.gnu.org
In-Reply-To: (message from Sebastian Urban on Thu, 30 Jul 2020 13:39:55 +0200)

> From: Sebastian Urban
> Cc: 42602@debbugs.gnu.org
> Date: Thu, 30 Jul 2020 13:39:55 +0200
>
> > I don't understand this change.
> > Values above octal 377 cannot be
> > right in the above regexps, because they are supposed to be in
> > Latin-2 encoding, which is a single-byte encoding, and so can only
> > handle values below octal 400.  How did you come up with those
> > values?
>
> Basically, C-x = on a char, which gave me octal values.

This gives you the Unicode codepoint, not its Latin-2 encoding.  They
are different.  The database in ispell.el uses Latin-2 encodings of
Polish characters.

> Well, I did some tests, e.g. switched back to the original value of
> "polish" in my "pl" dictionary, and... it works.  And if I change from
> iso-8859-2 to utf-8 in my "pl" (with original value from "polish") it
> doesn't work.  So, as you later wrote - wrong character encoding,
> I guess.
>
> Looking for a cause (in default settings), I think I found it in
> ispell-dictionary-base-alist and ispell-dictionary-alist.  During
> "transfer" from *-base-* to ispell-dictionary-alist, the value of
> CHARACTER-SET is changed in all cases from iso-* or cp1255 to utf-8,
> then ispell uses these (from ispell-dictionary-alist) when it "talks"
> with Aspell.
>
> On the other hand, if I use Emacs 26.3 from Cygwin, everything works
> out of the box, I don't even have to set "polish" as default
> dictionary.  But there, in Cygwin command line, "env | grep LANG" gives
> "LANG=pl_PL.UTF-8".

Native MinGW builds cannot use the UTF-8 encoding.

So, do we have a problem to solve, or can this issue be closed?
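
To make the codepoint-versus-encoding distinction above concrete, here
is a minimal sketch.  It is an illustration only, written in Python
rather than Emacs Lisp, and it is not taken from ispell.el; the names
POLISH and codepoint_and_latin2 are made up for this example.  For each
Polish letter with a diacritic it prints the Unicode codepoint (the
kind of value C-x = reports) next to the single byte that ISO-8859-2
(Latin-2) assigns to the same letter, both in octal.

```python
# Polish lowercase letters with diacritics: ą ć ę ł ń ó ś ź ż
# (written as escapes so the output below stays pure ASCII).
POLISH = "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c"


def codepoint_and_latin2(ch):
    """Return (Unicode codepoint, Latin-2 byte value) for one character."""
    encoded = ch.encode("iso-8859-2")  # Latin-2 is a single-byte encoding
    return ord(ch), encoded[0]


if __name__ == "__main__":
    for ch in POLISH:
        cp, byte = codepoint_and_latin2(ch)
        # The Latin-2 byte always fits in octal 000..377; the codepoint
        # usually does not.
        print(f"U+{cp:04X}  codepoint: octal {cp:o}  Latin-2: octal {byte:o}")
```

Every Latin-2 value stays at or below octal 377, while most of the
codepoints are larger.  The two numbers coincide only for a letter such
as U+00F3, which sits at the same position in both.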