From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#36923: Combining Diacritical Marks are not Latin only
Date: Mon, 05 Aug 2019 19:08:21 +0300
Message-ID: <83k1brd28a.fsf@gnu.org>
References: <87lfw8r744.fsf@mail.linkov.net>
In-reply-to: <87lfw8r744.fsf@mail.linkov.net> (message from Juri Linkov on Sun, 04 Aug 2019 23:40:38 +0300)
To: Juri Linkov
Cc: 36923@debbugs.gnu.org

> From: Juri Linkov
> Date: Sun, 04 Aug 2019 23:40:38 +0300
>
> The generated file lisp/international/charscript.el
> assigns the block “Combining Diacritical Marks” to the ‘latin’ script
> on the assumption that these characters are used only in Latin.
>
> But in fact according to e.g. https://en.wikipedia.org/wiki/Acute_accent
> the acute accent marks the stressed vowel of a word in several languages
> with alphabets based on the Latin, Cyrillic, and Greek scripts.
> In particular https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode
> mentions how characters from other blocks are used in Cyrillic script.
> Moreover, the Combining Diacritical Marks block also
> contains several characters from the Greek script:
> COMBINING GREEK PERISPOMENI, COMBINING GREEK KORONIS
> COMBINING GREEK DIALYTIKA TONOS, COMBINING GREEK YPOGEGRAMMENI
>
> I noticed this problem recently while helping to develop char-fold where
> GREEK SMALL LETTER IOTA combined with COMBINING GREEK DIALYTIKA TONOS was
> alarmingly highlighted as “mixed scripts” by markchars-mode from GNU ELPA.
>
> Of course, it's possible to add exceptions for characters in this block
> in markchars-mode. But before doing this, I'm asking a confirmation
> whether Unicode data should be fixed in ‘char-script-table’, so e.g.
>
>   (aref char-script-table ?\N{COMBINING ACUTE ACCENT})
>
> could return
>
>   (latin greek cyrillic)
>
> instead of the current
>
>   latin

char-script-table is documented to yield a single symbol, so returning
a list would be an incompatible change, which we should avoid.

More generally, I think what you describe is a clear conceptual bug in
markchars-mode: it should only pay attention to the script of the base
characters, not to the script of combining accents. The latter is
mostly irrelevant, certainly so for the purpose of detecting
confusables.

So I think this should be fixed in markchars-mode, and the fact that we
somewhat arbitrarily assign those diacritics to the latin script is not
a serious problem, if it is a problem at all.
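
To illustrate what "pay attention only to the base characters" could
mean in practice, here is a minimal sketch in Emacs Lisp. It is not
code from markchars-mode: the function name my-base-char-scripts is
hypothetical, and treating every character whose general category is
Mn, Mc or Me as a combining mark is an assumption made for this
example.

```elisp
;; Hypothetical helper, not part of markchars-mode or of Emacs itself.
(defun my-base-char-scripts (string)
  "Return the scripts of the base characters in STRING, without duplicates.
Characters whose Unicode general category is a mark (Mn, Mc or Me)
are skipped, so the script assigned to a combining accent never
contributes to the result."
  (let (scripts)
    (dolist (ch (string-to-list string))
      ;; Assumption: a general category of Mn, Mc or Me identifies a
      ;; combining mark well enough for this purpose.
      (unless (memq (get-char-code-property ch 'general-category)
                    '(Mn Mc Me))
        (let ((script (aref char-script-table ch)))
          ;; Record each non-nil script once, in order of appearance.
          (when (and script (not (memq script scripts)))
            (push script scripts)))))
    (nreverse scripts)))

;; The sequence from the bug report: iota followed by a combining mark
;; from the Combining Diacritical Marks block.
(my-base-char-scripts
 "\N{GREEK SMALL LETTER IOTA}\N{COMBINING GREEK DIALYTIKA TONOS}")
```

Because the combining mark is skipped before char-script-table is
consulted, only the script of the iota is collected for this sequence,
so a mixed-script check built on such a helper would not depend on
which script the diacritics block happens to be assigned to.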