From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#36923: Combining Diacritical Marks are not Latin only
Date: Sun, 04 Aug 2019 23:40:38 +0300
Message-ID: <87lfw8r744.fsf@mail.linkov.net>
To: 36923@debbugs.gnu.org

The generated file lisp/international/charscript.el assigns the block "Combining Diacritical Marks" (U+0300..U+036F) to the 'latin' script, on the assumption that these characters are used only in Latin.

But according to e.g. https://en.wikipedia.org/wiki/Acute_accent, the acute accent marks the stressed vowel of a word in several languages whose alphabets are based on the Latin, Cyrillic, and Greek scripts. In particular, https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode describes how Cyrillic script uses characters from other blocks.
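You can check the current assignment by scanning the block in 'char-script-table'. The sketch below is minimal, and the helper name my-block-scripts is hypothetical, not part of Emacs. It only reads the table, so its result depends on the Emacs version and the charscript.el that version generated.

```elisp
;; Sketch (hypothetical helper, not part of Emacs): collect the distinct
;; values that `char-script-table' assigns to the code points FROM..TO.
(defun my-block-scripts (from to)
  "Return the distinct `char-script-table' entries for chars FROM..TO.
Entries are returned in order of first appearance."
  (let (scripts)
    (dotimes (i (1+ (- to from)))
      (let ((script (aref char-script-table (+ from i))))
        ;; `member' rather than `memq', in case an entry is ever a list.
        (unless (member script scripts)
          (push script scripts))))
    (nreverse scripts)))

;; Combining Diacritical Marks is U+0300..U+036F.
(my-block-scripts #x300 #x36F)
```

Given the assignment described above, the last form is expected to return a single-element list, (latin), on the Emacs version discussed here.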
Moreover, the Combining Diacritical Marks block also contains several characters from the Greek script:

  COMBINING GREEK PERISPOMENI (U+0342)
  COMBINING GREEK KORONIS (U+0343)
  COMBINING GREEK DIALYTIKA TONOS (U+0344)
  COMBINING GREEK YPOGEGRAMMENI (U+0345)

I noticed this problem recently while helping to develop char-fold. There, GREEK SMALL LETTER IOTA combined with COMBINING GREEK DIALYTIKA TONOS was alarmingly highlighted as "mixed scripts" by markchars-mode from GNU ELPA.

Of course, it's possible to add exceptions for the characters in this block in markchars-mode. But before doing that, I'd like confirmation on whether the Unicode data in 'char-script-table' should be fixed instead, so that e.g.

  (aref char-script-table ?\N{COMBINING ACUTE ACCENT})

could return

  (latin greek cyrillic)

instead of the current

  latin
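The sketch below illustrates the three points above. Every helper (prefix my-) is hypothetical. This is not markchars-mode's actual code, and the list-valued table entry is only the proposal, not an existing Emacs behavior. Part 1 shows the names and current scripts of the Greek characters in the block. Part 2 is a naive mixed-script check that can optionally ignore nonspacing marks (general category Mn), which is roughly how Unicode's own Scripts.txt treats this block (as Inherited). Part 3 is a lookup that accepts either a symbol or a list in the table.

```elisp
;; Sketch only: hypothetical `my-' helpers, not an existing Emacs API.

;; 1. The Greek characters in Combining Diacritical Marks.
(defconst my-greek-combining-marks '(#x342 #x343 #x344 #x345)
  "Code points of the Greek characters listed in the bug report.")

(defun my-describe-char (char)
  "Return (CODE NAME SCRIPT) for CHAR, with SCRIPT from `char-script-table'."
  (list (format "U+%04X" char)
        (get-char-code-property char 'name)
        (aref char-script-table char)))

(mapcar #'my-describe-char my-greek-combining-marks)

;; 2. A naive "mixed scripts" check: collect the distinct scripts of the
;; characters in STRING.  With SKIP-MARKS non-nil, characters whose
;; general category is Mn (nonspacing marks) are ignored entirely.
(defun my-string-scripts (string &optional skip-marks)
  "Return the distinct scripts of the characters in STRING."
  (let (scripts)
    (dolist (char (string-to-list string))
      (unless (and skip-marks
                   (eq (get-char-code-property char 'general-category) 'Mn))
        (let ((script (aref char-script-table char)))
          (when (and script (not (memq script scripts)))
            (push script scripts)))))
    (nreverse scripts)))

(let ((iota-with-mark
       (string ?\N{GREEK SMALL LETTER IOTA}
               ?\N{COMBINING GREEK DIALYTIKA TONOS})))
  (list (my-string-scripts iota-with-mark)     ; combining mark counted
        (my-string-scripts iota-with-mark t))) ; combining mark skipped

;; 3. A lookup that works whether a table entry is a single symbol
;; (current behavior) or a list such as (latin greek cyrillic) (proposed).
(defun my-char-scripts (char)
  "Return the scripts of CHAR as a list."
  (let ((entry (aref char-script-table char)))
    (if (listp entry) entry (list entry))))
```

With marks skipped, the iota sequence yields only (greek). Counting the mark reports whatever script the table currently assigns to U+0344 alongside greek.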