From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#36923: Combining Diacritical Marks are not Latin only
Date: Mon, 05 Aug 2019 22:41:59 +0300
Organization: LINKOV.NET
Message-ID: <87zhknzc7c.fsf@mail.linkov.net>
References: <87lfw8r744.fsf@mail.linkov.net> <83k1brd28a.fsf@gnu.org>
In-Reply-To: <83k1brd28a.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 05 Aug 2019 19:08:21 +0300")
To: Eli Zaretskii
Cc: 36923@debbugs.gnu.org

>> (aref char-script-table ?\N{COMBINING ACUTE ACCENT})
>>
>> could return
>>
>> (latin greek cyrillic)
>>
>> instead of the current
>>
>> latin
>
> char-script-table is documented to yield a single symbol, so returning
> a list would be an incompatible change, which we should avoid.

The docstring of char-script-table says:

  Char table of script symbols.
  It has one extra slot whose value is a list of script symbols.

So it seems char-script-table should yield a list of script symbols?

I searched further for char-script-table in the documentation, and one
place where it's used is forward-word.  But I don't understand why
forward-word doesn't stop between “COMBINING ACUTE ACCENT” (which
belongs to the Latin script) and non-Latin letters.
It is good that it doesn't stop there; I'm just trying to understand
why, so that the same logic could be used in markchars-mode.  Maybe it
doesn't stop because of special script handling in
‘find-word-boundary-function-table’?  Or because it ignores all
combining characters?

BTW, while looking at forward-word and right-word I noticed an
inconsistency: there are left-word and right-word commands, but no
left-sexp and right-sexp to accompany forward-sexp.

> More generally, I think what you describe is a clear conceptual bug in
> markchars-mode: it should only pay attention to the script of the base
> characters, not to the script of combining accents.  The latter is
> mostly irrelevant, certainly so for the purpose of detecting
> confusables.

Could you suggest a proper function to strip all combining characters
from a string?
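
As a language-neutral illustration of the operation asked about above
(this is not an Emacs function and not something proposed in this
thread): at the Unicode level, stripping combining characters is
commonly done by decomposing the text to NFD and dropping every
character whose general category is a mark (Mn, Mc, Me).  A minimal
Python sketch, where the name strip_combining is hypothetical:

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Return TEXT with combining marks removed.

    Assumption: "combining character" means any code point whose Unicode
    general category starts with "M" (Mn, Mc, Me).
    """
    # Canonical decomposition splits precomposed letters into
    # a base character followed by its combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the non-mark characters, i.e. the base characters.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    # Latin, Greek and Cyrillic base letters, each followed by U+0301.
    sample = "a\u0301 \u03b1\u0301 \u0430\u0301"
    print(strip_combining(sample))
```

Note that NFD also decomposes precomposed letters, so for example
Cyrillic “й” would lose its breve and become “и”; whether that is wanted
when checking for confusables is a separate decision.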