From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: search-default-mode char-fold-to-regexp and Greek Extended block characters
Date: Fri, 26 Jul 2019 21:40:58 +0300
Message-ID: <87v9vozrjn.fsf@mail.linkov.net>
In-Reply-To: <8336itnxgv.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 26 Jul 2019 09:04:00 +0300")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

>> > But this one I donʼt understand. Searching for iota (capital or small)
>> > in a buffer containing ΐ ΐ or Ϊ́ already works with
>> > char-fold-to-regexp, so why is this needed?
>>
>> Searching for ι finds ΐ only when searching for a single letter ι
>> because the search matches the first part of ΐ that contains the base
>> character ι and ignores the remaining combining accents like ̈́
>>
>> So for testing you need to search for longer strings, e.g.
>> in a buffer with this text "ΐΐΪ́." try to search for "ιιι."
>>
>> It fails to find this text without adding (?ι "ΐ")
>> to char-fold-include.
>
> Maybe we should decide that this is a limitation of the current
> implementation, and instead work on a more correct implementation,
> which actually "folds" characters to their base variants as the search
> proceed.
>
> Let's not forget that the current implementation was known to be
> limited from the get-go, and we only accepted it because the "full"
> one was too complex and required non-trivial changes on the C level.
> So we shouldn't go too far into making the current implementation
> support everything that the full one will inherently support.

I consider the current regexp-based implementation as a fully functional
prototype with complete test coverage, so after switching the implementation
later to C, the same tests should still pass.
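
To make the test case above concrete, here is a minimal sketch of the "fold to base variants" idea, written in Python only for illustration. It is neither the regexp-based char-fold.el code nor the proposed C implementation. The helper names `fold_to_base` and `folded_contains` are hypothetical, and treating folding as "NFD-decompose, drop combining marks, case-fold" is an assumption of this sketch.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Reduce TEXT to case-folded base characters (hypothetical helper)."""
    # NFD splits precomposed characters such as U+1FD3 into a base
    # letter followed by its combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (non-zero canonical combining class).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def folded_contains(haystack: str, needle: str) -> bool:
    """Return True if NEEDLE occurs in HAYSTACK once both are folded."""
    return fold_to_base(needle) in fold_to_base(haystack)


# The test text from the message: precomposed U+1FD3, then small and
# capital iota each followed by U+0308 and U+0301, then a period.
buffer_text = "\u1fd3\u03b9\u0308\u0301\u0399\u0308\u0301."
query = "\u03b9\u03b9\u03b9."

found = folded_contains(buffer_text, query)
```

Under these assumptions the three-iota query matches the whole test string, while a plain substring search on the unfolded text does not. This is the behaviour that a shared test should pin down, whichever implementation runs underneath.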