From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: search-default-mode char-fold-to-regexp and Greek Extended block characters
Date: Sun, 21 Jul 2019 13:03:37 +0200
References: <834l3ium3f.fsf@gnu.org> <83wogduc41.fsf@gnu.org>
In-Reply-To: <83wogduc41.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 19 Jul 2019 21:13:02 +0300")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:238762

>>>>> On Fri, 19 Jul 2019 21:13:02 +0300, Eli Zaretskii said:

    Eli> (get-char-code-property ?ί 'decomposition) => (943) ; (#x03af) i.e (?ί)
    Eli> (get-char-code-property ?ί 'decomposition) => (953 769) ; (#x03b9 #x0301)

    Eli> Do we expand the decomposition property recursively?  It sounds like
    Eli> we don't, but maybe we should.

We donʼt. The following patch allows searching for ι (0x3b9) to match
both ί (0x3af) and ί (0x1f77). It doesnʼt recurse, but I have no idea
if there are longer chains of decompositions.

It causes (aref char-fold-table ?ι) to expand from:

"\\(?:ι[̀́̄̆̈̓̔͂]\\|[ίιϊἰἱὶιῐῑῖ𝛊𝜄𝜾𝝸𝞲]\\)"

to:

"\\(?:ι[̀́̄̆̈̓̔͂]\\|[ΐίιϊἰ-ἷὶίιῐῑῒῖῗ𝛊𝜄𝜾𝝸𝞲]\\)"

where the additions are basically all the variants of IOTA + one or
more diacriticals.

Even if we donʼt apply this or something like it, itʼs been
educational.

diff --git i/lisp/char-fold.el w/lisp/char-fold.el
index 9d3ea17b41..bf2a4c2484 100644
--- i/lisp/char-fold.el
+++ w/lisp/char-fold.el
@@ -78,6 +78,20 @@
                            (cons (char-to-string char)
                                  (aref equiv (car decomp))))))))
               (funcall make-decomp-match-char decomp char)
+              ;; Check to see if the first char of the decomposition
+              ;; has a further decomposition.  If so, add a mapping
+              ;; back from that second decomposition to the original
+              ;; character.  This allows e.g. 'ι' (GREEK SMALL LETTER
+              ;; IOTA) to match both the Basic Greek block and
+              ;; Extended Greek block variants of IOTA +
+              ;; diacritical(s)
+              (let ((l2-decomp (char-table-range table (car decomp))))
+                (when (consp l2-decomp)
+                  (when (symbolp (car l2-decomp))
+                    (setq l2-decomp (cdr l2-decomp)))
+                  (if (not (eq (car decomp)
+                               (car l2-decomp)))
+                      (funcall make-decomp-match-char (list (car l2-decomp)) char))))
               ;; Do it again, without the non-spacing characters.
               ;; This allows 'a' to match 'ä'.
               (let ((simpler-decomp nil)
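
On the open question of whether longer decomposition chains exist, one
way to look would be a small helper that keeps following the first
character of the `decomposition' property until it stops changing.
This is only a sketch and not part of the patch:
`my/decomposition-chain' is a hypothetical name, and the code assumes
that `get-char-code-property' returns either a list starting with the
character itself or nil when there is nothing further to decompose.

```elisp
;; Sketch only: `my/decomposition-chain' is a hypothetical helper,
;; not something defined in char-fold.el.
(defun my/decomposition-chain (char)
  "Return the characters reached from CHAR via first-char decompositions.
The result starts with CHAR and ends with the first character that
has no further decomposition."
  (let ((chain (list char))
        (done nil))
    (while (not done)
      (let ((decomp (get-char-code-property (car chain) 'decomposition)))
        ;; Drop a leading compatibility tag such as `compat' or `font',
        ;; mirroring the `symbolp' check in the patch.
        (when (symbolp (car decomp))
          (setq decomp (cdr decomp)))
        ;; Stop when there is no decomposition, or when the first
        ;; character is one we have already visited (e.g. itself).
        (if (or (null decomp) (memq (car decomp) chain))
            (setq done t)
          (push (car decomp) chain))))
    (nreverse chain)))

;; Using the values quoted earlier in this thread:
;; (my/decomposition-chain #x1f77) => (8055 943 953)
;; i.e. U+1F77 -> U+03AF -> U+03B9, a chain of two steps.
```

Mapping this over the character range and looking for results longer
than three elements would show whether handling a single extra level,
as the patch does, is enough.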