From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: ignoring combining diacritics in isearch
Date: Wed, 23 Nov 2022 18:27:25 +0100
Message-ID: <878rk1fuo2.fsf@gmail.com>
To: emacs-devel@gnu.org

Over on Stack Overflow, someone has been trying to get char-folded
isearch working for Arabic, and has been having some issues because
char-folding only works for equivalent characters, not base characters
followed by combining characters. So eg searching for 'ee' when the
buffer contains éé (thatʼs 'e' followed by COMBINING ACUTE ACCENT)
fails.

The following patch fixes that, but itʼs a bit of a sledgehammer (the
"\\c^*" bit probably needs to be configurable, because there are
diacritic-like codepoints in Arabic that are not combining, such as
U+0640 ARABIC TATWEEL)

diff --git c/lisp/char-fold.el i/lisp/char-fold.el
index 43e3cd45ec..8e9fdd7f37 100644
--- c/lisp/char-fold.el
+++ i/lisp/char-fold.el
@@ -209,7 +209,11 @@
         ;; is used by `describe-char-fold-equivalences'.
         (map-char-table
          (lambda (char decomp-list)
-           (let ((re (regexp-opt (cons (char-to-string char) decomp-list))))
+           (let ((re
+                  (concat "\\(?:"
+                          (string-join (cons (char-to-string char) decomp-list)
+                                       "\\c^*\\|")
+                          "\\c^*\\)")))
              (aset equiv char re)))
          equiv))
       equiv)))

Thoughts?

Robert
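
To make the shape of the new regexp concrete, here is a minimal sketch that builds it for a single table entry outside of char-fold.el. The function name `demo-fold-with-combining' and the sample equivalents are made up for illustration; only the `concat'/`string-join' expression is taken from the patch. It assumes, as the patch does, that the strings in the list contain no regexp special characters (unlike `regexp-opt', `string-join' does not quote them).

```elisp
;; Illustrative only: `demo-fold-with-combining' is a hypothetical helper,
;; not something that exists in char-fold.el.
(require 'subr-x)                       ; `string-join' on older Emacsen

(defun demo-fold-with-combining (char decomp-list)
  "Return a regexp for CHAR or any string in DECOMP-LIST.
Each alternative may be followed by any number of characters of
category ^ (combining diacritics or marks)."
  (concat "\\(?:"
          (string-join (cons (char-to-string char) decomp-list)
                       "\\c^*\\|")
          "\\c^*\\)"))

;; Sample entry: `e' with precomposed e-acute and e-grave as equivalents.
(defvar demo-re (demo-fold-with-combining ?e '("\u00e9" "\u00e8")))

;; Simulate searching for "ee" in text that contains two `e's, each
;; followed by U+0301 COMBINING ACUTE ACCENT.  The call returns a match
;; position (non-nil) if the folded regexp accepts the decomposed text.
(string-match-p (concat "\\`" demo-re demo-re "\\'")
                "e\u0301e\u0301")
```

Because "\\c^*" is appended to every alternative unconditionally, the same regexp would also skip over any other category-^ characters, which is the "sledgehammer" aspect mentioned above.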