From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: search-default-mode char-fold-to-regexp and Greek Extended block characters
Date: Thu, 25 Jul 2019 22:44:29 +0200
To: Juri Linkov
Cc: "Basil L. Contovounesios", emacs-devel@gnu.org

>>>>> On Thu, 25 Jul 2019 21:40:12 +0300, Juri Linkov said:

>> Can you please explain why iota with dialytika and tonos needs to be
>> special-cased in these places?

    Juri> Here is the test case that demonstrates the need to add it
    Juri> to char-fold-include:

    Juri> 0. emacs -Q
    Juri> 1. Paste this text to *scratch*: "ΐΐ"
    Juri> 2. Search for two IOTAs with char-fold, e.g.: C-s M-s ' ιι

    Juri> The char-fold search doesn't match the characters with combining accents
    Juri> with their base char GREEK SMALL LETTER IOTA.

    Juri> However, after adding (?ι "ΐ") to char-fold-include it can match the
    Juri> base character IOTA.

Yes, I see the problem now. Maybe this can be solved by adding that
mapping when building char-fold-table. Or 'those mappings' I should
say, since there are going to be many cases like this.

How about the following? It passes your tests with the FIXMEs
uncommented (and isearch for multiple iotas matches multiple iotas +
combining diacriticals).
I deliberately restricted it to lowercase characters, since the
roundtripping fails for İ and a large number of titlecase characters.

diff --git i/lisp/char-fold.el w/lisp/char-fold.el
index f379229e6c..91fd7ddc28 100644
--- i/lisp/char-fold.el
+++ w/lisp/char-fold.el
@@ -108,6 +108,17 @@
                             (car next-decomp)))
                   (funcall make-decomp-match-char (list (car next-decomp)) char)))
               (setq dec next-decomp)))
+          ;; If there is no precomposed uppercase version of a
+          ;; character with diacriticals, we also add a mapping
+          ;; from the base character to the base character with
+          ;; combining diacriticals
+          (when (eq (get-char-code-property char 'general-category) 'Ll)
+            (let* ((str (char-to-string char))
+                   (upper (upcase str))
+                   (roundtrip (downcase upper)))
+              (when (> (length roundtrip) 1)
+                (aset equiv (aref roundtrip 0)
+                      (cons roundtrip (aref equiv (aref roundtrip 0)))))))
           ;; Do it again, without the non-spacing characters.
           ;; This allows 'a' to match 'ä'.
           (let ((simpler-decomp nil)
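To make the roundtrip idea easier to follow outside Emacs, here is a rough Python sketch of the same technique; it is not the char-fold.el code. It assumes Python's full case mappings (`str.upper`/`str.lower`) behave like Emacs's `upcase`/`downcase` for the characters involved, which may not hold everywhere. The names `roundtrip_mappings` and `fold_regexp` are hypothetical, and the regexp is a plain alternation rather than what `char-fold-to-regexp` actually produces.

```python
import re
import unicodedata


def roundtrip_mappings(chars):
    """Map a base character to extra multi-character strings it should match.

    For each lowercase letter (general category Ll), upcase it and then
    downcase the result.  If the round trip yields more than one code
    point, e.g. U+0390 -> U+0399 U+0308 U+0301 -> U+03B9 U+0308 U+0301,
    record that string as an extra equivalent of its first character,
    mirroring the (aset equiv (aref roundtrip 0) ...) step in the patch.
    """
    extra = {}
    for ch in chars:
        if unicodedata.category(ch) != "Ll":
            continue
        roundtrip = ch.upper().lower()
        if len(roundtrip) > 1:
            extra.setdefault(roundtrip[0], []).append(roundtrip)
    return extra


def fold_regexp(search, table):
    """Build a regexp in which each search char also matches its extras."""
    parts = []
    for ch in search:
        alts = [ch] + table.get(ch, [])
        # Try longer alternatives first so the combining marks are consumed.
        alts.sort(key=len, reverse=True)
        parts.append("(?:" + "|".join(re.escape(a) for a in alts) + ")")
    return "".join(parts)


# A small sample instead of every code point, to keep the example fast.
table = roundtrip_mappings(["\u0390", "\u03b9", "a"])

# Juri's test text: two iotas, each followed by U+0308 and U+0301.
text = "\u03b9\u0308\u0301" * 2

print(bool(re.search("\u03b9\u03b9", text)))                      # False
print(bool(re.search(fold_regexp("\u03b9\u03b9", table), text)))  # True

# One way a round trip breaks for capitals (in Python's mappings):
# U+0130 downcases to "i" + U+0307, which upcases to "I" + U+0307.
print("\u0130".lower().upper() == "\u0130")                       # False
```

Restricting the loop to category Ll, as the patch does, sidesteps the capital-letter cases like the last line, where the round trip does not return the original character.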