From: "Basil L. Contovounesios"
Newsgroups: gmane.emacs.devel
Subject: Re: search-default-mode char-fold-to-regexp and Greek Extended block characters
Date: Thu, 25 Jul 2019 01:18:16 +0100
Message-ID: <87ef2f0xx3.fsf@tcd.ie>
References: <834l3ium3f.fsf@gnu.org> <83wogduc41.fsf@gnu.org> <83h87cpzml.fsf@gnu.org> <87r26gv6k2.fsf@mail.linkov.net> <87blxj3u4e.fsf@mail.linkov.net>
To: Juri Linkov
Cc: emacs-devel@gnu.org
In-Reply-To: <87blxj3u4e.fsf@mail.linkov.net> (Juri Linkov's message of "Thu, 25 Jul 2019 02:12:01 +0300")

Juri Linkov writes:

>> Juri> Thanks! Could you please look into why the tests fail to validate matching of
>> Juri> n-level decomposition. The character with 3 level decomposition in
>> Juri> char-fold--test-without-customization is currently commented out as
>> Juri> FIXME. After uncommenting this test fails, and I don't understand why.
>>
>> That test ends up doing
>>
>> (string-match "\\`\\(?:ι[̀́̄̆̈̓̔͂]\\|[ΐίιϊἰ-ἷὶίιῐ-ΐῖῗ𝛊𝜄𝜾𝝸𝞲]\\)\\'" "Ϊ́")
>>
>> because it does (upcase "ΐ") => Ϊ́
>>
>> That character is GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA, and
>> as far as I can tell there is no CAPITAL variant of that letter, so
>> upcase can't return it, which means it returns GREEK CAPITAL LETTER
>> IOTA plus the diacriticals, which is obviously not going to
>> match.
>
> This is an interesting case like (upcase "ß") => "SS" that required
> adding (?ß "ss") to pass the tests.
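To make the decomposition and upcase behaviour above concrete outside Emacs, here is a minimal sketch in Python using only the standard `unicodedata` module and `str.upper`/`str.lower`. It is not Emacs's char-fold or upcase code; `decomposition_chain` and `codepoints` are hypothetical helpers introduced for illustration, and the sketch assumes Python's Unicode tables agree with Emacs's for these characters. It walks U+1FD3 through its three levels of canonical decomposition and shows that upper-casing it yields three code points, much as upper-casing "ß" yields two.

```python
import unicodedata

# GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA, the character in the FIXME test.
IOTA_DIALYTIKA_OXIA = "\u1FD3"


def codepoints(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def decomposition_chain(s):
    """Hypothetical helper: apply one level of canonical decomposition at a time.

    Returns the list of successive forms, starting with S itself and
    ending with its fully decomposed form.
    """
    levels = [s]
    while True:
        expanded = []
        for c in levels[-1]:
            mapping = unicodedata.decomposition(c)
            # Canonical mappings have no "<tag>" prefix; compatibility
            # mappings such as "<compat> 0020 0308" are left alone here.
            if mapping and not mapping.startswith("<"):
                expanded.extend(chr(int(cp, 16)) for cp in mapping.split())
            else:
                expanded.append(c)
        step = "".join(expanded)
        if step == levels[-1]:
            return levels
        levels.append(step)


for form in decomposition_chain(IOTA_DIALYTIKA_OXIA):
    print(codepoints(form))
# U+1FD3
# U+0390
# U+03CA U+0301
# U+03B9 U+0308 U+0301

# No precomposed capital exists, so the full upper-case mapping
# returns CAPITAL IOTA followed by the two combining marks.
upper = IOTA_DIALYTIKA_OXIA.upper()
print(codepoints(upper))          # U+0399 U+0308 U+0301
# Lower-casing that result gives the fully decomposed form above.
print(codepoints(upper.lower()))  # U+03B9 U+0308 U+0301

# The sharp s is the analogous one-to-many case mentioned in the thread.
print("ß".upper())                # SS
```

The last level of the chain is the same string that `unicodedata.normalize("NFD", ...)` would produce for U+1FD3.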
It is probably this way because text in all caps is not usually (if ever)
accented in Greek, so the only time upper-case letters take accents is at
the start of capitalised words, where a dialytika can never appear, since
a dialytika only makes sense on the second of two consecutive vowels.

> So I guess we need to add (?ι "ΐ") for the tests to pass:

[...]

> But this is only for char-fold--test-with-customization. OTOH, for
> char-fold--test-without-customization we need also to change the default
> value in char-fold.el like:

[...]

Can you please explain why iota with dialytika and tonos needs to be
special-cased in these places?

Thanks,

-- 
Basil