From: Artur Malabarba
Reply-To: bruce.connor.am@gmail.com
To: Juri Linkov
Cc: 20975@debbugs.gnu.org, Angelo Graziosi
Newsgroups: gmane.emacs.bugs
Subject: bug#20975: Replacing text add also the comma
Date: Sat, 4 Jul 2015 22:31:45 +0100
References: <55970987.1060809@alice.it> <87si93d42x.fsf@mail.linkov.net>
In-Reply-To: <87si93d42x.fsf@mail.linkov.net>

On Jul 4, 2015 10:07 PM, "Juri Linkov" <juri@linkov.net> wrote:
>
> '8' matches the adjacent comma because (character-fold-to-regexp "8")
> contains "8[,.]"
>
> The culprit is #x1f109 “DIGIT EIGHT COMMA” with decomposition: (compat
> '8' ',') and #x248f “DIGIT EIGHT FULL STOP” with decomposition: (compat
> '8' '.')
>
> We don't need to match the decomposition “8,” when searching for “8”.
> We only need to match the char #x1f109 when searching for “8”.
>
> Maybe Artur has an idea how to fix this regexp?

Yes, it's simple enough to fix.
The reason we let characters also match the decompositions of other
Unicode characters is that this lets us match a letter combined with a
non-spacing accent.
Within that, I've already added a clause to avoid matching when the
decomposition has more than one letter (this prevents "a" from
matching "am").

Clearly, we need another clause to avoid this situation here.
If no one has a different opinion, I'll add a clause so that a
decomposition is folded only if it contains at least one non-spacing
character (though I'm not sure how to check for this; I'm on my phone
right now).

That would fix this situation, and wouldn't affect how ASCII
characters are allowed to match Unicode characters. It would only
affect how ASCII characters are allowed to match decompositions.
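
For illustration only, the rule proposed above ("fold a decomposition only if it contains at least one non-spacing character") can be sketched outside Emacs. This is Python using the standard `unicodedata` module, not the character-fold code itself. The helper names `decomposition_chars` and `should_fold_decomposition` are hypothetical, and reading "non-spacing character" as Unicode general category Mn is an assumption.

```python
import unicodedata


def decomposition_chars(ch):
    """Return the characters in ch's Unicode decomposition, minus any <tag>."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>".
    if fields and fields[0].startswith("<"):
        fields = fields[1:]
    return [chr(int(code, 16)) for code in fields]


def should_fold_decomposition(ch):
    """Assumed rule: fold only if the decomposition has a non-spacing mark (Mn)."""
    return any(unicodedata.category(c) == "Mn" for c in decomposition_chars(ch))


# The two characters from the bug report, plus an accented letter for contrast.
for ch in ("\U0001F109", "\u248F", "\u00E9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {should_fold_decomposition(ch)}")
```

Under this sketch the decompositions of DIGIT EIGHT COMMA and DIGIT EIGHT FULL STOP are rejected, because a comma or full stop is not a non-spacing mark, while "é" (which decomposes to "e" plus U+0301) is still accepted. The precomposed characters themselves could still be matched directly, as the quoted message asks.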