From: Philippe Vaucher
Newsgroups: gmane.emacs.devel
Subject: Re: modern regexes in emacs
Date: Fri, 15 Feb 2019 16:03:03 +0100
In-Reply-To: <387b4e87-2255-0467-c23e-e60c6b090fb3@gmail.com>
To: Clément Pit-Claudel
Cc: Emacs developers

> > Would this even be possible? I can imagine a whole lot of packages
> > breaking if the regexp syntax changed, and changing it just for the user
> > input in interactive functions looks a bit sketchy.
>
> We could just add a special tag at the beginning of a regexp to indicate
> that it's a pcre regexp; something like this maybe? (re-search-forward
> "\\(?pcre:\\)…[pcre regexp goes here]…"). This form is currently a syntax
> error, so there would be no ambiguity, and we could define a (pcre …) macro
> so that you could write (re-search-forward (pcre "…[pcre regexp goes
> here]…")) instead. Alternatively, we could use an explicit tag, something
> like (re-search-forward (cons 'pcre "…[pcre regexp goes here]…")).
>
> For interactive functions, I imagine you'd have a defcustom with a
> preferred regexp dialect.

I like where this is going: between that and Eli's suggestion of a special
text property, we have plenty of ways to implement it so that it plays
nicely with the existing code.
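To make the comparison concrete, here is a rough Emacs Lisp sketch of how
code receiving a regexp might recover its dialect under each representation
mentioned so far: the "\\(?pcre:\\)" prefix, the (KIND . PATTERN) cons, and
Eli's text property. Every name in it (`my-regexp-dialect`,
`my-regexp-pattern`, the `regexp-dialect` property, the
`my-interactive-regexp-dialect` defcustom) is made up for illustration;
nothing like it exists in Emacs, and it only identifies the dialect, it does
not match anything.

```elisp
;; Hypothetical sketch: none of these names exist in Emacs.
;; A regexp may be represented in one of three proposed ways:
;;   1. a string starting with the "\\(?pcre:\\)" marker,
;;   2. a cons (KIND . PATTERN), e.g. (cons 'pcre "a+"),
;;   3. a string carrying a text property (name assumed: `regexp-dialect').

(defconst my-pcre-marker "\\(?pcre:\\)"
  "Marker for proposal 1; per the thread, currently invalid regexp syntax.")

(defun my-regexp-dialect (re)
  "Return the dialect symbol of RE, defaulting to `emacs'."
  (cond
   ;; Proposal 2: the kind is simply the car of the cons.
   ((consp re) (car re))
   ;; Proposal 1: the beginning of the string has to be scanned.
   ((string-prefix-p my-pcre-marker re) 'pcre)
   ;; Proposal 3: the kind is hidden in a text property.
   ((get-text-property 0 'regexp-dialect re))
   ;; Anything else is an ordinary Emacs regexp.
   (t 'emacs)))

(defun my-regexp-pattern (re)
  "Return the pattern part of RE, stripped of any dialect tag."
  (cond ((consp re) (cdr re))
        ((string-prefix-p my-pcre-marker re)
         (substring re (length my-pcre-marker)))
        (t re)))

(defcustom my-interactive-regexp-dialect 'emacs
  "Dialect assumed for regexps typed interactively (hypothetical option)."
  :type '(choice (const emacs) (const pcre))
  :group 'matching)

(defun my-tag-user-regexp (string)
  "Tag STRING, typed by the user, following proposal 2."
  (if (eq my-interactive-regexp-dialect 'emacs)
      string
    (cons my-interactive-regexp-dialect string)))
```

With this, (my-regexp-dialect (cons 'pcre "a+")) and
(my-regexp-dialect "\\(?pcre:\\)a+") both yield pcre, but only the cons
case gets there without looking inside the string.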
So far, 3 proposals:

  - Regexps are always strings, with "\\(?pcre:\\)" as part of the regexp
    - when the string is displayed, you need to scan its beginning to see
      that it is a PCRE regexp
    - no separation between the regexp and its kind
  - Regexps are either strings (Emacs regexps) or conses with their kind,
    a symbol, as the first element
    - when the argument is displayed, you see immediately whether it's an
      Emacs regexp or one using another engine
    - the regexp is clearly separated from its kind, which probably
      facilitates conversions
    - seems more "open", in the sense that we can easily imagine new types
      ('emacs, 'pcre, 'rx, 'sed, 'vim-verymagic, etc.)
  - Special text property on the string
    - not immediately visible that it is a PCRE regexp
    - harder to manipulate?

Given this, I'm in favor of the 2nd option, but maybe I missed some points.

Philippe