From: Daniel Pittman
Newsgroups: gmane.emacs.devel
Subject: Re: modern regexes in emacs
Date: Wed, 27 Feb 2019 13:18:09 -0500
In-Reply-To: <8BB38367-5864-4A62-8349-5420B51BBA94@acm.org>
To: Mattias Engdegård
Cc: Troy Hinckley, Andreas Schwab, Lars Ingebrigtsen, emacs-devel

On Wed, Feb 27, 2019 at 8:53 AM Mattias Engdegård wrote:

> On 26 Feb 2019 at 15:33, Andreas Schwab wrote:
> >
> > If you want to byte-compile a form that contains a regexp object, a
> > proper read syntax is required.
> >
> > The object types without read syntax are rather ephemeral, unlikely to
> > occur in byte-compiled forms.
>
> Thanks for pointing that out. I'm not sure how it would work -- please
> bear with me.
>
> Suppose we want to write (looking-at (pcre "a(b|c)")).
> Then `pcre' is a macro returning a mutable object with the regexp in some
> canonical form -- a traditional Emacs regexp, perhaps, or normalised rx or
> something else. The object also has space for the internal compiled
> pattern, roughly struct re_pattern_buffer today.
>
> As Richard pointed out, it is polite to make the object human-readable
> (for debugging, if nothing else). This means that we are either satisfied
> with the readability of the canonical form, or the original pattern is
> included around for this purpose.

This is something of an outsider's opinion, but it is based on helping a lot
of junior developers get up to speed with a wide range of languages over
many years, so I like to imagine my suggestion here is useful. Other
languages express regex literals with the equivalent of a CL reader macro,
or of the record literal syntax #s(...):

Clojure: #"..."
JavaScript and many others: /.../
Racket: #rx"..." and #px"..." for basic and PCRE respectively.
Dart, and a few others: r"...", or r'...', or a tagged prefix such as
$r"..." or %r/.../

Of those, the most Emacs Lisp-ish would be something like the Racket
versions, which support both types: for example `#r"..."`, or
`#pcre"..."`, or even `#rx(...)`.

I'd personally suggest that an additional reader (macro) syntax, also used
as the printed form, is the most user-friendly option. The S-expression
form, a printed representation such as `(pcre "...")`, is a little less
friendly, but in my eyes it is the best fallback. It works, but it doesn't
give the "compiled expression" an identity distinct from the methods used
to create it, and I think separating the two is the correct choice.
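
To make the "distinct identity" point concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes Emacs 26 or later for `cl-print-object`. The names `my-regex` and `my-regex-create` are hypothetical, and the `#pcre"..."` form it prints is only the syntax suggested above: the current Emacs reader does not accept it, so reading it back would still need reader support.

```elisp
(require 'cl-lib)
(require 'cl-print)

;; Hypothetical regexp object: it records which dialect the pattern was
;; written in and the pattern text itself.  A real implementation would
;; also carry the compiled pattern; that slot is omitted here.
(cl-defstruct (my-regex (:constructor my-regex-create (kind source)))
  kind     ; symbol naming the dialect, e.g. `pcre' or `rx'
  source)  ; the pattern exactly as the user wrote it

;; Give the object its own printed form, #KIND"SOURCE", instead of
;; showing the constructor call that happened to create it.
(cl-defmethod cl-print-object ((object my-regex) stream)
  (princ (format "#%s%S" (my-regex-kind object) (my-regex-source object))
         stream))

(cl-prin1-to-string (my-regex-create 'pcre "a(b|c)"))
;; => "#pcre\"a(b|c)\""
```

Printed this way, the object is recognisable as a regexp value whichever macro or function produced it, which is the separation argued for above.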