From: Mattias Engdegård
To: Elias Mårtenson
Cc: "Perry E. Metzger", Jay Kamat, emacs-devel
Newsgroups: gmane.emacs.devel
Subject: Re: modern regexes in emacs
Date: Mon, 11 Feb 2019 23:12:47 +0100

On 10 Feb 2019, at 10:39, Elias Mårtenson wrote:
>
> While I'm sure that is true for a lot of people (and for those, the
> newly announced xr package helps here), others prefer to use the more
> compact regex syntax.
>
> However, I don't think anyone would argue that the Emacs regex syntax
> has any advantages compared to pcre. I certainly need to wade through
> the Emacs regex manual every time I want to do slightly more advanced
> regex matching, followed by lots of testing.
>
> When using regexes in regular editing (as opposed to elisp
> programming) it's even worse.
>
> I'm most definitely in favour of pcre.

Hello Elias,

Of course you should write "-?[0-9]+" when you need it! And for
interactive use -- search-and-replace, say -- the conventional
notations are not bad, since they are compact to write, you have the
meaning all in your head anyway, and nobody is going to look at it
later on.

Where rx shines is for the complex ones. I have written page-long
regexps in Perl and Python, and despite the fact that both languages
permit a "structured" regexp layout, they do not come close to rx when
it counts: rx can be read, understood, maintained, evolved, and
composed far better, and with fewer mistakes.

I agree that the POSIX notation is probably better than the old-style
version in Emacs, since the former tends to be a tad lighter in
backslashes. Some languages -- OCaml, Python, etc. -- have some form of
string literal that avoids the need to escape backslashes, but
fundamentally, regexps are not strings but an algebraic notation with
values and operators, and deserve some kind of higher language-level
support. Larry Wall understood that.

So I suggest you give rx a go next time you need to write a
complicated regexp in Elisp. If you still find it too verbose, you can
use short keywords, like `+' or `1+' instead of `one-or-more'. You can
even speak a hybrid dialect by injecting little regexp strings inside
a big rx expression with the `(regexp ...)' syntax! Take a look at the
big `gnu' matcher in compile.el (around line 281) to see what that
looks like.

Careful here -- rx is addictive, and you may very well come to use it
more and more.
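
To make the suggestions above a little more concrete, here is a minimal
sketch of the same integer regexp written with long keywords, with short
keywords, and in the hybrid dialect. The `my-' variable names and the
"NUMBER:NUMBER" line format are made up for illustration; they do not
come from compile.el or any other existing code.

```elisp
;;; Illustrative sketch only; all `my-' names are hypothetical.
(require 'rx)

;; The compact string form "-?[0-9]+" spelled out with long rx keywords.
(defvar my-int-rx
  (rx (zero-or-one "-") (one-or-more (any "0-9"))))
;; my-int-rx => "-?[0-9]+"

;; The same expression using the short keywords `opt' and `+'.
(defvar my-int-rx-short
  (rx (opt "-") (+ (any "0-9"))))

;; Hybrid dialect: an ordinary regexp string embedded in a larger rx
;; form via `(regexp ...)'.  Matches a whole line like "-12:34" and
;; captures both numbers.
(defvar my-line-col-rx
  (rx line-start
      (group (regexp "-?[0-9]+"))    ; string fragment reused as-is
      ":"
      (group (+ (any "0-9")))        ; the rest written in rx
      line-end))
```

Since rx expands to a plain regexp string, the result can be handed to
`string-match', `re-search-forward' and friends exactly like a
hand-written one, so the two styles can be mixed freely within one
program.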