From: Stefan Kangas
Newsgroups: gmane.emacs.devel
Subject: Re: Ugly regexps
Date: Tue, 2 Mar 2021 19:32:23 -0600
To: Stefan Monnier, emacs-devel@gnu.org

Stefan Monnier writes:

> BTW, while this theme of ugly regexps keeps coming up, how 'bout we add
> a new function `ere` which converts between the ERE style of regexps
> where grouping parens are not escaped (and plain chars meant to match
> an actual paren need to be escaped instead) to ELisp-style regexps?
>
> So you can do
>
>     (string-match (ere "\\(def(macro|un|subst) .{1,}"))
>
> instead of
>
>     (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}")
>
> ?

Sounds good to me.

I was going to ask why not just do PCRE, but then I realized I'm not
exactly sure what the syntactical differences are. (We obviously lack
some features.)

AFAIR, Emacs regexps don't exactly match GNU grep, egrep, Perl, or
anything else really. So I dug out my dusty old copy of Mastering
Regular Expressions and found this overview:

    grep        egrep     Emacs          Perl
    \? \+ \|    ? + |     ? + \|         ? + |
    \( \)       ( )       \( \)          ( )
                \< \>     \< \> \b \B    \b \B

    (Excerpt from Mastering Regular Expressions: Table 3-3: A (Very)
    Superficial Look at the Flavor of a Few Common Tools)

This shows the differences that most commonly bite you, in my
experience.

While we're at it, has it ever been discussed to add support for the
pcre library side-by-side with our homegrown regexp.c? It would give
us sane (standard) syntax and some useful features "for free"
(e.g. lookaround). I haven't tested, but a priori I would also assume
the code to be much more performant than anything we could ever cook up
ourselves. It is used by several high-profile projects.

I would imagine we'd introduce entirely new function names for it.
Perhaps even a completely new and improved API like Lars suggested a
while back.
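
To make the escaping swap in the quoted `ere` proposal concrete, here is a minimal sketch of the translation. It is written in Python only so it can be run standalone; it is not the proposed Emacs function, and the names `ere_to_emacs`, `SWAPPED`, and `_bracket_end` are hypothetical. It assumes that only `( ) | { }` change meaning between the two flavors and that bracket expressions can be copied through unchanged. The doubled backslashes in the quoted examples are Lisp string syntax, so the patterns below use the single-backslash form the regexp engine actually sees.

```python
# Characters that are special when unescaped in ERE but special when
# backslash-escaped in Emacs regexps (assumption: only these five differ).
SWAPPED = set("()|{}")


def _bracket_end(pattern: str, start: int) -> int:
    """Return the index just past the bracket expression opening at start."""
    i = start + 1
    n = len(pattern)
    if i < n and pattern[i] == "^":
        i += 1
    if i < n and pattern[i] == "]":
        i += 1  # a leading "]" is a literal member, not the terminator
    while i < n:
        # Skip over character classes such as [:alpha:] as a unit.
        close = pattern.find(":]", i + 2) if pattern.startswith("[:", i) else -1
        if close != -1:
            i = close + 2
        elif pattern[i] == "]":
            return i + 1
        else:
            i += 1
    return n  # unterminated bracket: copy the rest verbatim


def ere_to_emacs(ere: str) -> str:
    """Translate an ERE-style pattern into Emacs regexp syntax (sketch)."""
    out = []
    i, n = 0, len(ere)
    while i < n:
        ch = ere[i]
        if ch == "\\" and i + 1 < n:
            nxt = ere[i + 1]
            # An escaped swapped character is a literal in ERE, so it
            # becomes a bare character in Emacs; other escapes pass through.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch == "[":
            end = _bracket_end(ere, i)
            out.append(ere[i:end])
            i = end
        elif ch in SWAPPED:
            # An unescaped swapped character is an operator in ERE, so it
            # needs a backslash in Emacs.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(ere_to_emacs(r"\(def(macro|un|subst) .{1,}"))
# prints: (def\(macro\|un\|subst\) .\{1,\}
```

The output is the pattern from the second quoted `string-match` call once its Lisp string escaping is removed. Constructs with no Emacs equivalent, such as lookaround, are outside the scope of this sketch.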