From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuri Khan
Newsgroups: gmane.emacs.help
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 11:14:26 +0600
References: <87twxo1pnr.fsf@debian.uxu> <87egosa3od.fsf@wmi.amu.edu.pl>
In-Reply-To: <87egosa3od.fsf@wmi.amu.edu.pl>
To: Marcin Borkowski
Cc: "help-gnu-emacs@gnu.org"

On Sat, Mar 14, 2015 at 5:16 AM, Marcin Borkowski wrote:

>>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>>
> It's not really /difficult/.
> Intimidating, yes. Boring, possibly. Laborious (and mechanical), yes.
> But not /difficult/.

I tried it and it’s not very intimidating or boring or laborious or
difficult. Here’s my thought process:

First I unescape all backslashes, by global-replacing “\\” with “\”.

Then I insert spaces at key points to separate the syntactic
constructs. (Any literal spaces in the regexp need to be made explicit
first, e.g. by replacing each one with a visible placeholder.)

\` \(?: \\ [([] \| \$+ \)? \(.*?\) \(?: \\ [])] \| \$+ \)? \'

Imagining the parentheses and alternatives as nested boxes might help, too:

   ┌─────────┬─────┐  ╔═══╗ ┌─────────┬─────┐
\` │ \\ [([] │ \$+ │? ║.*?║ │ \\ [])] │ \$+ │? \'
   └─────────┴─────┘  ╚═══╝ └─────────┴─────┘

(Here the nesting level is just 1, so I didn’t actually need to draw
it, just match.)

Now I can read it:

1. start-of-string
2. optionally followed by either
   * a backslash and either an opening parenthesis or bracket
   * or one or more dollar signs
3. followed by any string (non-greedy, no newlines), which is
   extracted as group 1
4. optionally followed by either
   * a backslash and either a closing bracket or parenthesis
   * or one or more dollar signs
5. followed by end-of-string

I can further grok it as matching a valid (La)TeX math formula: $…$,
$$…$$, \(…\), \[…\]; as well as some invalid markup such as $$$$…$$$,
$…\], \(…\], $$…, etc.

As for the bigger picture, I think that if a regular expression ends up
difficult to read, it needs to be decomposed into small, easily
digestible chunks, each with a descriptive name. Elisp has the let*
form and the rx macro for this purpose.
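
To make that last point concrete, here is a rough sketch of how the regexp above might be decomposed. It is only an illustration: the names my/tex-math-re and my/tex-math-body are made up for this example, and I use rx-to-string with a backquoted form (rather than the rx macro directly) on the assumption that the named pieces should be ordinary let*-bound variables.

```elisp
(require 'rx)

;; Hypothetical names, for illustration only.
(defconst my/tex-math-re
  (let* ((opener '(or (seq "\\" (any "(["))      ; \( or \[
                      (one-or-more "$")))        ; $, $$, ...
         (closer '(or (seq "\\" (any "])"))      ; \] or \)
                      (one-or-more "$")))
         (body   '(group (*? not-newline))))     ; non-greedy, becomes group 1
    (rx-to-string
     `(seq string-start
           (optional ,opener)
           ,body
           (optional ,closer)
           string-end)
     t))
  "Regexp matching a string with optional (La)TeX math delimiters.")

(defun my/tex-math-body (string)
  "Return STRING with leading and trailing math delimiters stripped.
Return nil if STRING does not match, e.g. if it contains a newline."
  (when (string-match my/tex-math-re string)
    (match-string 1 string)))

(my/tex-math-body "$$a+b$$")      ; → "a+b"
(my/tex-math-body "\\(x^2\\)")    ; → "x^2"
(my/tex-math-body "$$$$x$$$")     ; → "x" (invalid markup still matches)
```

Each of the three pieces corresponds to one box in the diagram, so the final seq form reads almost like the numbered list.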