From: Drew Adams
Newsgroups: gmane.emacs.help
To: Marcin Borkowski, help-gnu-emacs@gnu.org
Subject: RE: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 00:03:34 -0700 (PDT)
References: <87twxo1pnr.fsf@debian.uxu> <87egosa3od.fsf@wmi.amu.edu.pl>
In-Reply-To: <87egosa3od.fsf@wmi.amu.edu.pl>

> I’m not talking about changing the representation, but about making the
> existing one (which I agree is not /that/ bad) more comprehensible.
> Font lock, grouping and unescaping backslashes would be definitely helpful.
>
> OTOH, I can imagine that some kind of diagrams might be helpful for
> someone.  The point is, in the end you have to read/write these regexen
> in their normal form anyway, so why not train yourself to understand
> their “default” representation instead of adding the burden of
> translating between representations?

I agree that a visual aid can help with learning - about regexps in
general and about Emacs regexp syntax in particular.

The Emacs Wiki page about regexps provides suggestions about learning
regexp syntax: http://www.emacswiki.org/emacs/RegularExpression.

Incremental regexp searching (`C-M-s') is one good tool for learning.
What it does not help so much with is subgroup matching - keeping the
different groups straight when there are several possibilities.

Rasmus mentioned that `visual-regexp.el' can help with that.
Likewise, Icicles search: it highlights different subgroup matches
differently.  Here is a screenshot that shows a complex regexp
(5 groups) and a diagram that maps each group to its highlighting:

http://www.emacswiki.org/emacs/RegularExpression#RegexpsInIcicles

The regexp: "(\([-a-z*]+\) *\((\(([-a-z]+ *\([^)]*\))\))\).*".

That is: a left paren, a name, possibly some spaces, two left parens,
a name, possibly some spaces, possibly some non right-paren chars, two
right parens, and then any number of chars other than newline.  But
grouped in a particular way.

More often than not, you encounter a complicated regexp ready-made (in
some existing code), and you want to see what it is all about and
perhaps modify it.  That use case is more typical than creating a
complex regexp from scratch.

As Emanuel said, such regexps are often arrived at incrementally -
they start simpler and evolve.

I recommend playing with existing regexps this way, seeing what they
match by using them with a visual tool such as Icicles search,
`visual-regexp.el', or even `C-M-s'.  A tour through the Emacs source
code will show you plenty of interesting regexps you can play with -
font-lock keywords and patterns defining Emacs pages, sentences, etc.
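
If you want to check the group boundaries of the regexp quoted above
from Lisp rather than visually, something like the following might do.
It is only a minimal sketch: `my-regexp-groups' and the sample string
are made up for illustration and are not part of Emacs, Icicles or
`visual-regexp.el'.  The only standard functions it relies on are
`string-match' and `match-string'.

```elisp
;; Sketch only: `my-regexp-groups' is a hypothetical helper, not part
;; of Emacs, Icicles or `visual-regexp.el'.
(defun my-regexp-groups (regexp string count)
  "Return an alist (N . TEXT) for groups 0 through COUNT of REGEXP.
The groups are those of the first match of REGEXP in STRING.
Return nil if REGEXP does not match STRING."
  (when (string-match regexp string)
    (let ((groups '()))
      ;; Group 0 is the whole match; 1..COUNT are the \(...\) groups.
      (dotimes (n (1+ count))
        (push (cons n (match-string n string)) groups))
      (nreverse groups))))

;; In a Lisp string each backslash of the regexp has to be doubled.
(my-regexp-groups
 "(\\([-a-z*]+\\) *\\((\\(([-a-z]+ *\\([^)]*\\))\\))\\).*"
 "(let* ((foo 42)) (message \"%s\" foo))"  ; made-up sample text
 4)
;; => ((0 . "(let* ((foo 42)) (message \"%s\" foo))")
;;     (1 . "let*")
;;     (2 . "((foo 42))")
;;     (3 . "(foo 42)")
;;     (4 . "42"))
```

You can evaluate both forms in `*scratch*' (e.g. with `C-j').  Note
that the regexp as typed in this message has four explicit \(...\)
groups; group 0 is the whole match.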