From: Emanuel Berg
Newsgroups: gmane.emacs.help
Subject: Re: How to grok a complicated regex?
Date: Fri, 13 Mar 2015 23:46:48 +0100
Message-ID: <87twxo1pnr.fsf@debian.uxu>
To: help-gnu-emacs@gnu.org

Marcin Borkowski writes:

> so I have this monstrosity [note: I know, there are
> much worse ones, too!]:
>
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>
> (it's in the org-latex--script-size function in
> ox-latex.el, if you're curious).
>
> I'm not asking “what does this match” – I can read
> it myself. But it comes with a considerable effort.

I dare say most people (even programmers) cannot read
that, so if you can, that's great.

As a math professional you are of course aware of the
discipline called automata theory, which deals with
such things. Perhaps relational algebra might help
too, if the data in the sets are strings. But automata
theory should fit even better.

Also, remember you don't have to understand those
expressions. Often they are set up incrementally. They
only need to be correct. The computer understands
them - the programmer only understands the purpose,
and the most recent change. Kind of risky, perhaps not
what a math person would find appealing, but I've
constructed many that way so I know that method works.

> Are you aware of any tools that might help to
> understand such regexen?

I have seen tools with which you can construct such
expressions and they output figures, states,
transitions, and so on. I wonder how advanced an
expression they can deal with? But if you get the
basics right, it should be just basic building blocks
that stick together, and from there on the sky is the
limit.
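
To make the "building blocks" idea concrete, here is a
minimal sketch in Emacs Lisp that assembles the quoted
regexp from three smaller pieces. The my-math-* names
are made up for illustration and are not taken from
ox-latex.el; only the final regexp string comes from
the thread.

```elisp
;; Hypothetical names; only the assembled regexp is from the thread.

;; Opening math delimiter: backslash + ( or [, or a run of dollar signs.
(defconst my-math-open "\\(?:\\\\[([]\\|\\$+\\)")

;; Closing math delimiter: backslash + ] or ), or a run of dollar signs.
(defconst my-math-close "\\(?:\\\\[])]\\|\\$+\\)")

;; The body between the delimiters, captured non-greedily as group 1.
(defconst my-math-body "\\(.*?\\)")

;; Whole string: optional opener, body, optional closer.
(defconst my-math-regexp
  (concat "\\`" my-math-open "?" my-math-body my-math-close "?" "\\'"))

;; Group 1 holds the text between the delimiters.
(when (string-match my-math-regexp "$x^2$")
  (match-string 1 "$x^2$"))   ; => "x^2"
```

The concatenated string is character for character the
same as the original literal, so nothing changes for
the computer. Only the reader gets three names to hang
the meaning on.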

Instead the problem is, as I see it: will those
figures, balls and arrows, tagged with preconditions,
postconditions, everything you can think of, will that
actually be *clearer*?

If I were to do it (which I am not, thank God) my
answer would be *no*. The only way I could do it would
instead be the opposite. Train the brain with such
expressions - exactly as they are - day in, day out,
until they are second nature.

Example: a C++ OO project with classes and everything.
Silly inheritance and interfaces. Some people would
consider those pretty darn difficult to understand.
But to the seasoned C++ programmer (no exaggeration
here, a few years of focused training is enough) those
programs are clear. For those guys, giving up writing
C++ code and instead using some other representation
(be it graphical or not) would be to cripple their
skills in one stroke.

So no, I think that representation is the best there
is. To translate it back and forth would be very
difficult, and even if it is possible (which of course
it is, because a representation is just one of I don't
know how many possible ones), I don't see the end
result being any clearer: on the contrary, most likely.

What I would do - try to get it more readable by using
classes, string classes (do they exist?), and even
more advanced constructs if necessary - as in this
simple example:

    (defconst stop-char-default
      "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")

How do you define those? Can you identify any which
aren't there, but could/should be? Example: say there
is a class called "delimiters" which contains [, (, {,
<, >, }, ), and ]. Can you split that up into
"opening-delimiters" and closing ditto?

Second, exactly as you mentioned - the font lock
issue - work on that. You do know, of course, of

    font-lock-regexp-grouping-construct
    font-lock-regexp-grouping-backslash

Are there more of those that you can identify and add?

-- 
underground experts united