From: martin rudalics
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Sun, 05 Mar 2006 12:54:14 +0100
Message-ID: <440AD166.5040108@gmx.at>
In-Reply-To: <200603050337.k253brP03395@raven.dms.auburn.edu>
Original-To: Luc Teirlinck
Cc: ttn@gnu.org, emacs-devel@gnu.org

Luc Teirlinck wrote:

> Also, forward and backward views of a regexp are not
> algorithmically equivalent.  If you read a regexp forward, you know
> immediately when you encounter a character whether it has to be taken
> literally or not (or at worst after a _very_ limited number of
> characters, as the second `[' in in "[[:...").  If you read the regexp
> backward, you may have to read all the way back to the beginning
> before you can be sure that a `]' is to be taken literally.

How do you read the following regexp from `cc-langs.el'?

    (concat "\\("
            "[\)\[\(]"
            (if (c-lang-const c-type-modifier-kwds)
                (concat
                 "\\|"
                 ;; "throw" in `c-type-modifier-kwds' is followed
                 ;; by a parenthesis list, but no extra measures
                 ;; are necessary to handle that.
                 (regexp-opt (c-lang-const c-type-modifier-kwds) t)
                 "\\>")
              "")
            "\\)")

Do you really evaluate the (c-lang-const ...)s _before_ looking at the
closing `\\)'?  What would you do if the value of
`c-type-modifier-kwds' were available at run-time only?

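
To make the question concrete, here is a rough sketch in which the
keyword list arrives only as a function argument.  The names
`my-modifier-regexp' and `my-kwds' are hypothetical stand-ins and not
part of `cc-langs.el'; the frame of the regexp is copied from the
snippet above.

```elisp
;; Sketch only: MY-KWDS is a hypothetical stand-in for the value of
;; (c-lang-const c-type-modifier-kwds), assumed known at run time only.
(defun my-modifier-regexp (my-kwds)
  "Build a regexp shaped like the `cc-langs.el' one from MY-KWDS."
  (concat "\\("
          "[)[(]"                 ; same string as "[\)\[\(]"
          (if my-kwds
              (concat "\\|" (regexp-opt my-kwds t) "\\>")
            "")
          "\\)"))

;; The outer group and the character alternative behave the same
;; whatever the keyword list turns out to be.
(string-match (my-modifier-regexp nil) "(")                       ; => 0
(string-match (my-modifier-regexp '("const" "throw")) "(")        ; => 0
(string-match (my-modifier-regexp '("const" "throw")) "throw ")   ; => 0
```

The group delimiters and the character alternative can be understood
here without ever looking at what `regexp-opt' returns.
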
When trying to understand such regexps I break them up into parts
first.  Such parts are, in my understanding, groups like `\\(...\\)',
subexpressions delimited by `\\|', and character alternatives.  Next I
try to understand the parts that interest me without paying attention
to parts that do not relate to my specific problem.  And I would have
trouble isolating a character alternative when the author matches a
literal right bracket with `]'.

People can make reading a regexp truly awkward by writing kludgy
expressions like

    (let ((keywords (concat "\\([;(){}`|&]\\|^\\)[ \t]*\\(\\("
                            (regexp-opt (sh-feature sh-leading-keywords) t)
                            "[ \t]+\\)?"
                            (regexp-opt (append (sh-feature sh-leading-keywords)
                                                (sh-feature sh-other-keywords))
                                        t))))

in `sh-font-lock-keywords-1', which I understand correctly iff I read
the definition of the entire function first.  Such expressions are,
however, rare in present Emacs code.

> Hence, reading a regexp forward _is_ algorithmically _very_ superior
> over reading it backward if your purpose is to understand the regexp.

If my purpose is to understand how a regexp engine interprets a regexp,
reading it forward is superior.  If, however, my purpose is to
understand a complex regexp, I want to guess the author's intentions
first.  In that case I do want to break up the expression into its
constituents.  In general, languages hiding implementation details are
easier to use than languages that require users to know how specific
features are implemented.

> I must admit however, that if you want is to uncover the subliminal
> satanic messages in the regexp, then you _have_ to read it backward.

It's better to avoid "subliminal satanic messages" when _writing_ a
regexp.  It's bad if you have to uncover them when reading a regexp.
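
As a small illustration of the right-bracket point made earlier in
this message, the sketch below uses two made-up regexps
(`my-unquoted-re' and `my-quoted-re' are hypothetical names).  Both
match a word followed by a literal `]', but only in the second does
every unquoted `]' close a character alternative.

```elisp
;; Hypothetical example: the final `]' is literal, yet it looks like
;; the end of a character alternative when the regexp is skimmed.
(defvar my-unquoted-re "[a-z]+]")

;; Hypothetical example: same matches, literal `]' explicitly quoted.
(defvar my-quoted-re "[a-z]+\\]")

(string-match my-unquoted-re "foo]")    ; => 0
(string-match my-quoted-re "foo]")      ; => 0

;; A `]' placed first in a character alternative is literal as well:
;; "[]a]" matches either `]' or `a'.
(string-match "[]a]" "]")               ; => 0
```

With the quoted form, a character alternative can be picked out by
looking for its unquoted brackets alone.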