From: "Drew Adams"
Newsgroups: gmane.emacs.help
Subject: RE: regexp to match a sexp?
Date: Fri, 28 Jul 2006 20:59:17 -0700
References: <87ac6t172o.fsf@thalassa.informatimago.com>

    > I'm looking for a regexp that will match (only) a sexp.

    In general, it's not possible. Regexps cannot match recursive
    grammars.

I know that. Again, as I said:

    > I'm less interested in hearing "it can't be done" than in
    > attempts to do the job, even if in a rough way.

    Are you excluding sublists?

No, as I said. Excluding them would be one legitimate approximation.
But the real exercise is to allow for some degree of nesting.

    With sublists excluded, it's possible, but note that with
    \s(.*\s) you cannot ensure that the opening bracket is the
    corresponding character to the closing bracket.

Yes. Of course, with no sublists, you can use \s(\S)*\s).

    You'll have to write:

    (ELEMENTS)\|\[ELEMENTS\]\|{ELEMENTS}\|\|...
    ELEMENTS ::= "\([^"\\]*\|\\.\)*"\|[^"]*

I'm interested in candidate regexps that might be useful in limited
contexts. Yes, different regexps will have different limitations, and
therefore be differently useful. Think of this as a quick-and-dirty
`C-M-s' that might (*might*) just usefully find a sexp some of the
time.

    > 2. Strings and the possible escaping of `"' would be one
    > headache that would need to be dealt with carefully, as always.

    No, it's simplistic to deal with them in regexps. See above.

In practice, we do deal with them in Emacs, albeit with limited
success. I'm not trying to write a sexp grammar; I'm asking about what
can be done, practically, with regexps, in terms of matching sexps.

    > 3. It would need to be effectively recursive or some
    > approximation thereof, for example, with some limit placed
    > on nesting. That is, it would need to allow for nested sexps.

    Yes, that's why it's not possible with regexps. Regexps are not
    recursive by definition.

Yes, I know that. Some approximations can be made. You proposed an
approximation of zero nesting. That in itself can be useful in some
contexts, but one level of nesting is also useful, and two levels,...

    > 4. It would need to deal properly with quoting, `''. Dealing
    > with backquote syntax, ``', would be a plus.
    >
    > Can something like this be done in a reasonable way? What's a
    > good regexp that you could use, e.g., to search for one or
    > more sexps?
    >
    > I'm not looking for a way to search for or scan a sexp
    > *without* using a regexp; I know there are ways to do that.
    > I'm wondering what can be done *with* a regexp. IOW, imagine
    > that all you have is `C-M-s' (but don't worry
    > about the expression being too complex to type interactively).
    >
    > I'm less interested in hearing "it can't be done" than in
    > attempts to do the job, even if in a rough way.

    Well, since it's not possible, it can't be done, but you can
    still go thru the mirror and see if it's possible, since you
    prefer that. Good bye.

    Now, if your purpose is to _*PARSE*_ sexps instead of using
    regexps,

I stated that my purpose was the opposite. If I ask about turtles in
Alabama, why do you tell me what you know about wine in Tuscany? ;-)

    then you can easily write a sexp parser. This is one of the
    simplest grammars there is.
    In Emacs, of course, you can use the provided sexp parser, with
    functions such as: forward-sexp, backward-sexp,
    (thing-at-point 'sexp), read-from-string, etc.

I know about those. No, I'm not interested in them here. I'm
interested in how much of interest can be done with a regexp. The
question is what you can do with `C-M-s' - what interesting regexps
would you use to find which classes of sexp or almost-sexp?
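
For illustration only, here is a rough sketch of the "some limit placed on nesting" approximation discussed in this message. It is written in Python rather than as an Emacs regexp, so syntax classes such as \s( are not available and only round parentheses are handled. The names `sexp_regex`, `STRING` and `PLAIN` are made up for this sketch and come from nowhere in the thread. It also ignores quoting, backquote, comments, and escaped parentheses outside strings, so it is at best an "almost-sexp" matcher.

```python
import re

# A double-quoted string with backslash escapes, e.g. "a \" b".
STRING = r'"(?:[^"\\]|\\.)*"'
# Any single character that is neither a parenthesis nor a double quote.
PLAIN = r'[^()"]'


def sexp_regex(max_depth):
    """Return regex source for a parenthesized list nested at most
    max_depth levels below the outermost list (hypothetical helper).

    max_depth=0 matches only flat lists such as (a b "c").
    """
    # Depth 0: a list whose elements are strings or plain characters.
    pattern = rf'\((?:{STRING}|{PLAIN})*\)'
    for _ in range(max_depth):
        # Each pass allows the previous pattern as a nested element,
        # unrolling the recursion one more level.
        pattern = rf'\((?:{STRING}|{PLAIN}|{pattern})*\)'
    return pattern


if __name__ == "__main__":
    text = 'x (defun f (a) (g "(" a)) y'
    found = re.search(sexp_regex(2), text)
    print(found.group(0) if found else "no match")
```

The pattern grows with every extra level, which is the price of unrolling the recursion by hand. A list nested more deeply than `max_depth` is simply not matched as a whole, although a search may still land on one of its inner lists.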