From: Paul Pogonyshev
Newsgroups: gmane.emacs.devel
Subject: Re: using non-Emacs regexp syntax
Date: Sat, 2 Dec 2006 00:54:12 +0200
Message-ID: <200612020054.12344.pogonyshev@gmx.net>
To: emacs-devel@gnu.org, herring@lanl.gov
Cc: rms@gnu.org

Stuart D. Herring wrote:
> > If you don't mind, I'll work on it now.  Changes can be added to whatever
> > .el file in the distribution later.
> >
> > Also, is there sense in supporting conversion to and from several formats?
> > E.g. some require that plus operator is escaped, while everything else is
> > not.  E.g. something like this:
> >
> >   (convert-regexp :sed :emacs some-regexp)
> >                   FROM TO     PATTERN-STRING
> >
> > Of course, it will add more complexity, but it shouldn't be much of a
> > problem for users of this function and implementing it in Lisp should
> > still be not hard.
>
> I've already started on this sort of thing, writing a converter just
> between the two formats supported by GNU grep.  (These are
> "GNU-extended-basic-RE" and "extended-RE with backreferences".)  As it
> happens, that conversion can be done with one function because the formats
> are so similar.  I had planned to go on to the more general case, but for
> now I'll just provide what I have for comment and/or use.  (I have papers,
> so any use is fine.)  If, Paul, you'd like, we can collaborate on this, or
> one of us of your choice can go on with it.
>
> [...]

I will happily pass this to you if you wish.

I planned a more generic implementation, which can be briefly described as
follows:

* Each implemented format provides a table of associations
  construct-name -> construct-generator (some constructs, like the []
  character class, will require a parameter).  In the simplest form, a
  construct-generator can be just a fixed string, which will suffice in
  most cases.

* Each format also provides a parser that splits a regexp into a list of
  construct names.

* The entry function (or a helper for it) combines the table for the output
  format with the parser for the input format.  The result is a regexp in
  the output format.

Maybe it is too slow, though.  However, given that Emacs has lived happily
without this sort of function, it can hardly be too slow.  But maybe you
can come up with a simpler solution.

(One more thing: it probably makes sense to add a conversion function for
replacement strings too.  E.g. some formats require $N, while some (like
Emacs) use \N to reference the matched group.)

Paul
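
To make the outline above concrete, here is a rough sketch in Emacs Lisp.
Everything in it is an assumption made for illustration: the names
my-regexp-formats, my-regexp-parse, my-regexp-generate and
my-convert-regexp are hypothetical and do not exist in Emacs, and the
:ere keyword (POSIX extended syntax) is just a placeholder format name.
The sketch covers only grouping and alternation; bracket expressions,
intervals and backreferences are not handled, and a single table per
format is read both ways (by the parser and by the generator), which only
works while every generator is a fixed string.

```elisp
;; Hypothetical sketch: per-format tables of construct-name -> generator.
;; A generator is either a fixed string or a function of one parameter.
(defconst my-regexp-formats
  '((:emacs . ((group-open    . "\\(")
               (group-close   . "\\)")
               (alternation   . "\\|")
               (literal-open  . "(")
               (literal-close . ")")
               (literal-bar   . "|")
               (verbatim      . identity)))
    (:ere   . ((group-open    . "(")
               (group-close   . ")")
               (alternation   . "|")
               (literal-open  . "\\(")
               (literal-close . "\\)")
               (literal-bar   . "\\|")
               (verbatim      . identity))))
  "Alist mapping a syntax keyword to its construct table (assumed names).")

(defun my-regexp-parse (syntax regexp)
  "Split REGEXP, written in SYNTAX, into a list of constructs.
Each element is a construct name, or (verbatim . STRING) for text
that is assumed to mean the same thing in every supported syntax."
  (let ((table (cdr (assq syntax my-regexp-formats)))
        (pos 0)
        (result '()))
    (while (< pos (length regexp))
      ;; A backslash and the character after it form one chunk.
      (let* ((len (if (and (eq (aref regexp pos) ?\\)
                           (< (1+ pos) (length regexp)))
                      2
                    1))
             (chunk (substring regexp pos (+ pos len)))
             ;; Reverse lookup: which construct is spelled CHUNK here?
             (entry (rassoc chunk table)))
        (push (if entry (car entry) (cons 'verbatim chunk)) result)
        (setq pos (+ pos len))))
    (nreverse result)))

(defun my-regexp-generate (syntax constructs)
  "Turn the list CONSTRUCTS back into a regexp string in SYNTAX."
  (let ((table (cdr (assq syntax my-regexp-formats))))
    (mapconcat
     (lambda (construct)
       (let* ((name (if (consp construct) (car construct) construct))
              (generator (cdr (assq name table))))
         (if (stringp generator)
             generator
           ;; Parameterized construct: call the generator on its parameter.
           (funcall generator (cdr construct)))))
     constructs "")))

(defun my-convert-regexp (from to regexp)
  "Convert REGEXP from syntax FROM to syntax TO."
  (my-regexp-generate to (my-regexp-parse from regexp)))

;; Usage:
;; (my-convert-regexp :ere :emacs "(foo|bar)\\|")   ; => "\\(foo\\|bar\\)|"
;; (my-convert-regexp :emacs :ere "\\(a\\|b\\)(c)") ; => "(a|b)\\(c\\)"
```

Supporting another format in this scheme would mean adding one more table;
a real implementation would probably need a dedicated parser per format
rather than the shared chunking loop used here.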