From: "MON KEY"
Newsgroups: gmane.emacs.devel
Subject: Re: regexp-quote missing escapes in grouping constructs - Bug?
Date: Fri, 13 Jun 2008 13:36:07 -0400
To: "Miles Bader"
Cc: emacs-devel@gnu.org

Yes, but why are the "?", "[" and "+" getting escaped regardless of the presence of a preceding \, whereas the alternative "|" inside the grouping construct isn't?

E.g. example 1):

(regexp-quote "[0-9]{2,4}(-|/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

as compared to 2):

(regexp-quote "[0-9]{2,4}(-?+/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-\\?\\+/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

In the second case the ?+ nested inside the group is getting escaped. Is the "|" not considered a special operator or Emacs regexp metacharacter in the regexp-quote situation? And if so, why not?

The issue is that regexp-opt.el is calling regexp-quote. If I understand the implications of regexp-opt, it is meant as a helper function for passing regexps to font-lock-keywords and isn't intended to accept or 'optimize' existing regexp "words".

So, to feed a well-formed regexp to font-lock-add-keywords I need to build the regexp by hand. Likewise, that regexp needs to be passed as a string with all special characters properly escaped, e.g.

(defconst stupid-mode-keywords
  '("^[A-z]\\?\\+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

Am I to understand that the ? and + should be escaped but the | shouldn't be in order for the regexp to work with font-lock?
FWIW my experience is otherwise: the previous case doesn't work, whereas the following does:

(defconst stupid-mode-keywords
  '("^[A-z]?+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

For my purposes, the larger issue is that I can't find a sensible way to cons or append a well-formed regexp to an existing one without running into regexp-quote and regexp-opt confusion, especially as I am unclear as to the correctness of the quoting and escaping of regexps for font-locking by the two respective functions.

The only solution that seems approachable is to make a new defconst, defvar and defface for each new regexp I wish to font-lock. This approach is not really maintainable over the long term.

On Fri, Jun 13, 2008 at 2:17 AM, Miles Bader wrote:
> "St/n_P/rm/n" writes:
>> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>>
>> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>>
>> Am I misunderstanding something?
>
> The backslashes you entered in the original lisp string were eaten by
> the lisp reader, so there are no backslashes in the string.  Since (, ),
> |, etc., are not emacs regexp metacharacters (without a preceding
> backslash), there's no need to quote them.
>
> Here's what you probably meant:
>
> (regexp-quote "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+\\(-\\|/\\)[0-9]\\{2,4\\}")
> => "\\[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)\\[0-9]\\?\\+\\\\(-\\\\|/\\\\)\\[0-9]\\\\{2,4\\\\}"
>
> -Miles
>
> --
> Joy, n.  An emotion variously excited, but in its highest degree arising from
> the contemplation of grief in another.
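
For reference, the quoting behaviour discussed in this thread can be checked in *scratch*, and one possible way to combine a hand-written regexp with a list of literal words is sketched below. This is only an illustration, not a recommendation from either poster: the names my-date-regexp, my-word-regexp, my-combined-regexp and my-keywords are hypothetical, and font-lock-keyword-face merely stands in for whatever face is wanted.

```elisp
;; regexp-quote takes LITERAL text and returns a regexp matching that text.
;; It only escapes characters that are special on their own.
(regexp-quote "a|b")   ; => "a|b"      (a bare | is an ordinary character)
(regexp-quote "a?+")   ; => "a\\?\\+"  (? and + are operators unescaped)

;; A hand-written regexp is already a regexp, so it is NOT passed
;; through regexp-quote.  (Hypothetical example pattern.)
(defconst my-date-regexp "[0-9]\\{2,4\\}\\(?:-\\|/\\)[0-9]+")

;; regexp-opt is for lists of literal strings, not for existing regexps.
(defconst my-word-regexp (regexp-opt '("some" "stupid" "regexp")))

;; Joining two regexps: wrap each in a shy group \(?: ... \) and put \|
;; between them, so neither side's operators leak into the other.
(defconst my-combined-regexp
  (concat "\\(?:" my-date-regexp "\\)"
          "\\|"
          "\\(?:" my-word-regexp "\\)"))

(string-match my-combined-regexp "stupid")    ; => 0
(string-match my-combined-regexp "2008-06")   ; => 0

;; One keyword list can then carry the combined regexp.  The backquote
;; splices in the value of the constant; the face here is an assumption.
(defconst my-keywords
  `((,my-combined-regexp . font-lock-keyword-face)))
```

With something along these lines, a new alternative can be appended by extending the concat form (or the word list given to regexp-opt) rather than by adding a separate defconst and defface for every pattern.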