* regexp-quote missing escapes in grouping constructs - Bug?
@ 2008-06-12 23:39 St/n_P/rm/n
  2008-06-13  6:17 ` Miles Bader
  2008-06-13  6:20 ` Herbert Euler
  0 siblings, 2 replies; 6+ messages in thread
From: St/n_P/rm/n @ 2008-06-12 23:39 UTC
  To: emacs-devel

(regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")

---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

Am I misunderstanding something?
Shouldn't passing that string to regexp-quote give back something more
like this:

---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"

or *flinches at the thought*

---> "[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)[0-9]?+(-\\\\|/)[0-9]\\\\{2,4\\\\}"

It's possible that I am misunderstanding the function, and I'd rather
not file this as a bug report because I am using Lennart's recent W32
patched...

GNU Emacs 23.0.60.1 (i386-mingw-nt5.1.2600) of 2008-05-12 on
LENNART-69DE564 (patched)

However, I am currently building a derived mode, and regexp-opt and
regexp-quote are more or less required, especially as there doesn't
seem to be a clean way to avoid passing everything around through
multiple instances of defconst, defvar, defcustom, etc. just to "cache"
keyword regexes for font-lock.

---
I find the following two cases most relevant to the matter at hand.

case a)
We get the requisite Lisp reader 4x \\\\ for the group construct, but
the function not only misses the escape on the interior alternative
but omits it entirely, e.g.

(regexp-quote "\\(123\|567\\)")
---> "\\\\(123|567\\\\)"

case b)
In contrast, when we give it the double escape "\\" inside the group,
it DOES catch the escape and gives us 4x the \:

(regexp-quote "\\(123\\|567\\)")
---> "\\\\(123\\\\|567\\\\)"

---
This doesn't seem like consistent behavior, especially as regexp-quote
is feeding regexp-opt elsewhere.

---
These other examples do not strike me as edge cases when
'manually-optimizing' a regex for font-lock:

(regexp-quote "\(123\|567\)")
---> "(123|567)"

(regexp-quote `(,"\(123\|567\)")
---> ("(123|567)")

(regexp-quote '("\(123|567\)")
---> ("(123|567)")

(regexp-quote "(123|567)")
---> "(123|567)"

(regexp-quote '"(123|567)")
---> "(123|567)"

---
Again, maybe I am missing something, but my head hurts... despite
having really come to appreciate emacs regexps :)
* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-12 23:39 regexp-quote missing escapes in grouping constructs - Bug? St/n_P/rm/n
@ 2008-06-13  6:17 ` Miles Bader
  2008-06-13 17:36   ` MON KEY
  2008-06-13  6:20 ` Herbert Euler
  1 sibling, 1 reply; 6+ messages in thread
From: Miles Bader @ 2008-06-13  6:17 UTC
  To: St/n_P/rm/n; +Cc: emacs-devel

"St/n_P/rm/n" <Stan@SandPframing.com> writes:
> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>
> Am I misunderstanding something?

The backslashes you entered in the original lisp string were eaten by
the lisp reader, so there are no backslashes in the string.  Since (, ),
|, etc., are not emacs regexp metacharacters (without a preceding
backslash), there's no need to quote them.

Here's what you probably meant:

   (regexp-quote "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+\\(-\\|/\\)[0-9]\\{2,4\\}")
   => "\\[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)\\[0-9]\\?\\+\\\\(-\\\\|/\\\\)\\[0-9]\\\\{2,4\\\\}"

-Miles

-- 
Joy, n.  An emotion variously excited, but in its highest degree arising from
the contemplation of grief in another.
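The point about the Lisp reader can be checked directly. The following is a minimal sketch using only built-in functions; the short sample strings are made up for illustration and are not taken from the thread.

```elisp
;; In a Lisp string literal, a backslash in front of an ordinary
;; character such as { ( | ) is dropped by the reader.
(length "\{")                    ; => 1, just the character {
(length "\\{")                   ; => 2, a backslash followed by {

;; Both literals below therefore read as the same four-character string.
(string= "\(a\|b\)" "(a|b)")     ; => t

;; regexp-quote only ever sees the string after it has been read.
(regexp-quote "\(a\|b\)")        ; => "(a|b)", nothing needs quoting
(regexp-quote "\\(a\\|b\\)")     ; => "\\\\(a\\\\|b\\\\)" in printed form
```

The last result looks heavier than it is: the printed form doubles every backslash again, so the string itself contains two backslashes before each of (, | and ).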
* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13  6:17 ` Miles Bader
@ 2008-06-13 17:36   ` MON KEY
  2008-06-13 22:21     ` Stefan Monnier
  2008-06-14  4:16     ` tomas
  0 siblings, 2 replies; 6+ messages in thread
From: MON KEY @ 2008-06-13 17:36 UTC
  To: Miles Bader; +Cc: emacs-devel

Y. but why are the "?", "[" and "+" getting escaped regardless of the
presence of a preceding \ whereas the alternative "|" inside the
grouping construct isn't?

e.g. example 1)

(regexp-quote "[0-9]{2,4}(-|/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

as compared to 2):

(regexp-quote "[0-9]{2,4}(-?+/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-\\?\\+/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

In the second case the ?+ nested inside the group is getting escaped.

Is the "|" not considered a special operator or emacs regexp
metacharacter in the regexp-quote situation?  And if so, why not?

The issue is that regexp-opt.el is calling regexp-quote.

If I understand the implications of regexp-opt, it is meant as a helper
function for passing regexps to font-lock-keywords and isn't intended to
accept or 'optimize' existing regexp "words".

So, to feed a well-formed regexp to font-lock-add-keywords I need to
build the regexp by hand.  Likewise, that regexp needs to be passed as a
string with all special characters properly escaped, e.g.

(defconst stupid-mode-keywords
  '("^[A-z]\\?\\+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

Am I to understand that the ? and + should be escaped but the |
shouldn't be in order for the regexp to work with font-lock?

FWIW my experience is otherwise, and the previous case doesn't work,
whereas the following does:

(defconst stupid-mode-keywords
  '("^[A-z]?+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

For my purposes, the larger issue is that I can't find a sensible way to
cons or append a well-formed regexp to an existing one without running
into regexp-quote and regexp-opt confusion, especially as I am unclear
as to the correctness of the quoting and escaping of regexps for
font-locking by the two respective functions.

The only solution that seems approachable is to make a new defconst,
defvar and defface for each new regexp I wish to font-lock.  This
approach is not really maintainable over the long term.

On Fri, Jun 13, 2008 at 2:17 AM, Miles Bader <miles.bader@necel.com> wrote:
> "St/n_P/rm/n" <Stan@SandPframing.com> writes:
>> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>>
>> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>>
>> Am I misunderstanding something?
>
> The backslashes you entered in the original lisp string were eaten by
> the lisp reader, so there are no backslashes in the string.  Since (, ),
> |, etc., are not emacs regexp metacharacters (without a preceding
> backslash), there's no need to quote them.
>
> Here's what you probably meant:
>
>   (regexp-quote "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+\\(-\\|/\\)[0-9]\\{2,4\\}")
>   => "\\[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)\\[0-9]\\?\\+\\\\(-\\\\|/\\\\)\\[0-9]\\\\{2,4\\\\}"
>
> -Miles
>
> --
> Joy, n.  An emotion variously excited, but in its highest degree arising from
> the contemplation of grief in another.
>
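On the question of joining a hand-written regexp to a list of keywords, one possible approach (a sketch, not something proposed in the thread) is to keep the two kinds of input apart: regexp-opt is given only literal words, which it quotes itself, and the hand-written fragment is attached with concat. The name my-demo-keywords-regexp, the word list, and the leading pattern are all hypothetical and chosen only for illustration.

```elisp
;; Hypothetical example: a hand-written prefix plus a keyword group.
(defconst my-demo-keywords-regexp
  (concat "^[A-Za-z]+ "                   ; hand-written regexp fragment
          ;; Literal words only; a non-nil second argument wraps the
          ;; result in a capturing \( ... \) group.
          (regexp-opt '("some" "stupid" "regexp") t))
  "Illustrative regexp: a word, a space, then one of three keywords.")

;; The combined regexp matches, and group 1 holds the keyword.
(string-match my-demo-keywords-regexp "foo stupid")   ; => 0
(match-string 1 "foo stupid")                         ; => "stupid"
```

With this split, regexp-quote is never applied to text that is already a regexp, which is where the confusing extra backslashes come from.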
* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13 17:36   ` MON KEY
@ 2008-06-13 22:21     ` Stefan Monnier
  2008-06-14  4:16     ` tomas
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Monnier @ 2008-06-13 22:21 UTC
  To: MON KEY; +Cc: emacs-devel, Miles Bader

> Y. but why are the "?", "[" and "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?

Please read the Elisp manual's section about regular expressions.


        Stefan
* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13 17:36   ` MON KEY
  2008-06-13 22:21     ` Stefan Monnier
@ 2008-06-14  4:16     ` tomas
  1 sibling, 0 replies; 6+ messages in thread
From: tomas @ 2008-06-14  4:16 UTC
  To: MON KEY; +Cc: emacs-devel, Miles Bader

On Fri, Jun 13, 2008 at 01:36:07PM -0400, MON KEY wrote:
> Y. but why are the "?", "[" and "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?

Ah -- maybe somewhat unexpectedly to you (e.g. in Perl it is), the "|"
isn't special in Emacs. You have to spell "\|" to say "alternative".

HTH
-- 
tomás
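The difference between a bare "|" and "\|" can be seen with string-match. This is a small sketch with made-up sample strings; it assumes nothing beyond the built-in string-match and regexp-quote.

```elisp
;; A bare | is an ordinary character: it matches a literal vertical bar.
(string-match "a|b" "a")        ; => nil
(string-match "a|b" "a|b")      ; => 0

;; Alternation is \| , which is written "\\|" in Lisp source.
(string-match "a\\|b" "b")      ; => 0

;; ? and + are special without a backslash, so regexp-quote escapes
;; them, while the already-literal | is left alone.
(regexp-quote "a?+|b")          ; => "a\\?\\+|b"

;; The quoted regexp matches exactly the original text.
(string-match (regexp-quote "a?+|b") "a?+|b")   ; => 0
```

This is the asymmetry seen earlier in the thread: regexp-quote adds a backslash only where a character would otherwise act as an operator.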
* RE: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-12 23:39 regexp-quote missing escapes in grouping constructs - Bug? St/n_P/rm/n
  2008-06-13  6:17 ` Miles Bader
@ 2008-06-13  6:20 ` Herbert Euler
  1 sibling, 0 replies; 6+ messages in thread
From: Herbert Euler @ 2008-06-13  6:20 UTC
  To: St/n_P/rm/n, emacs-devel

> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>
> Am I misunderstanding something?
> Shouldn't passing that string to regexp-quote give back something more
> like this:
>
> ---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"

I think this is because `regexp-quote' sees the string being processed
by the Lisp reader, i.e. "\{" ==> "{" in the internal representation,
while "\\{" ==> "\{":

ELISP> (regexp-quote "[0-9]\{2,4\}")
"\\[0-9]{2,4}"
ELISP> (regexp-quote "[0-9]\\{2,4\\}")
"\\[0-9]\\\\{2,4\\\\}"
ELISP>

Regards,
Guanpeng Xu