* regexp-quote missing escapes in grouping constructs - Bug?
@ 2008-06-12 23:39 St/n_P/rm/n
  2008-06-13  6:17 ` Miles Bader
  2008-06-13  6:20 ` Herbert Euler
  0 siblings, 2 replies; 6+ messages in thread
From: St/n_P/rm/n @ 2008-06-12 23:39 UTC (permalink / raw)
  To: emacs-devel

(regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")

---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

Am I misunderstanding something?
Shouldn't passing that string to regexp-quote give back something more
like this:

---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"

or *flinches at the thought*

---> "[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)[0-9]?+(-\\\\|/)[0-9]\\\\{2,4\\\\}"

-
It's possible that I am misunderstanding the function, and I'd rather
not file this as a bug report b/c I am using Lennart's recent patched
W32 build:

GNU Emacs 23.0.60.1 (i386-mingw-nt5.1.2600)
of 2008-05-12 on LENNART-69DE564 (patched)


However, I am currently building a derived mode, and regexp-opt and
regexp-quote are kinda required, esp. as there doesn't seem to be a clean
way to avoid passing everything around through multiple instances of
defconst, defvar, defcustom, etc. just to "cache" keyword regexes for
font-lock.

---
I find the following two most relevant to the matter at hand.

case a)
We get the requisite lisp reader 4x \\\\ for the group construct, but
the function not only fails to escape the interior alternative, it
drops the backslash altogether, e.g.

(regexp-quote "\\(123\|567\\)")
---> "\\\\(123|567\\\\)"

case b)
In contrast, when we give it the double escape "\\" inside the
group, it DOES catch the escape and gives us 4x the \

(regexp-quote "\\(123\\|567\\)")
--->  "\\\\(123\\\\|567\\\\)"

---
This doesn't seem like consistent behavior esp. as regexp-quote is
feeding regexp-opt elsewhere.
---

These other examples do not strike me as edge cases when
'manually optimizing' a regex for font-lock:

(regexp-quote "\(123\|567\)")
--->  "(123|567)"

(regexp-quote `(,"\(123\|567\)")
--->  ("(123|567)")

(regexp-quote '("\(123|567\)")
--->  ("(123|567)")

(regexp-quote "(123|567)")
---> "(123|567)"

(regexp-quote '"(123|567)")
---> "(123|567)"

---
again, maybe I am missing something but my head hurts... despite
having really come to appreciate emacs regexps :)





* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-12 23:39 regexp-quote missing escapes in grouping constructs - Bug? St/n_P/rm/n
@ 2008-06-13  6:17 ` Miles Bader
  2008-06-13 17:36   ` MON KEY
  2008-06-13  6:20 ` Herbert Euler
  1 sibling, 1 reply; 6+ messages in thread
From: Miles Bader @ 2008-06-13  6:17 UTC (permalink / raw)
  To: St/n_P/rm/n; +Cc: emacs-devel

"St/n_P/rm/n" <Stan@SandPframing.com> writes:
> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
>
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
>
> Am I misunderstanding something?

The backslashes you entered in the original lisp string were eaten by
the lisp reader, so there are no backslashes in the string.  Since (, ),
|, etc., are not emacs regexp metacharacters (without a preceding
backslash), there's no need to quote them.

Here's what you probably meant:

(regexp-quote "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+\\(-\\|/\\)[0-9]\\{2,4\\}")
=> "\\[0-9]\\\\{2,4\\\\}\\\\(-\\\\|/\\\\)\\[0-9]\\?\\+\\\\(-\\\\|/\\\\)\\[0-9]\\\\{2,4\\\\}"
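
A minimal sketch of the reader's effect, using only standard Emacs Lisp
functions; the short test strings are made up for illustration:

```elisp
;; The reader drops a backslash that precedes an ordinary character,
;; so "\(" and "(" are the same one-character string.
(string= "\(" "(")              ; => t
(length "\(")                   ; => 1
(length "\\(")                  ; => 2 (a backslash and a paren)

;; Hence the first call never sees a backslash, and the second one does.
(regexp-quote "\(a\|b\)")       ; => "(a|b)"
(regexp-quote "\\(a\\|b\\)")    ; => "\\\\(a\\\\|b\\\\)"
```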

-Miles

-- 
Joy, n. An emotion variously excited, but in its highest degree arising from
the contemplation of grief in another.





* RE: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-12 23:39 regexp-quote missing escapes in grouping constructs - Bug? St/n_P/rm/n
  2008-06-13  6:17 ` Miles Bader
@ 2008-06-13  6:20 ` Herbert Euler
  1 sibling, 0 replies; 6+ messages in thread
From: Herbert Euler @ 2008-06-13  6:20 UTC (permalink / raw)
  To: St/n_P/rm/n, emacs-devel


> (regexp-quote "[0-9]\{2,4\}\(-\|/\)[0-9]?+\(-\|/\)[0-9]\{2,4\}")
> 
> ---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"
> 
> Am I misunderstanding something?
> Shouldn't passing that string to regexp-quote give back something more
> like this:
> 
> ---> "[0-9]\\{2,4\\}\\(-\\|/\\)[0-9]?+(-\\|/)[0-9]\\{2,4\\}"

I think this is because `regexp-quote' sees the string being processed
by the Lisp reader, i.e. "\{" ==> "{" in the internal representation,
while "\\{" ==> "\{":

     ELISP> (regexp-quote "[0-9]\{2,4\}")
     "\\[0-9]{2,4}"
     ELISP> (regexp-quote "[0-9]\\{2,4\\}")
     "\\[0-9]\\\\{2,4\\\\}"
     ELISP>
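
As a rough illustration of what the quoted result is for (a sketch, with
made-up variable names `literal' and `quoted'): the value returned by
`regexp-quote' is a regexp that matches the argument's own text and
nothing else.

```elisp
(let* ((literal "[0-9]\\{2,4\\}")        ; in memory: [0-9]\{2,4\}
       (quoted  (regexp-quote literal)))
  (list
   ;; Used as a regexp, LITERAL matches two to four digits ...
   (string-match-p literal "2008")
   ;; ... whereas QUOTED does not match digits at all,
   (string-match-p quoted "2008")
   ;; it only matches the text of LITERAL itself.
   (string-match-p quoted literal)))
;; => (0 nil 0)
```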

Regards,
Guanpeng Xu





* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13  6:17 ` Miles Bader
@ 2008-06-13 17:36   ` MON KEY
  2008-06-13 22:21     ` Stefan Monnier
  2008-06-14  4:16     ` tomas
  0 siblings, 2 replies; 6+ messages in thread
From: MON KEY @ 2008-06-13 17:36 UTC (permalink / raw)
  To: Miles Bader; +Cc: emacs-devel

Yes, but why are the "?", "[" and "+" getting escaped regardless of the
presence of a preceding \ whereas the alternative "|" inside the
grouping construct isn't?

e.g.

example 1)

(regexp-quote "[0-9]{2,4}(-|/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-|/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

as compared to 2);

(regexp-quote "[0-9]{2,4}(-?+/)[0-9]?+(-|/)[0-9]{2,4}")
---> "\\[0-9]{2,4}(-\\?\\+/)\\[0-9]\\?\\+(-|/)\\[0-9]{2,4}"

In the second case the ?+ nested inside the group is getting escaped.

Is the "|" not considered a special operator or emacs regexp
metacharacter in the regexp-quote situation?

And if so, why not?
The issue is that regexp-opt.el is calling regexp-quote.

If I understand the implications of regexp-opt, it is meant as a
helper function for passing regexps to font-lock-keywords and isn't
intended to accept or 'optimize' existing regexp "words".  So, to feed
a well-formed regexp to font-lock-add-keywords I need to build the
regexp by hand. Likewise, that regexp needs to be passed as a string
with all special characters properly escaped e.g.

(defconst stupid-mode-keywords
'("^[A-z]\\?\\+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

Am I to understand that the ? and + should be escaped but the |
shouldn't be in order for the regexp to work with font-lock?

FWIW my experience is otherwise, and the previous case doesn't work,
whereas the following does:

(defconst stupid-mode-keywords
'("^[A-z]?+\\(some\\|stupid\\|regexp\\)\\{2,4\\}" . my-stupid-mode-face))

For my purposes, the larger issue is that I can't find a sensible way
to cons or append a well-formed regexp to an existing one without
running into regexp-quote and regexp-opt confusion, esp. as I am
unclear as to the correctness of the quoting and escaping of regexps
for font-locking by the two respective functions.
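
One possible way to combine the two, sketched under assumptions: the
names `my-stupid-mode-keyword-regexp' and
`my-stupid-mode-font-lock-keywords' are hypothetical, and the prefix
"^[A-Za-z]+" is only a guess at the intent (one or more letters at the
start of a line). `regexp-opt' takes literal strings rather than
regexps, and its result can be glued to a hand-written fragment with
`concat':

```elisp
;; Hypothetical names; regexp-opt, concat and font-lock-keyword-face
;; are standard Emacs.
(defconst my-stupid-mode-keyword-regexp
  (concat "^[A-Za-z]+"                                 ; hand-written part
          (regexp-opt '("some" "stupid" "regexp") t))  ; literal words, grouped
  "Regexp built once from a hand-written prefix and a keyword list.")

(defconst my-stupid-mode-font-lock-keywords
  `((,my-stupid-mode-keyword-regexp . font-lock-keyword-face))
  "Font-lock keywords reusing `my-stupid-mode-keyword-regexp'.")

(string-match-p my-stupid-mode-keyword-regexp "xstupid")   ; => 0
```

Here only the keyword list goes through `regexp-opt'; the hand-written
part is never passed to `regexp-quote', so its operators stay operators.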

The only solution that seems approachable is to make a new defconst
defvar and defface for each new regexp I wish to font-lock.  This
approach is not really maintainable over the long term.






* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13 17:36   ` MON KEY
@ 2008-06-13 22:21     ` Stefan Monnier
  2008-06-14  4:16     ` tomas
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Monnier @ 2008-06-13 22:21 UTC (permalink / raw)
  To: MON KEY; +Cc: emacs-devel, Miles Bader

> Yes, but why are the "?", "[" and "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?

Please read the Elisp manual's section about regular expressions.


        Stefan





* Re: regexp-quote missing escapes in grouping constructs - Bug?
  2008-06-13 17:36   ` MON KEY
  2008-06-13 22:21     ` Stefan Monnier
@ 2008-06-14  4:16     ` tomas
  1 sibling, 0 replies; 6+ messages in thread
From: tomas @ 2008-06-14  4:16 UTC (permalink / raw)
  To: MON KEY; +Cc: emacs-devel, Miles Bader

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jun 13, 2008 at 01:36:07PM -0400, MON KEY wrote:
> Y. but why are the "?" a"[" and "+" getting escaped regardless of the
> presence of a preceding \ whereas the alternative "|" inside the
> grouping construct isn't?

Ah -- maybe somewhat unexpectedly to you (e.g. in Perl it is), the "|"
isn't special in Emacs. You have to spell "\|" to say "alternative".
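
A small sketch of the difference, using made-up test strings and the
standard `string-match-p':

```elisp
(string-match-p "a|b" "b")       ; => nil, "|" is an ordinary character
(string-match-p "a|b" "a|b")     ; => 0, matches the three literal characters
(string-match-p "a\\|b" "b")     ; => 0, "\|" is the alternation operator

;; So regexp-quote has nothing to do for a bare "|".
(regexp-quote "a|b")             ; => "a|b"
```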

HTH
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIU0Y1Bcgs9XrR2kYRAjc/AJwPIAhKhoMdc5iDeFo6U3dNMAYn6wCeJZH9
FcFJY6lNWbQODDxxBlDm2Qw=
=O1bw
-----END PGP SIGNATURE-----




