unofficial mirror of help-gnu-emacs@gnu.org
From: "D. D. Brierton" <darren@dzr-web.com>
Subject: Re: Help needed with regexps
Date: Fri, 13 Feb 2004 20:16:11 +0000
Message-ID: <pan.2004.02.13.20.16.08.185780@dzr-web.com>
In-Reply-To: pan.2004.02.13.19.17.40.790445@dzr-web.com

On Fri, 13 Feb 2004 19:17:44 +0000, D. D. Brierton wrote:

> In particular, the regexps for html-css-embedded and
> html-javascript-embedded are the ones I need someone to look over for me.
> 
> So, for CSS
> 
> "<style\\s-+\\(\\s-*.*\\s-+\\)*.*css\"?\\(\\s-*.*\\s-*\\)*\\s-*>"

My current version of this is:

"<style\\s-+\\(\\s-*\\w+=\"\\w+\"\\s-+\\)*type=\"\\(text/\\)?css\"\\(\\s-*\\w+=\"\\w+\"\\s-*\\)*\\s-*>"

This now looks for a "style" element that has a "type" attribute
with either the value "text/css" or the incorrect "css". Not ideal, and
it doesn't work for the case I just thought of, where someone has
used a bare "<style> ... </style>" element with no attributes. Hmmm.
Perhaps just this would be better?

"<style\\(\\s-+\\w+=\"?\\w+\"?\\)*\\s-*>"

(I'd originally wanted to keep the "css" string a match requirement in
case I ever came across some weird instance of someone attempting to use
something other than CSS to style an HTML page (I don't know what ...
maybe JSSS). But in all honesty, I guess that is never going to happen.)
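
To try the simplified regexp against a few sample tags, something like
the following sketch could be evaluated in *scratch*. The names
my-style-open-re, my-style-open-loose-re and my-tag-matches-p are made
up for illustration. One thing it brings out: in the standard syntax
table "/" is not a word constituent, so \\w+ stops at the slash in
"text/css". The second regexp is an assumed variant, not something
tested in a real mode, that accepts any double-quoted value instead.

```elisp
;; Hypothetical names throughout; only the first regexp is the one
;; quoted in the message above.
(defconst my-style-open-re
  "<style\\(\\s-+\\w+=\"?\\w+\"?\\)*\\s-*>"
  "Simplified regexp for an opening <style> tag.")

(defconst my-style-open-loose-re
  "<style\\(\\s-+\\w+=\"[^\"]*\"\\)*\\s-*>"
  "Assumed variant: attribute values may be any double-quoted string.")

(defun my-tag-matches-p (regexp tag)
  "Return non-nil if REGEXP matches the string TAG, ignoring case.
The match runs in a temporary buffer so that \\w and \\s- are
interpreted with the standard syntax table."
  (with-temp-buffer
    ;; `case-fold-search' is per-buffer, so bind it inside the temp buffer.
    (let ((case-fold-search t))
      (string-match-p regexp tag))))

(my-tag-matches-p my-style-open-re "<style>")                    ; => 0
(my-tag-matches-p my-style-open-re "<STYLE type=\"css\">")       ; => 0
(my-tag-matches-p my-style-open-re "<style type=\"text/css\">")  ; => nil
(my-tag-matches-p my-style-open-loose-re
                  "<style type=\"text/css\" media=\"screen\">")  ; => 0
```

The loose variant no longer accepts unquoted values such as type=css,
so whether it is an improvement depends on the pages being edited.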

> For javascript
> 
> "<script\\s-+\\(\\s-*.*\\s-+\\)*.*javascript.*\\(\\s-*.*\\s-+\\)*\\s-*>"
> 
> should match a "script" element that contains the string "javascript" and
> which may again be variably spaced and either upper case or lower case.

My current regexp for embedded javascript is:

"<script\\s-+\\(\\s-*\\w+=\"\\w+\"\\s-+\\)*\\(language\\|type\\)=\"\\(text/\\)?javascript[.0-9]*\"\\(\\s-*\\w+=\"\\w+\"\\s-+\\)*\\s-*>"

Unlike the CSS case, matching "javascript" is more of an issue, as people
do include VBScript on web pages. However, I probably want a bare
"<script> ... </script>" element with no attributes to default to
javascript-mode as well. Besides, the above regexp looks way too
complicated to me. Any suggestions?
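
One possible way to keep the regexp short, sketched here under
assumptions rather than tested in a real mode: match any opening
"script" tag loosely, then decide in Lisp whether its attributes point
at JavaScript. The names my-script-open-re and
my-script-tag-javascript-p are hypothetical, and the rule "no language
or type attribute means JavaScript" is the default described above.

```elisp
;; Hypothetical names; a sketch, not a drop-in replacement.
(defconst my-script-open-re
  "<script\\(\\s-[^>]*\\)?>"
  "Loose regexp for an opening <script> tag; group 1 holds the attributes.")

(defun my-script-tag-javascript-p (tag)
  "Return t if the opening script TAG should be treated as JavaScript.
A tag without a language or type attribute defaults to JavaScript."
  (let ((case-fold-search t))
    (when (string-match my-script-open-re tag)
      ;; Group 1 is nil for a bare <script>, so fall back to "".
      (let ((attrs (or (match-string 1 tag) "")))
        (or
         ;; No language/type attribute at all: default to JavaScript.
         (not (string-match-p "\\(language\\|type\\)\\s-*=" attrs))
         ;; Otherwise the value must start with [text/]javascript.
         (and (string-match-p
               "\\(language\\|type\\)\\s-*=\\s-*\"?\\(text/\\)?javascript"
               attrs)
              t))))))

(my-script-tag-javascript-p "<script>")                              ; => t
(my-script-tag-javascript-p "<script type=\"text/javascript\">")     ; => t
(my-script-tag-javascript-p "<SCRIPT LANGUAGE=\"JavaScript1.2\">")   ; => t
(my-script-tag-javascript-p "<script language=\"VBScript\">")        ; => nil
```

Splitting the test in two means a src attribute or other extra
attributes need no special handling, at the cost of not being a single
regexp that can be dropped straight into a list of submode patterns.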

-- 
======================================================================
D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
       Trying is the first step towards failure (Homer Simpson)
======================================================================
