* Re: xml-lite.el
[not found] ` <200203072345.g27NjKh17379@rum.cs.yale.edu>
@ 2002-03-08 21:41 ` Felix Natter
2002-03-09 10:30 ` xml-lite.el Karl Eichwalder
2002-03-09 10:49 ` xml-lite.el Mike Williams
0 siblings, 2 replies; 5+ messages in thread
From: Felix Natter @ 2002-03-08 21:41 UTC (permalink / raw)
Cc: Mike Williams, emacs-devel, Sam Steingold, Eli Zaretskii,
Karl Eichwalder
"Stefan Monnier" <monnier+gnu/emacs@RUM.cs.yale.edu> writes:
> > hi,
> >
> > here's what we have to do to spice xml-lite up for sgml:
> >
> > - allow different tag names
The regular expressions in `xml-lite-parse-tag-name' need to be changed
for SGML compatibility.
Maybe Karl can help us out: which characters are allowed at the beginning
of a (normal) tag name, and which characters are allowed after the
first one?
Alternatively, we can change this to name only the characters that may
_not_ appear in the tag name ((re-search-forward "[^ />\"']*" nil t)).
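A minimal sketch of the "exclude what cannot appear" idea, assuming point is just after the '<' (or '</'). The name `my-parse-tag-name' is hypothetical, and treating tab and newline as terminators is my own assumption on top of the characters listed above:

```elisp
;; Sketch only: `my-parse-tag-name' is a hypothetical helper, not the
;; existing `xml-lite-parse-tag-name'.
(defun my-parse-tag-name ()
  "Return the tag name starting at point and move past it.
Return nil if no name character follows point."
  (let ((start (point)))
    ;; Skip everything that is NOT whitespace, slash, `>' or a quote.
    ;; Tab and newline are an added assumption.
    (skip-chars-forward "^ \t\n/>\"'")
    (when (> (point) start)
      (buffer-substring-no-properties start (point)))))
```

Unlike the bare regexp, which also matches the empty string, this returns nil when no name is present, so the caller can tell the two cases apart.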
> > - replace (setq tag-end (search-forward ">" limit t))
All occurrences of (search-forward ">") need to be changed, and I suggest
putting this in a function of its own so that we can just replace the above
with (sgml-end-of-tag)
(we can use a simple re-search-forward, as suggested below).
Furthermore, I am not sure whether '<' may be used unescaped. If so,
then we need to use (sgml-beginning-of-tag) (which already exists).
In that case, I think it is a good idea to optimize this: rename the current
`sgml-beginning-of-tag' to something like `sgml-get-name-of-tag' and
write a faster `sgml-beginning-of-tag'.
Based on Stefan's regular expression, I tried this:
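;; A function rather than a macro: a macro body would run the search
;; at expansion time instead of at each call.
(defun sgml-beginning-of-tag ()
  "Skip to beginning of tag."
  (re-search-backward "</?\\([^<\"']\\|'[^']*'\\|\"[^\"]*\"\\)*/?" nil t))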
but this will stop at the wrong (quoted) '<' in e.g. '<a href =
"><x.png" border="><0">', probably due to the re-search-_backward_.
I will keep trying to find a solution.
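One possible way around the backward-search problem, offered only as a sketch: look at each '<' before the end of the tag, nearest first, and accept it only if a forward match from there ends exactly at the known tag end. The names `my-sgml-tag-regexp' and `my-sgml-tag-start' are hypothetical, and the approach assumes the position just after the closing '>' is already known:

```elisp
;; Sketch only: these names are hypothetical, not part of
;; sgml-mode.el or xml-lite.el.
(defconst my-sgml-tag-regexp
  "</?\\(?:[^>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*>"
  "Match a whole tag, treating quoted attribute values as opaque.")

(defun my-sgml-tag-start (tag-end)
  "Return the start of the tag that ends at TAG-END, or nil.
TAG-END is the position just after the closing `>'."
  (save-excursion
    (goto-char tag-end)
    (let ((found nil))
      ;; Try each `<' before TAG-END, nearest first.
      (while (and (not found) (search-backward "<" nil t))
        ;; Accept it only if a forward match from here ends at TAG-END.
        (when (and (looking-at my-sgml-tag-regexp)
                   (= (match-end 0) tag-end))
          (setq found (point))))
      found)))
```

On the example above, the two quoted '<' characters are rejected because no forward match from them ends at the tag end, so the scan continues back to the real start. This does more work per tag than a single regexp search, so it may not be the "faster" variant asked for.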
> > by something like this:
> > ;; exit once unquoted '/' or '>' is found:
> > ;; in HTML (SGML?), unquoted attribute values may only contain
> > ;; [A-Za-z0-9-.] (section 3.2.2 of the html4 spec);
> > ;; in xml all attribute values are quoted.
> > ;; TODO: maybe this can be done faster with a regular expression ?
> > ;; (something like sgml-start-tag-regexp)
> > (while (not (and (or (char-equal (char-after) ?/)
> > (char-equal (char-after) ?>))
> > (null quote-token)))
> >
> > (if (and (char-equal (char-after) ?\")
> > (not (char-equal (char-before) ?\\)))
> > ;; unescaped ?\"
> > (cond ((null quote-token)
> > ;; start of quoted content
> > (setq quote-token ?\"))
> > ;; end of quoted content
> > ((char-equal quote-token ?\")
> > (setq quote-token nil))
> > ;; quote-token == ?\' => part of quoted content => ignore
> > ))
>
> Use forward-sexp to jump over matched ". It's enormously faster.
I guess the re-search-forward is faster, right ?
> > (if (and (char-equal (char-after) ?\')
> > (not (char-equal (char-before) ?\\)))
> > ;; unescaped ?\'
> > (cond ((null quote-token)
> > ;; start of quoted content
> > (setq quote-token ?\'))
> > ;; end of quoted content
> > ((char-equal quote-token ?\')
> > (setq quote-token nil))
> > ;; quote-token == ?\" => part of quoted content => ignore
> > ))
> > (forward-char))
>
> Indeed the above looks like
>
> (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t)
I suggest naming this `sgml-end-of-tag' (note that for symmetry with
sgml-beginning-of-tag, this skips up to '>' and allows whitespace
before the '/', as in <br />):
;; A function rather than a macro: a macro body would run the search
;; at expansion time instead of at each call.
(defun sgml-end-of-tag ()
  "Skip to end of tag (either '/' or '>')"
  (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*/?" nil t))
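For comparison, here is a small sketch that wraps Stefan's regexp unchanged, so that the search must consume an unquoted '/' or '>'. The name `my-sgml-end-of-tag' and the optional LIMIT argument are my own assumptions for illustration:

```elisp
;; Sketch only: `my-sgml-end-of-tag' is a hypothetical name.
(defun my-sgml-end-of-tag (&optional limit)
  "Move past the next unquoted `/' or `>' and return point, or nil.
LIMIT bounds the search, as in `re-search-forward'."
  (re-search-forward
   "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t))
```

Called at the start of <a href="x>y" title='a/b'>, this should skip both quoted values and leave point just after the final '>'. For <br /> it stops just after the '/'.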
> > - support `sgml-empty-tags': in xml-lite-parse-tag-backward:
> > add code in this block:
> > (t
> > (setq tag-type 'open
> > name (xml-lite-parse-tag-name)
> > name-end (point))
> > ;; check whether it's an empty tag
> > (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
> > (setq tag-type 'empty)))
>
> That looks easy enough, indeed.
I use this helper function, which makes the code simpler:
(defun sgml-is-empty-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-empty-tags'."
  (if (null tag-name)
      nil
    (if sgml-xml
        ;; XML names are case-sensitive.
        (member tag-name sgml-empty-tags)
      (member-ignore-case tag-name sgml-empty-tags))))
(but you can just as well put the code "inline" because this is
the only place where sgml-is-empty-tag-p is used)
161c161,162
< (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
---
> (if (or (and tag-end (eq ?/ (char-before (- tag-end 1))))
> (sgml-is-empty-tag-p name))
but this is not yet tested.
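To make the combined check concrete, here is a self-contained sketch. The variables `my-sgml-xml' and `my-sgml-empty-tags' are hypothetical stand-ins for `sgml-xml' and `sgml-empty-tags', and the sample tag list is only an assumption for illustration:

```elisp
;; Sketch only: all `my-' names are hypothetical stand-ins.
(defvar my-sgml-xml nil
  "Stand-in for `sgml-xml': non-nil means XML rules apply.")

(defvar my-sgml-empty-tags '("br" "hr" "img")
  "Stand-in for `sgml-empty-tags'.")

(defun my-sgml-empty-tag-p (tag-name)
  "Return non-nil if TAG-NAME names an empty element."
  (and tag-name
       (if my-sgml-xml
           ;; XML names are case-sensitive.
           (member tag-name my-sgml-empty-tags)
         (member-ignore-case tag-name my-sgml-empty-tags))))

(defun my-sgml-tag-empty-p (name tag-text)
  "Return non-nil if the tag NAME with source TAG-TEXT is empty.
TAG-TEXT is the tag as written in the buffer."
  (or (string-match-p "/>\\'" tag-text) ; XML-style <name ... />
      (my-sgml-empty-tag-p name)))      ; SGML-style empty element
```

With these assumptions, <BR> counts as empty in SGML mode but not when `my-sgml-xml' is non-nil, while <x/> counts as empty in both.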
> > - support `sgml-unclosed-tags': we cannot get this perfectly right,
first we need this in sgml-mode.el:
(defvar sgml-unclosed-tags nil
"A list of elements for which the end-tag may be omitted.
In XML these elements should be closed or empty-element tags.
This variable is most useful when used file-locally
\(see C-h i m Emacs RET m Local Variables in Files RET)")
and a helper function:
(defun sgml-is-unclosed-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-unclosed-tags'."
  (if (null tag-name)
      nil
    (if sgml-xml
        ;; XML names are case-sensitive.
        (member tag-name sgml-unclosed-tags)
      (member-ignore-case tag-name sgml-unclosed-tags))))
(The check for null came in handy when I wrote my indenter,
so I suggest including it; we can remove it if it turns out to be
useless.)
> > because the dtd controls how missing end-tags are inferred. However,
> > with a set of simple rules you can make most cases work okay (my
> > "absolute" (non-relative) proof-of-concept indenter can now handle
> > unclosed <li> and <dl>'s):
> > 1. an `sgml-unclosed-tag' is implicitly closed before its parent is
> > closed:
> > <ul>
> > <li>
> > </ul>
> > (</li> comes directly before </ul>)
>
> That could even be used for all tags (even those that are not
> in sgml-unclosed-tags).
I don't think we should do this. Likewise, sgml-close-tag (or
sgml-insert-end-tag) should tell the user when an end-tag is omitted
where that is not legal.
> > 2. an `sgml-unclosed-tag' is closed before another `sgml-unclosed-tag' is
> > opened (but this rule doesn't support e.g. <li><p>item1</p></li>..., I'll
> > have to improve this):
> > <ul>
> > <li>
> > <li>
> > </ul>
> > (the first <li> is closed before the second is opened)
> > or
> > <dl>
> > <dt>x
> > <dd>the dependent variable
> > <dt>y
> > <dd>another variable
> > </dl>
> > and so on. Next I will try to support <dl>'s.
>
> Indeed, I'm not sure what the rule should be. For sure seeing the same
> tag again implies the previous one is closed (this covers the <li><li>
> case above). But for mixes of unclosed tags, it's less clear. Maybe
Yes, the second rule definitely needs to be rethought.
> we can just use some kind of precedence scheme, but I'd rather first see
> how things turn out in practice with a trivial system.
What are you referring to when you say "trivial system" ?
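The two rules discussed above can be sketched over a list of already-parsed tags. This is only an illustration under stated assumptions: `my-unclosed-tags' and `my-infer-end-tags' are hypothetical names, the token format is invented for the example, and rule 2 is applied in its simple form (only the innermost open tag is considered):

```elisp
;; Sketch only: all `my-' names and the token format are hypothetical.
(defvar my-unclosed-tags '("li" "dt" "dd")
  "Stand-in for `sgml-unclosed-tags'.")

(defun my-infer-end-tags (tokens)
  "Return TOKENS with omitted end-tags made explicit.
TOKENS is a list of (open . NAME) and (close . NAME) pairs."
  (let ((stack '())
        (result '()))
    (dolist (tok tokens)
      (let ((type (car tok))
            (name (cdr tok)))
        (cond
         ((eq type 'open)
          ;; Rule 2: an unclosed tag on top of the stack is closed
          ;; before another unclosed tag is opened.
          (when (and stack
                     (member name my-unclosed-tags)
                     (member (car stack) my-unclosed-tags))
            (push (cons 'close (pop stack)) result))
          (push name stack)
          (push tok result))
         ((eq type 'close)
          ;; Rule 1: pending unclosed tags are closed before their
          ;; parent is closed.
          (while (and stack
                      (not (equal (car stack) name))
                      (member (car stack) my-unclosed-tags))
            (push (cons 'close (pop stack)) result))
          (when (equal (car stack) name)
            (pop stack))
          (push tok result)))))
    (nreverse result)))
```

For the <ul><li><li></ul> case this inserts a </li> before the second <li> and another before </ul>. As noted above, the second rule is too eager: if "p" were added to the list, <li><p> would wrongly close the <li>.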
> > But xml-lite has some xml-only features which are nice and is
> > very fast, so I think we might want to keep xml-lite as is and we use
> > sgml-close-tag and a new (slower) relative indenter for sgml-mode.
>
> I don't think that the changes to make it handle sgml indentation should
> make it noticeably slower.
Okay, a general point about xml-lite.el/sgml-mode.el: if we support
jsp/php/asp etc., then the code will be more complicated.
That's why I suggest telling users who need this to use html-helper-mode
(http://www.gest.unipd.it/~saint/hth.html) instead.
> Stefan
>
> PS: any reason why you took out the rest of the crowd from the Cc line ?
> I'm pretty sure Mike Williams would like to get copies of our discussion.
done :-)
--
Felix Natter