Re: xml-lite.el

unofficial mirror of emacs-devel@gnu.org 

* Re: xml-lite.el
       [not found] ` <200203072345.g27NjKh17379@rum.cs.yale.edu>
@ 2002-03-08 21:41   ` Felix Natter
  2002-03-09 10:30     ` xml-lite.el Karl Eichwalder
  2002-03-09 10:49     ` xml-lite.el Mike Williams
  0 siblings, 2 replies; 5+ messages in thread
From: Felix Natter @ 2002-03-08 21:41 UTC (permalink / raw)
  Cc: Mike Williams, emacs-devel, Sam Steingold, Eli Zaretskii,
	Karl Eichwalder


"Stefan Monnier" <monnier+gnu/emacs@RUM.cs.yale.edu> writes:
> > hi,
> > 
> > here's what we have to do to spice xml-lite up for sgml:
> > 
> > - allow Different tag-names

The regular expressions in `xml-lite-parse-tag-name' need to be changed
for SGML compatibility.

Maybe Karl can help us out: Which characters are allowed at the beginning
of a (normal) tag, and which characters are allowed for the following
characters ?
We can also change this to name only the characters that may
_not_ appear in the tag name ((re-search-forward "[^ />\"']*" nil t)).
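A minimal sketch of the "name only the forbidden characters" idea,
assuming a tag name ends at whitespace, '/', '>' or a quote character.
The name `my-sgml-parse-tag-name' is hypothetical and not part of
xml-lite.el. It uses `skip-chars-forward' instead of the regexp above,
because a pattern ending in `*' also matches the empty string, so the
search would succeed even where there is no name at all:

```elisp
;; Hypothetical helper, not part of xml-lite.el or sgml-mode.el.
;; Assumption: a name ends at whitespace, `/', `>', `"' or `''.
(defun my-sgml-parse-tag-name ()
  "Return the tag name starting at point, or nil if there is none.
Point is left just after the name."
  (let ((start (point)))
    ;; A leading `^' negates the set: skip everything that is NOT a
    ;; terminator character.
    (skip-chars-forward "^ \t\n/>\"'")
    (when (> (point) start)
      (buffer-substring-no-properties start (point)))))
```

Whether this character set is right for a given SGML declaration is
exactly the open question above.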

> > - replace (setq tag-end (search-forward ">" limit t))

All occurrences of (search-forward ">") need to be changed, and I suggest
putting this in a function of its own so that we can just replace the above
with (sgml-end-of-tag)
(we can use a simple re-search-forward, as suggested below).

Furthermore, I am not sure whether '<' may be used unescaped. If so,
then we need to use (sgml-beginning-of-tag) (which already exists).

In that case, I think it is a good idea to optimize this: rename the current
`sgml-beginning-of-tag' to something like `sgml-get-name-of-tag' and
write a faster `sgml-beginning-of-tag'.


Based on Stefan's regular expression, I tried this:

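;; A function, not a macro: a macro body would run the search at
;; macro-expansion time instead of at each call.
(defun sgml-beginning-of-tag ()
  "Skip to beginning of tag."
  (re-search-backward "</?\\([^<\"']\\|'[^']*'\\|\"[^\"]*\"\\)*/?" nil t))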

but this will stop at the wrong (quoted) '<' in e.g. '<a href =
"><x.png" border="><0">', probably due to the re-search-_backward_.

I will keep trying to find a solution.
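One possible way around the backward-search problem, as a rough sketch
only: start from a position known to be outside any tag and walk
forward, letting the regexp consume each tag together with its quoted
attribute values. A quoted '<' is then never seen as a tag start. The
names `my-sgml-tag-regexp' and `my-sgml-last-tag-start' are
hypothetical, and the sketch assumes the scan may begin at `point-min'
(or at a caller-supplied START outside any tag):

```elisp
;; Hypothetical names; a sketch under the assumptions stated above.
(defconst my-sgml-tag-regexp
  "</?\\(?:[^<>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*>?"
  "Match one tag starting at `<', skipping over quoted strings.")

(defun my-sgml-last-tag-start (pos &optional start)
  "Return the position of the last tag that begins before POS.
Scan forward from START (default `point-min'), which must lie
outside any tag.  Return nil if no tag is found."
  (save-excursion
    (goto-char (or start (point-min)))
    (let (found)
      (while (and (< (point) pos)
                  (search-forward "<" pos t))
        (setq found (1- (point)))
        (goto-char found)
        ;; Always matches at least the `<' itself, so point advances.
        (looking-at my-sgml-tag-regexp)
        (goto-char (match-end 0)))
      found)))
```

This is linear in the distance from START, so it is slower than a
single backward search, and an unterminated quote can still confuse it.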

> > by something like this:
> > ;; exit once unquoted '/' or '>' is found:
> > ;; in HTML (SGML?), unquoted attribute values may only contain
> > ;; [A-Za-z0-9-.]  (section 3.2.2 of the html4 spec);
> > ;; in xml all attribute values are quoted.
> > ;; TODO: maybe this can be done faster with a regular expression ?
> > ;; (something like sgml-start-tag-regexp)
> > (while (not (and (or (char-equal (char-after) ?/)
> > 		     (char-equal (char-after) ?>))
> > 		 (null quote-token)))
> > 
> >   (if (and (char-equal (char-after) ?\")
> > 	   (not (char-equal (char-before) ?\\)))
> >       ;; unescaped ?\"
> >       (cond ((null quote-token)
> > 	     ;; start of quoted content
> > 	     (setq quote-token ?\"))
> > 	    ;; end of quoted content
> > 	    ((char-equal quote-token ?\")
> > 	     (setq quote-token nil))
> > 	    ;; quote-token == ?\' => part of quoted content => ignore
> > 	    ))
> 
> Use forward-sexp to jump over matched ".  It's enormously faster.

I guess the re-search-forward is faster, right ?
 
> >   (if (and (char-equal (char-after) ?\')
> > 	   (not (char-equal (char-before) ?\\)))
> >       ;; unescaped ?\'
> >       (cond ((null quote-token)
> > 	     ;; start of quoted content
> > 	     (setq quote-token ?\'))
> > 	    ;; end of quoted content
> > 	    ((char-equal quote-token ?\')
> > 	     (setq quote-token nil))
> > 	    ;; quote-token == ?\" => part of quoted content => ignore
> > 	    ))
> >   (forward-char))
> 
> Indeed the above looks like
> 
>   (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t)

I suggest naming this `sgml-end-of-tag' (note that for symmetry with
sgml-beginning-of-tag, this skips up to '>' and allows whitespace
before the '/', as in <br />):

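;; A function, not a macro: a macro body would run the search at
;; macro-expansion time instead of at each call.
(defun sgml-end-of-tag ()
  "Skip to end of tag (either '/' or '>')"
  (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*/?" nil t))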
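To illustrate how the quoted alternatives in Stefan's regexp behave,
here is a small sketch that wraps his pattern unchanged (apart from a
shy group) in a function. The name `my-sgml-skip-to-tag-end' is
hypothetical; the point is only that a '>' inside a quoted attribute
value does not end the tag:

```elisp
;; Hypothetical wrapper around the regexp quoted above.
(defun my-sgml-skip-to-tag-end (&optional limit)
  "Move point just past the first unquoted `/' or `>' after point.
Return the new position, or nil if none is found before LIMIT."
  (re-search-forward
   "\\(?:[^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t))

;; Usage: the quoted `>' in the href value is skipped.
(with-temp-buffer
  (insert "<a href=\"x>y\" id='z'>rest")
  (goto-char (point-min))
  (my-sgml-skip-to-tag-end)
  (looking-at "rest"))
```

Unlike the variant above, this one consumes the closing '>' itself.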
 
> > - support `sgml-empty-tags': in xml-lite-parse-tag-backward:
> > add code in this block:
> >        (t
> >         (setq tag-type 'open
> >               name (xml-lite-parse-tag-name)
> >               name-end (point))
> >         ;; check whether it's an empty tag
> >         (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
> >             (setq tag-type 'empty)))
> 
> That looks easy enough, indeed.

I use this helper-function which makes the code simpler:

(defun sgml-is-empty-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-empty-tags'.
The comparison is case-sensitive only when `sgml-xml' is non-nil."
  (if (null tag-name)
      nil
    (if sgml-xml
        (member tag-name sgml-empty-tags)
      (member-ignore-case tag-name sgml-empty-tags))))

(but you can just as well put the code "inline" because this is
the only place where sgml-is-empty-tag-p is used)

161c161,162
<         (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
---
>         (if (or (and tag-end (eq ?/ (char-before (- tag-end 1))))
> 		(sgml-is-empty-tag-p name))


but this is not yet tested.

> > - support `sgml-unclosed-tags': we cannot get this perfectly right,

First we need this in sgml-mode.el:

(defvar sgml-unclosed-tags nil
  "A list of elements for which the end-tag may be omitted.
In XML these elements should be closed or empty-element tags.
This variable is most useful when used file-locally
\(see C-h i m Emacs RET m Local Variables in Files RET)")

and a helper function:

(defun sgml-is-unclosed-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-unclosed-tags'.
The comparison is case-sensitive only when `sgml-xml' is non-nil."
  (if (null tag-name)
      nil
    (if sgml-xml
        (member tag-name sgml-unclosed-tags)
      (member-ignore-case tag-name sgml-unclosed-tags))))

(The check for null came in handy when I wrote my indenter,
so I suggest including it; we can remove it if we find that it
is useless.)
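Since the two helpers differ only in the list they consult, they could
share one function. This is only a sketch; the name
`my-sgml-tag-in-list-p' and its CASE-SENSITIVE argument are
hypothetical (a caller would pass the value of `sgml-xml' there):

```elisp
;; Hypothetical shared helper for the two predicates above.
(defun my-sgml-tag-in-list-p (tag-name tag-list &optional case-sensitive)
  "Return t if TAG-NAME is a member of TAG-LIST, else nil.
A nil TAG-NAME is never a member.  Compare case-insensitively
unless CASE-SENSITIVE is non-nil."
  (and tag-name
       (if case-sensitive
           (member tag-name tag-list)
         (member-ignore-case tag-name tag-list))
       t))
```

With that, the empty-tag check would read
(my-sgml-tag-in-list-p name sgml-empty-tags sgml-xml).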

> > because the dtd controls how missing end-tags are inferred. However,
> > with a set of simple rules you can make most cases work okay (my
> > "absolute" (non-relative) proof-of-concept indenter can now handle
> > unclosed <li> and <dl>'s):
> > 1. the end-tag of an `sgml-unclosed-tag' is implied before its parent is
> >    closed:
> > <ul>
> >    <li>
> > </ul>
> > (</li> comes directly before </ul>)
> 
> That could even be used for all tags (even those that are not
> in sgml-unclosed-tags).

I don't think we should do this; likewise, sgml-close-tag (or
sgml-insert-end-tag) should tell the user when an end-tag is omitted
where that is not legal.
 
> > 2. an `sgml-unclosed-tag' is closed before another `sgml-unclosed-tag' is
> >    opened (but this rule doesn't support e.g. <li><p>item1</p></li>..., I'll
> >    have to improve this):
> > <ul>
> >   <li>
> >   <li>
> > </ul>
> > (the first <li> is closed before the second is opened)
> > or
> > <dl>
> >  <dt>x
> >  <dd>the dependent variable
> >  <dt>y
> >  <dd>another variable
> > </dl>
> > and so on. Next I will try to support <dl>'s.
> 
> Indeed, I'm not sure what the rule should be.  For sure seeing the same
> tag again implies the previous one is closed (this covers the <li><li>
> case above).  But for mixes of unclosed tags, it's less clear.  Maybe

Yes, the second rule definitely needs to be rethought.

> we can just use some kind of precedence scheme, but I'd rather first see
> how things turn out in practice with a trivial system.

What are you referring to when you say "trivial system" ?
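For reference, rules 1 and 2 as quoted above can be sketched as a small
stack-based pass over a list of tag events. This is not the indenter
itself, just an illustration; the function name and the event format
((open . NAME) or (close . NAME)) are assumptions made for this sketch,
and names are compared case-sensitively:

```elisp
;; Hypothetical sketch of rules 1 and 2; not code from xml-lite.el.
(defun my-sgml-infer-closes (events unclosed)
  "Return EVENTS with the implied end-tags inserted.
EVENTS is a list of (open . NAME) and (close . NAME) conses.
UNCLOSED lists the tag names whose end-tag may be omitted."
  (let (stack result)
    (dolist (ev events)
      (let ((type (car ev))
            (name (cdr ev)))
        (if (eq type 'open)
            (progn
              ;; Rule 2: opening an unclosed tag closes the unclosed
              ;; tags that are still open directly above it.
              (when (member name unclosed)
                (while (and stack (member (car stack) unclosed))
                  (push (cons 'close (pop stack)) result)))
              (push name stack))
          ;; Rule 1: an explicit end-tag first closes any unclosed
          ;; tags still open inside the element being closed.
          (while (and stack
                      (not (equal (car stack) name))
                      (member (car stack) unclosed))
            (push (cons 'close (pop stack)) result))
          (when (equal (car stack) name)
            (pop stack)))
        (push ev result)))
    (nreverse result)))
```

For the <ul><li><li></ul> example, this yields an implied </li> before
the second <li> and another before </ul>.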
 
> > But xml-lite has some xml-only features which are nice and is
> > very fast, so I think we might want to keep xml-lite as is and we use
> > sgml-close-tag and a new (slower) relative indenter for sgml-mode.
> 
> I don't think that the changes to make it handle sgml indentation should
> make it noticeably slower.

Okay, something general about xml-lite.el/sgml-mode.el: If we support
jsp/php/asp etc., then the code would be more complicated.

That's why I suggest to tell users who need this to use html-helper-mode
(http://www.gest.unipd.it/~saint/hth.html) instead.

> 	Stefan
> 
> PS: any reason why you took out the rest of the crowd from the Cc line ?
>     I'm pretty sure Mike Williams would like to get copies of our discussion.

done :-)

-- 
Felix Natter


_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/emacs-devel



* Re: xml-lite.el
  2002-03-08 21:41   ` xml-lite.el Felix Natter
@ 2002-03-09 10:30     ` Karl Eichwalder
  2002-03-09 16:24       ` xml-lite.el Felix Natter
  2002-03-09 10:49     ` xml-lite.el Mike Williams
  1 sibling, 1 reply; 5+ messages in thread
From: Karl Eichwalder @ 2002-03-09 10:30 UTC (permalink / raw)
  Cc: Stefan Monnier, Mike Williams, emacs-devel, Sam Steingold,
	Eli Zaretskii

Felix Natter <fnatter@gmx.net> writes:

> Which characters are allowed at the beginning of a (normal) tag, and
> which characters are allowed for the following characters ?

For XML?  Please, check the XML Standard at www.w3.org; IIRC, Unicode
characters are even allowed for tag names.  You can issue the following
command if you want to know whether "_:d-2" is allowed as an element
name:

{ 
  cat /usr/share/sgml/openjade/xml.dcl
  echo '<!DOCTYPE _:d-2. [ <!ELEMENT _:d-2. (#PCDATA)> ]><_:d-2.>x</_:d-2.>'
} | onsgmls -wxml
onsgmls:<OSFD>0:1:W: SGML declaration was not implied
(_:d-2.
-x
)_:d-2.
C

The warning isn't fatal.  As you can see allowed names depend on the
SGML/XML declaration in use.

> That's why I suggest to tell users who need this to use html-helper-mode
> (http://www.gest.unipd.it/~saint/hth.html) instead.

Others are happy with psgml plus xxml.el (plus some self written macros).

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)


* Re: xml-lite.el
  2002-03-09 10:30     ` xml-lite.el Karl Eichwalder
@ 2002-03-09 16:24       ` Felix Natter
  2002-03-09 16:49         ` xml-lite.el Karl Eichwalder
  0 siblings, 1 reply; 5+ messages in thread
From: Felix Natter @ 2002-03-09 16:24 UTC (permalink / raw)
  Cc: Stefan Monnier, Mike Williams, emacs-devel, Sam Steingold,
	Eli Zaretskii

Karl Eichwalder <ke@gnu.franken.de> writes:

> Felix Natter <fnatter@gmx.net> writes:
> 
> > Which characters are allowed at the beginning of a (normal) tag, and
> > which characters are allowed for the following characters ?
> 
> For XML?  Please, check the XML Standard at www.w3.org; IIRC, Unicode
> characters are even allowed for tag names.  You can issue the following
> command if you want to know whether "_:d-2" is allowed as an element
> name:
> 
> { 
>   cat /usr/share/sgml/openjade/xml.dcl
>   echo '<!DOCTYPE _:d-2. [ <!ELEMENT _:d-2. (#PCDATA)> ]><_:d-2.>x</_:d-2.>'
> } | onsgmls -wxml
> onsgmls:<OSFD>0:1:W: SGML declaration was not implied
> (_:d-2.
> -x
> )_:d-2.
> C
> 
> The warning isn't fatal.  As you can see allowed names depend on the
> SGML/XML declaration in use.

Sorry, we are in the process of modifying xml-lite.el so that it works
with _SGML_, so I was asking about SGML.
 
> > That's why I suggest to tell users who need this to use html-helper-mode
> > (http://www.gest.unipd.it/~saint/hth.html) instead.
> 
> Others are happy with psgml plus xxml.el (plus some self written macros).

Yes, but psgml doesn't support things like <?php ?>, jsp and asp.

-- 
Felix Natter



* Re: xml-lite.el
  2002-03-09 16:24       ` xml-lite.el Felix Natter
@ 2002-03-09 16:49         ` Karl Eichwalder
  0 siblings, 0 replies; 5+ messages in thread
From: Karl Eichwalder @ 2002-03-09 16:49 UTC (permalink / raw)
  Cc: Stefan Monnier, Mike Williams, emacs-devel, Sam Steingold,
	Eli Zaretskii

Felix Natter <fnatter@gmx.net> writes:

> Sorry, we are in the process of modifying xml-lite.el so that it works
> with _SGML_, so I was asking about SGML.

For SGML it depends on the SGML declaration enforced by your DTD.  Most
common is the so-called Reference Concrete Syntax with increased
limits.  You can try the de facto "standard" implemented by not too old
versions of (o)nsgmls:

echo '<!DOCTYPE d-_2. [ <!ELEMENT d-_2. - - (#PCDATA)> ]><d-_2.>x</d-_2.>' \
  | onsgmls
onsgmls:<OSFD>0:1:12:E: character "_" invalid: only delimiter ">", delimiter "[", "PUBLIC", "SYSTEM" and parameter separators allowed
onsgmls:<OSFD>0:1:12:E: cannot continue because of previous errors

> Yes, but psgml doesn't support things like <?php ?>, jsp and asp.

Extensions or workarounds were already posted to the psgml mailing
lists.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)


* Re: xml-lite.el
  2002-03-08 21:41   ` xml-lite.el Felix Natter
  2002-03-09 10:30     ` xml-lite.el Karl Eichwalder
@ 2002-03-09 10:49     ` Mike Williams
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Williams @ 2002-03-09 10:49 UTC (permalink / raw)
  Cc: Stefan Monnier, Mike Williams, emacs-devel, Sam Steingold,
	Eli Zaretskii, Karl Eichwalder

  >> PS: any reason why you took out the rest of the crowd from the Cc line ?
  >> I'm pretty sure Mike Williams would like to get copies of our discussion.

  Felix> done :-)

Cheers :-)
