From: Felix Natter
Newsgroups: gmane.emacs.devel
Subject: Re: xml-lite.el
Date: 08 Mar 2002 22:41:58 +0100
Message-ID: <87pu2e4tzt.fsf@gmx.net>
References: <87lmd42p1r.fsf@gmx.net> <200203072345.g27NjKh17379@rum.cs.yale.edu>
In-Reply-To: <200203072345.g27NjKh17379@rum.cs.yale.edu>
Original-To: "Stefan Monnier"
Cc: Mike Williams, emacs-devel@gnu.org, Sam Steingold, Eli Zaretskii, keichwa@gmx.net (Karl Eichwalder)
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

"Stefan Monnier" writes:

> > hi,
> >
> > here's what we have to do to spice xml-lite up for sgml:
> >
> > - allow Different tag-names

The regular expressions in `xml-lite-parse-tag-name' need to be changed
for SGML compatibility. Maybe Karl can help us out: which characters
are allowed at the beginning of a (normal) tag name, and which
characters are allowed in the positions that follow? Alternatively, we
can change this to name only the characters that may _not_ appear in
the tag name:

(re-search-forward "[^ />\"']*" nil t)

> > - replace (setq tag-end (search-forward ">" limit t))

All occurrences of (search-forward ">") need to be changed, and I
suggest putting this in a function of its own so that we can simply
replace the above with (sgml-end-of-tag) (we can use a simple
re-search-forward, as suggested below).

Furthermore, I am not sure whether '<' may be used unescaped. If so,
then we need to use (sgml-beginning-of-tag) (which already exists). In
that case, I think it is a good idea to optimize this: rename the
current `sgml-beginning-of-tag' to something like
`sgml-get-name-of-tag' and write a faster `sgml-beginning-of-tag'.
Based on Stefan's regular expression, I tried this:

(defmacro sgml-beginning-of-tag()
  "Skip to beginning of tag."
  (re-search-backward "', probably due to the re-search-_backward_.
I will keep trying to find a solution.

> > by something like this:
> > ;; exit once unquoted '/' or '>' is found:
> > ;; in HTML (SGML?), unquoted attribute values may only contain
> > ;; [A-Za-z0-9-.] (section 3.2.2 of the html4 spec);
> > ;; in xml all attribute values are quoted.
> > ;; TODO: maybe this can be done faster with a regular expression ?
> > ;; (something like sgml-start-tag-regexp)
> > (while (not (and (or (char-equal (char-after) ?/)
> >                      (char-equal (char-after) ?>))
> >                  (null quote-token)))
> >
> >   (if (and (char-equal (char-after) ?\")
> >            (not (char-equal (char-before) ?\\)))
> >       ;; unescaped ?\"
> >       (cond ((null quote-token)
> >              ;; start of quoted content
> >              (setq quote-token ?\"))
> >             ;; end of quoted content
> >             ((char-equal quote-token ?\")
> >              (setq quote-token nil))
> >             ;; quote-token == ?\' => part of quoted content => ignore
> >             ))
>
> Use forward-sexp to jump over matched ".  It's enormously faster.

I guess re-search-forward is faster, right?

> >   (if (and (char-equal (char-after) ?\')
> >            (not (char-equal (char-before) ?\\)))
> >       ;; unescaped ?\'
> >       (cond ((null quote-token)
> >              ;; start of quoted content
> >              (setq quote-token ?\'))
> >             ;; end of quoted content
> >             ((char-equal quote-token ?\')
> >              (setq quote-token nil))
> >             ;; quote-token == ?\" => part of quoted content => ignore
> >             ))
> >   (forward-char))
>
> Indeed the above looks like
>
> (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t)

I suggest naming this `sgml-end-of-tag' (note that, for symmetry with
sgml-beginning-of-tag, this skips up to '>' and allows whitespace
before the '/', as in
):

;; A function, so the search runs on every call (a macro body would
;; run at expansion time instead).
(defun sgml-end-of-tag ()
  "Skip to end of tag (either '/' or '>')."
  (re-search-forward "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*/?" nil t))

> > - support `sgml-empty-tags': in xml-lite-parse-tag-backward:
> >   add code in this block:
> >   (t
> >    (setq tag-type 'open
> >          name (xml-lite-parse-tag-name)
> >          name-end (point))
> >    ;; check whether it's an empty tag
> >    (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
> >        (setq tag-type 'empty)))
>
> That looks easy enough, indeed.

I use this helper function, which makes the code simpler:

(defun sgml-is-empty-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-empty-tags'."
  (if (null tag-name)
      nil
    (if sgml-xml
        ;; XML tag names are case-sensitive
        (member tag-name sgml-empty-tags)
      (member-ignore-case tag-name sgml-empty-tags))))

(but you can just as well put the code inline, because this is the
only place where sgml-is-empty-tag-p is used)

161c161,162
<       (if (and tag-end (eq ?/ (char-before (- tag-end 1))))
---
>       (if (or (and tag-end (eq ?/ (char-before (- tag-end 1))))
>               (sgml-is-empty-tag-p name))

but this is not yet tested.

> > - support `sgml-unclosed-tags': we cannot get this perfectly right,

First we need this in sgml-mode.el:

(defvar sgml-unclosed-tags nil
  "A list of elements for which the end-tag may be omitted.
In XML these elements should be closed or empty-element tags.
This variable is most useful when used file-locally
\(see C-h i m Emacs RET m Local Variables in Files RET)")

and a helper function:

(defun sgml-is-unclosed-tag-p (tag-name)
  "Return non-nil if TAG-NAME is in `sgml-unclosed-tags'."
  (if (null tag-name)
      nil
    (if sgml-xml
        ;; XML tag names are case-sensitive
        (member tag-name sgml-unclosed-tags)
      (member-ignore-case tag-name sgml-unclosed-tags))))

(The check for null came in handy when I wrote my indenter, so I
suggest including it; we can remove it if we find that it is useless.)

> > because the dtd controls how missing end-tags are inferred.
> > However, with a set of simple rules you can make most cases work
> > okay (my "absolute" (non-relative) proof-of-concept indenter can
> > now handle unclosed <li> and <p>'s):
> > 1. the end-tag of an `sgml-unclosed-tag' will be closed before its
> >    parent is closed:
> >    <ul>
> >      <li>
> >    </ul>
> >    (</li> comes directly before </ul>)
>
> That could even be used for all tags (even those that are not
> in sgml-unclosed-tags).

I don't think we should do this: sgml-close-tag (or
sgml-insert-end-tag) should likewise tell the user when an end-tag is
omitted where that is not legal.

> > 2. an `sgml-unclosed-tag' is closed before another
> >    `sgml-unclosed-tag' is opened (but this rule doesn't support
> >    e.g. <li>item1<p><li>..., I'll have to improve this):
> >    <ul>
> >      <li>
> >      <li>
> >    </ul>
> >    (the first <li> is closed before the second is opened)
> >    or
> >    <dl>
> >      <dt>x
> >      <dd>the dependent variable
> >      <dt>y
> >      <dd>another variable
> >    </dl>
> > and so on. Next I will try to support <p>'s.
>
> Indeed, I'm not sure what the rule should be.  For sure seeing the same
> tag again implies the previous one is closed (this covers the <li>
> case above).  But for mixes of unclosed tags, it's less clear.  Maybe

Yes, the second rule definitely needs to be rethought.

> we can just use some kind of precedence scheme, but I'd rather first see
> how things turn out in practice with a trivial system.

What are you referring to when you say "trivial system"?

> > But xml-lite has some xml-only features which are nice and is
> > very fast, so I think we might want to keep xml-lite as is and we use
> > sgml-close-tag and a new (slower) relative indenter for sgml-mode.
>
> I don't think that the changes to make it handle sgml indentation should
> make it noticeably slower.

Okay, something general about xml-lite.el/sgml-mode.el: if we support
jsp/php/asp etc., then the code will be more complicated. That's why I
suggest telling users who need this to use html-helper-mode
(http://www.gest.unipd.it/~saint/hth.html) instead.

> 	Stefan
>
> PS: any reason why you took out the rest of the crowd from the Cc line ?
> I'm pretty sure Mike Williams would like to get copies of our discussion.

done :-)

-- 
Felix Natter
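
For reference, here is a minimal sketch of the regexp-based tag-end scan discussed above, so that its behaviour on quoted attribute values can be tried in a scratch buffer. The names `my-sgml-skip-to-tag-end' and `my-sgml-tag-end-demo' are hypothetical and are not part of xml-lite.el or sgml-mode.el; the regexp is the one Stefan quoted (the variant ending in [/>]). This only illustrates the idea and is not the final implementation.

```elisp
;; Hypothetical helper (assumption: not an existing sgml-mode.el or
;; xml-lite.el function).  It uses the regexp quoted in the thread.
(defun my-sgml-skip-to-tag-end (&optional limit)
  "Move point just past the first unquoted `/' or `>' after point.
Single- and double-quoted attribute values are skipped as a unit,
so a `/' or `>' inside quotes does not end the tag.  Return the
new value of point, or nil if nothing matches before LIMIT."
  (re-search-forward
   "\\([^/>\"']\\|'[^']*'\\|\"[^\"]*\"\\)*[/>]" limit t))

;; Hypothetical demo wrapper: run the scan on a string instead of
;; the current buffer.
(defun my-sgml-tag-end-demo (text)
  "Return the prefix of TEXT consumed by `my-sgml-skip-to-tag-end'.
Return nil if TEXT contains no unquoted `/' or `>'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (my-sgml-skip-to-tag-end)
      (buffer-substring (point-min) (point)))))
```

Evaluating (my-sgml-tag-end-demo "<a href=\"x>y\" title='a/b'>text") shows whether the `>' and `/' inside the quoted values are skipped; for an empty-element tag such as "<br />" the scan stops right after the `/', which is what the empty-tag check in xml-lite-parse-tag-backward would need to look at.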