From: mah@everybody.org (Mark A. Hershberger)
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: Refactoring xml.el namespace handling
Date: Fri, 05 Mar 2004 13:03:09 -0600
Message-ID: <87d67r2k2a.fsf@weblog.localhost>
Cc: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org
Original-To: Stefan Monnier
X-URL: http://mah.everybody.org/weblog/
User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/22.0.0 (gnu/linux)

> Thank you. Here is a counter patch. The main difference is that it uses
> 'http://foo/bar rather than :http://foo/bar so as to avoid an unnecessary
> (concat ":" foo) and also so that (symbol-name foo) immediately returns
> a usable URL.

There needs to be a way of differentiating between the (unlikely)
namespace URI "nil" and "" (that is, no namespace), which is why I
believe we need to stick with (concat ":" foo). Also, since I believe
we need a prefix, using ':' means less work for the programmer, since
:symbols are automatically interned.

(FWIW, it was James Clark who gave me this idea of using ':' as the
prefix.)

I confess that I'm not very moved by the argument of a usable URL,
since namespace URIs needn't be usable URLs. My understanding is that
they are essentially opaque IDs.

> It also cleans up a few elisp things (like replace mapcar->mapc->dolist,
> and (append (list x) y) -> (cons x y), ...).

Good. Thanks.

> Things left:
> - it seems that the new code returns either a TAG (a symbol) or (NS . TAG)
>   where TAG is a string rather than a symbol. Do I understand this right?
>   Is that done on purpose? It looks like a bad idea.

It was done on purpose, but it is a bad idea.

should parse into

  ((: . foo) (((:http://www.w3.org/2000/xmlns/ . "") "")))

Likewise,

should parse into

  ((:nil . foo) (((:http://www.w3.org/2000/xmlns/ "") "nil") ((: .
   "a") "b")))

> - in xml-parse-tag, you do
>
>     (let (.. (children (list A B)) ...)
>       ... (car children) ... (setcdr children ...) ...
>
>   it would be better to do something slightly different so you don't
>   need to extract the car of what you just built and you don't need
>   to setcdr.

Excellent. The patch itself was there to clean up some bad coding.
Fixing even more is better.

> ChangeLog text should use the present tense: it makes it easier to
> write/read.

Thanks. I'll remember it for future reference.

I'm restricting this patch to namespace stuff. I'll put the DTD
parsing updates in another patch where I hope to add an understanding
of

	* xml.el (xml-maybe-do-ns): New function to handle namespace
	parsing of both attribute and element names.
	(xml-ns-parse-ns-attrs, xml-ns-expand-el, xml-ns-expand-attr,
	xml-intern-attrlist): Remove in favor of xml-maybe-do-ns.
	(xml-parse-tag): Update assumed namespaces. Clean up namespace
	parsing.
	(xml-parse-attlist): Make it do its own namespace parsing.

--- xml.el	2 Mar 2004 21:45:06 -0000	1.30
+++ xml.el	5 Mar 2004 18:55:21 -0000
@@ -52,15 +52,15 @@

 ;;; LIST FORMAT

-;; The functions `xml-parse-file' and `xml-parse-tag' return a list with
-;; the following format:
+;; The functions `xml-parse-file', `xml-parse-region' and
+;; `xml-parse-tag' return a list with the following format:
 ;;
 ;;    xml-list   ::= (node node ...)
-;;    node       ::= (tag_name attribute-list . child_node_list)
+;;    node       ::= (qname attribute-list . child_node_list)
 ;;    child_node_list ::= child_node child_node ...
 ;;    child_node ::= node | string
-;;    tag_name   ::= string
-;;    attribute_list ::= (("attribute" . "value") ("attribute" . "value") ...)
+;;    qname      ::= (:namespace-uri . "name") | "name"
+;;    attribute_list ::= ((qname . "value") (qname . "value") ...)
 ;;                       | nil
 ;;    string     ::= "..."
 ;;
@@ -68,6 +68,11 @@
 ;; Whitespace is preserved. Fixme: There should be a tree-walker that
 ;; can remove it.

+;; TODO:
+;;  * xml:base, xml:space support
+;;  * more complete DOCTYPE parsing
+;;  * pi support
+
 ;;; Code:

 ;; Note that {buffer-substring,match-string}-no-properties were
@@ -230,72 +335,27 @@
       (cons dtd (nreverse xml))
     (nreverse xml)))))))

-(defun xml-ns-parse-ns-attrs (attr-list &optional xml-ns)
-  "Parse the namespace attributes and return a list of cons in the form:
-\(namespace . prefix)"
-
-  (mapcar
-   (lambda (attr)
-     (let* ((splitup (split-string (car attr) ":"))
-            (prefix (nth 0 splitup))
-            (lname (nth 1 splitup)))
-       (when (string= "xmlns" prefix)
-         (push (cons (if lname
-                         lname
-                       "")
-                     (cdr attr))
-               xml-ns)))) attr-list)
-  xml-ns)
-
-;; expand element names
-(defun xml-ns-expand-el (el xml-ns)
-  "Expand the XML elements from \"prefix:local-name\" to a cons in the form
-\"(namespace . local-name)\"."
-
-  (let* ((splitup (split-string el ":"))
-         (lname (or (nth 1 splitup)
-                    (nth 0 splitup)))
-         (prefix (if (nth 1 splitup)
-                     (nth 0 splitup)
-                   (if (string= lname "xmlns")
-                       "xmlns"
-                     "")))
-         (ns (cdr (assoc-string prefix xml-ns))))
-    (if (string= "" ns)
-        lname
-      (cons (intern (concat ":" ns))
-            lname))))
-
-;; expand attribute names
-(defun xml-ns-expand-attr (attr-list xml-ns)
-  "Expand the attribute list for a particular element from the form
-\"prefix:local-name\" to the form \"{namespace}:local-name\"."
-
-  (mapcar
-   (lambda (attr)
-     (let* ((splitup (split-string (car attr) ":"))
-            (lname (or (nth 1 splitup)
-                       (nth 0 splitup)))
-            (prefix (if (nth 1 splitup)
-                        (nth 0 splitup)
-                      (if (string= (car attr) "xmlns")
-                          "xmlns"
-                        "")))
-            (ns (cdr (assoc-string prefix xml-ns))))
-       (setcar attr
-               (if (string= "" ns)
-                   lname
-                 (cons (intern (concat ":" ns))
-                       lname)))))
-   attr-list)
-  attr-list)
-
-(defun xml-intern-attrlist (attr-list)
-  "Convert attribute names to symbols for backward compatibility."
-  (mapcar (lambda (attr)
-            (setcar attr (intern (car attr))))
-          attr-list)
-  attr-list)
+(defun xml-maybe-do-ns (name default xml-ns)
+  "Perform any namespace expansion. NAME is the name to perform the expansion on.
+DEFAULT is the default namespace. XML-NS is a cons of namespace
+names to uris. When namespace-aware parsing is off, then XML-NS
+is nil.
+
+During namespace-aware parsing, any name without a namespace is
+put into the namespace identified by DEFAULT. nil is used to
+specify that the name shouldn't be given a namespace."
+  (if (consp xml-ns)
+      (let* ((nsp (string-match ":" name))
+             (lname (if nsp (substring name (match-end 0)) name))
+             (prefix (if nsp (substring name 0 (match-beginning 0)) default))
+             (special (and (string-equal lname "xmlns") (not prefix)))
+             ;; Setting default to nil will insure that there is not
+             ;; matching cons in xml-ns. In which case we
+             (ns (or (cdr (assoc (if special "xmlns" prefix)
+                                 xml-ns))
+                     :)))
+        (cons ns (if special "" lname)))
+    (intern name)))

 (defun xml-parse-tag (&optional parse-dtd parse-ns)
   "Parse the tag at point.
@@ -310,10 +370,12 @@
               parse-ns
               (if parse-ns
                   (list
-                   ;; Default no namespace
-                   (cons "" "")
+                   ;; Default for empty prefix is no namespace
+                   (cons "" :)
+                   ;; "xml" namespace
+                   (cons "xml" :http://www.w3.org/XML/1998/namespace)
                    ;; We need to seed the xmlns namespace
-                   (cons "xmlns" "http://www.w3.org/2000/xmlns/"))))))
+                   (cons "xmlns" :http://www.w3.org/2000/xmlns/))))))
     (cond
      ;; Processing instructions (like the tag at the
      ;; beginning of a document).
@@ -350,19 +412,23 @@

           ;; Parse this node
           (let* ((node-name (match-string 1))
-                 (attr-list (xml-parse-attlist))
-                 (children (if (consp xml-ns) ;; take care of namespace parsing
-                               (progn
-                                 (setq xml-ns (xml-ns-parse-ns-attrs
-                                               attr-list xml-ns))
-                                 (list (xml-ns-expand-attr
-                                        attr-list xml-ns)
-                                       (xml-ns-expand-el
-                                        node-name xml-ns)))
-                             (list (xml-intern-attrlist attr-list)
-                                   (intern node-name))))
-                 pos)
+                 ;; Parse the attribute list.
+                 (attrs (xml-parse-attlist xml-ns))
+                 children pos)
+
+            ;; add the xmlns:* attrs to our cache
+            (when (consp xml-ns)
+              (dolist (attr attrs)
+                (when (and (consp (car attr))
+                           (eq :http://www.w3.org/2000/xmlns/
+                               (caar attr)))
+                  (push (cons (cdar attr) (intern (concat ":" (cdr attr))))
+                        xml-ns))))
+
+            ;; expand element names
+            (setq node-name (list (xml-maybe-do-ns node-name "" xml-ns)))
+            (setq children (list attrs node-name))

             ;; is this an empty element ?
             (if (looking-at "/>")
             (progn
@@ -416,7 +482,7 @@
      (t ;; This is not a tag.
       (error "XML: Invalid character")))))

-(defun xml-parse-attlist ()
+(defun xml-parse-attlist (&optional xml-ns)
   "Return the attribute-list after point.
 Leave point at the first non-blank character after the tag."
   (let ((attlist ())
@@ -424,8 +490,9 @@
     (skip-syntax-forward " ")
     (while (looking-at (eval-when-compile
                          (concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*")))
-      (setq name (match-string 1))
-      (goto-char (match-end 0))
+      (setq end-pos (match-end 0))
+      (setq name (xml-maybe-do-ns (match-string 1) nil xml-ns))
+      (goto-char end-pos)

       ;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize

-- 
A choice between one man and a shovel, or a dozen men with teaspoons
is clear to me, and I'm sure it is clear to you also.
                                                  -- Zimran Ahmed
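The argument for keeping the ':' prefix on namespace symbols can be checked directly in Emacs Lisp. What follows is a minimal sketch, not code from xml.el or from either patch. `my-xml-ns-symbol' and `my-xml-expand-qname' are hypothetical names, and the sketch deliberately leaves out the special handling of a bare "xmlns" attribute that xml-maybe-do-ns performs. It shows that prefixing the URI with ':' keeps the empty namespace ("") and the namespace "nil" apart, while interning "nil" without a prefix simply returns the symbol nil.

```elisp
;; Hypothetical helpers for illustration only; neither exists in xml.el.

(defun my-xml-ns-symbol (uri)
  "Return the namespace symbol for URI, using a \":\" prefix.
The empty URI (no namespace) becomes the keyword `:'.  Interning a
name that starts with \":\" into the standard obarray yields a
keyword, which evaluates to itself and needs no quoting."
  (intern (concat ":" uri)))

(defun my-xml-expand-qname (name default xml-ns)
  "Split NAME (\"prefix:local\" or \"local\") into (NS-SYMBOL . LOCAL).
DEFAULT is the prefix assumed when NAME has no colon (nil means no
prefix).  XML-NS is an alist of (PREFIX . NS-SYMBOL); a prefix with
no entry falls back to the no-namespace keyword `:'."
  (let* ((colon (string-match ":" name))
         (local (if colon (substring name (match-end 0)) name))
         (prefix (if colon (substring name 0 (match-beginning 0)) default)))
    (cons (or (cdr (assoc prefix xml-ns)) (my-xml-ns-symbol ""))
          local)))

;; "" and "nil" stay distinct, and neither collapses to nil:
(eq (my-xml-ns-symbol "") (my-xml-ns-symbol "nil"))   ; => nil
(eq (intern "nil") nil)                               ; => t
```

With an alist seeded the way the patch seeds xml-ns in xml-parse-tag, for example one mapping "" to `:' and "xmlns" to the keyword for http://www.w3.org/2000/xmlns/, calling (my-xml-expand-qname "xmlns:a" nil alist) pairs that xmlns keyword with "a", the same shape xml-maybe-do-ns produces for namespace-declaring attributes.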