From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: mah@everybody.org (Mark A. Hershberger) Newsgroups: gmane.emacs.pretest.bugs,gmane.emacs.devel Subject: Refactoring xml.el namespace handling Date: Mon, 01 Mar 2004 09:59:10 -0600 Sender: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org Message-ID: <87u117ew6g.fsf@weblog.localhost> NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1078213637 23056 80.91.224.253 (2 Mar 2004 07:47:17 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Tue, 2 Mar 2004 07:47:17 +0000 (UTC) Original-X-From: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org Tue Mar 02 08:47:06 2004 Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1Ay4cM-0003ld-00 for ; Tue, 02 Mar 2004 08:47:06 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.30) id 1Ay4Pe-0007gz-OX for gebp-emacs-pretest-bug@gmane.org; Tue, 02 Mar 2004 02:33:58 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.30) id 1Ay15I-0001Pq-Dw for emacs-pretest-bug@gnu.org; Mon, 01 Mar 2004 23:00:44 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.30) id 1Ay14k-0001Ah-MR for emacs-pretest-bug@gnu.org; Mon, 01 Mar 2004 23:00:42 -0500 Original-Received: from [204.251.8.130] (helo=superman.everybody.org) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.30) id 1Ay14B-0000vy-Qv; Mon, 01 Mar 2004 22:59:36 -0500 Original-Received: from [68.92.76.68] (helo=weblog.localhost.everybody.org) by superman.everybody.org with asmtp (Exim 3.35 #1 (Debian)) id 1AxvSn-0007UN-00; Mon, 01 Mar 2004 16:00:37 -0600 Original-To: emacs-pretest-bug@gnu.org, emacs-devel@gnu.org X-URL: http://mah.everybody.org/weblog/ User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/22.0.0 (gnu/linux) X-BeenThere: emacs-pretest-bug@gnu.org X-Mailman-Version: 2.1.2 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-pretest-bug-bounces+gebp-emacs-pretest-bug=gmane.org@gnu.org Xref: main.gmane.org gmane.emacs.pretest.bugs:2291 gmane.emacs.devel:20223 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:20223 (Sent this earlier to just emacs-devel...) In late September, I proposed changing the way xml.el handles namespaces. That message is appended below. Because of a variety of things that happened between then and now, I never submitted a patch to make these changes. This patch differs slightly from the proposal in that namespaces are interned after being concatenated with `:'. Thus, Is parsed into: (((:urn . "top") (((:http://www\.w3\.org/2000/xmlns/ . "") . "urn")))) Or, without namespace aware parsing: ((top ((xmlns . "urn")))) 2004-02-29 Mark A. Hershberger * xml.el (xml-maybe-do-ns): new function to handle namespace expansion of qnames. (xml-parse-region): Update to work with namespace parsing. (xml-parse-tag): Change namespace parsing so that it produces (:uri . "lname") instead of {uri}lname. Avoid unneccessy interning. (xml-parse-attlist): Update to work with namespace parsing. *** xml.el 14 Jul 2003 20:41:12 -0000 1.23 --- xml.el 1 Mar 2004 15:53:50 -0000 *************** *** 52,66 **** ;;; LIST FORMAT ! ;; The functions `xml-parse-file' and `xml-parse-tag' return a list with ! ;; the following format: ;; ;; xml-list ::= (node node ...) ! ;; node ::= (tag_name attribute-list . child_node_list) ;; child_node_list ::= child_node child_node ... ;; child_node ::= node | string ! ;; tag_name ::= string ! ;; attribute_list ::= (("attribute" . "value") ("attribute" . "value") ...) ;; | nil ;; string ::= "..." ;; --- 52,66 ---- ;;; LIST FORMAT ! ;; The functions `xml-parse-file', `xml-parse-region' and ! ;; `xml-parse-tag' return a list with the following format: ;; ;; xml-list ::= (node node ...) ! ;; node ::= (qname attribute-list . child_node_list) ;; child_node_list ::= child_node child_node ... ;; child_node ::= node | string ! ;; qname ::= (:namespace-uri . "name") | "name" ! ;; attribute_list ::= ((qname . "value") (qname . "value") ...) ;; | nil ;; string ::= "..." ;; *************** *** 68,73 **** --- 68,78 ---- ;; Whitespace is preserved. Fixme: There should be a tree-walker that ;; can remove it. + ;; TODO: + ;; * xml:base, xml:space support + ;; * more complete DOCTYPE parsing + ;; * pi support + ;;; Code: ;; Note that {buffer-substring,match-string}-no-properties were *************** *** 214,220 **** (setq result (xml-parse-tag parse-dtd parse-ns)) (cond ((null result)) ! ((listp (car result)) (setq dtd (car result)) (if (cdr result) ; possible leading comment (add-to-list 'xml (cdr result)))) --- 226,233 ---- (setq result (xml-parse-tag parse-dtd parse-ns)) (cond ((null result)) ! ((and (listp (car result)) ! parse-dtd) (setq dtd (car result)) (if (cdr result) ; possible leading comment (add-to-list 'xml (cdr result)))) *************** *** 225,230 **** --- 238,274 ---- (cons dtd (nreverse xml)) (nreverse xml))))))) + (defun xml-maybe-do-ns (name default xml-ns) + "Perform any namespace expansion. NAME is the name to perform the expansion on. + DEFAULT is the default namespace. XML-NS is a cons of namespace + names to uris. When namespace-aware parsing is off, then XML-NS + is nil. + + During namespace-aware parsing, any name without a namespace is + put into the namespace identified by DEFAULT. nil is used to + specify that the name shouldn't be given a namespace." + (if (consp xml-ns) + (let* ((splitup (split-string name ":")) + (lname (or (nth 1 splitup) + (nth 0 splitup))) + (prefix (if (nth 1 splitup) + (nth 0 splitup) + default)) + (ns (progn + (if (and + (string-equal lname "xmlns") + (not prefix)) + (cdr (assoc "xmlns" xml-ns)) + (cdr (assoc prefix xml-ns)))))) + (if ns + (cons ns + (if (and + (string-equal lname "xmlns") + (not prefix)) + "" + lname)) + lname)) + (intern name))) (defun xml-parse-tag (&optional parse-dtd parse-ns) "Parse the tag at point. *************** *** 237,248 **** - a pair : the first element is the DTD, the second is the node." (let ((xml-ns (if (consp parse-ns) parse-ns ! (if parse-ns (list ! ;; Default no namespace ! (cons "" "") ;; We need to seed the xmlns namespace ! (cons "xmlns" "http://www.w3.org/2000/xmlns/")))))) (cond ;; Processing instructions (like the tag at the ;; beginning of a document). --- 281,292 ---- - a pair : the first element is the DTD, the second is the node." (let ((xml-ns (if (consp parse-ns) parse-ns ! (when parse-ns (list ! ;; "xml" namespace ! (cons "xml" :http://www.w3.org/XML/1998/namespace) ;; We need to seed the xmlns namespace ! (cons "xmlns" :http://www.w3.org/2000/xmlns/)))))) (cond ;; Processing instructions (like the tag at the ;; beginning of a document). *************** *** 278,338 **** (goto-char (match-end 1)) (let* ((node-name (match-string 1)) ;; Parse the attribute list. ! (children (list (xml-parse-attlist) (intern node-name))) pos) ;; add the xmlns:* attrs to our cache (when (consp xml-ns) (mapcar (lambda (attr) ! (let* ((splitup (split-string (symbol-name (car attr)) ":")) ! (prefix (nth 0 splitup)) ! (lname (nth 1 splitup))) ! (when (string= "xmlns" prefix) ! (setq xml-ns (append (list (cons (if lname ! lname ! "") ! (cdr attr))) ! xml-ns))))) ! (car children)) ! ! ;; expand element names ! (let* ((splitup (split-string (symbol-name (cadr children)) ":")) ! (lname (or (nth 1 splitup) ! (nth 0 splitup))) ! (prefix (if (nth 1 splitup) ! (nth 0 splitup) ! ""))) ! (setcdr children (list ! (intern (concat "{" ! (cdr (assoc-string prefix xml-ns)) ! "}" lname))))) ! ! ;; expand attribute names ! (mapcar ! (lambda (attr) ! (let* ((splitup (split-string (symbol-name (car attr)) ":")) ! (lname (or (nth 1 splitup) ! (nth 0 splitup))) ! (prefix (if (nth 1 splitup) ! (nth 0 splitup) ! (caar xml-ns)))) ! ! (setcar attr (intern (concat "{" ! (cdr (assoc-string prefix xml-ns)) ! "}" lname))))) (car children))) ! ;; is this an empty element ? ! (if (looking-at "/>") ! (progn ! (forward-char 2) ! (nreverse children)) ! ;; is this a valid start tag ? ! (if (eq (char-after) ?>) ! (progn ! (forward-char 1) ;; Now check that we have the right end-tag. Note that this ;; one might contain spaces after the tag name (let ((end (concat ""))) --- 322,354 ---- (goto-char (match-end 1)) (let* ((node-name (match-string 1)) ;; Parse the attribute list. ! (children (list (xml-parse-attlist xml-ns) node-name)) pos) ;; add the xmlns:* attrs to our cache (when (consp xml-ns) (mapcar (lambda (attr) ! (when (and (listp (car attr)) ! (eq :http://www.w3.org/2000/xmlns/ ! (caar attr))) ! (setq xml-ns (append (list (cons (cdar attr) ! (intern (concat ":" (cdr attr))))) ! xml-ns)))) (car children))) ! ;; expand element names ! (setcdr children (list (xml-maybe-do-ns (cadr children) "" xml-ns))) ! ;; is this an empty element ? ! (if (looking-at "/>") ! (progn ! (forward-char 2) ! (nreverse children)) ! ;; is this a valid start tag ? ! (if (eq (char-after) ?>) ! (progn ! (forward-char 1) ;; Now check that we have the right end-tag. Note that this ;; one might contain spaces after the tag name (let ((end (concat ""))) *************** *** 349,355 **** (setq pos (point)) (search-forward "<") (forward-char -1) ! (let ((string (buffer-substring pos (point))) (pos 0)) ;; Clean up the string. As per XML --- 365,372 ---- (setq pos (point)) (search-forward "<") (forward-char -1) ! (let ((string (buffer-substring ! pos (point))) (pos 0)) ;; Clean up the string. As per XML *************** *** 369,390 **** (cdr children)) (cons string children)))))))) ! (goto-char (match-end 0)) (nreverse children)) ! ;; This was an invalid start tag ! (error "XML: Invalid attribute list"))))) ! (t ;; This is not a tag. (error "XML: Invalid character"))))) ! (defun xml-parse-attlist () ! "Return the attribute-list after point.Leave point at the first non-blank character after the tag." (let ((attlist ()) start-pos name) (skip-syntax-forward " ") (while (looking-at (eval-when-compile (concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*"))) ! (setq name (intern (match-string 1))) ! (goto-char (match-end 0)) ;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize --- 386,410 ---- (cdr children)) (cons string children)))))))) ! (goto-char (match-end 0)) (nreverse children)) ! ;; This was an invalid Start tag ! (error "XML: Invalid attribute list"))))) ! (t ;; This is not a tag (error "XML: Invalid character"))))) ! ! (defun xml-parse-attlist (&optional xml-ns) ! "Return the attribute-list after point. Leave point at the first non-blank character after the tag." (let ((attlist ()) start-pos name) (skip-syntax-forward " ") (while (looking-at (eval-when-compile (concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*"))) ! (let ((name (match-string 1)) ! (end (match-end 0))) ! (setq qname (xml-maybe-do-ns name nil xml-ns)) ! (goto-char end)) ;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize *************** *** 395,403 **** (if (looking-at "'\\([^']*\\)'") (setq start-pos (match-beginning 0)) (error "XML: Attribute values must be given between quotes"))) - ;; Each attribute must be unique within a given element ! (if (assoc name attlist) (error "XML: each attribute must be unique within an element")) ;; Multiple whitespace characters should be replaced with a single one --- 415,422 ---- (if (looking-at "'\\([^']*\\)'") (setq start-pos (match-beginning 0)) (error "XML: Attribute values must be given between quotes"))) ;; Each attribute must be unique within a given element ! (if (assoc qname attlist) (error "XML: each attribute must be unique within an element")) ;; Multiple whitespace characters should be replaced with a single one *************** *** 405,415 **** (let ((string (match-string 1)) (pos 0)) (replace-regexp-in-string "\\s-\\{2,\\}" " " string) ! (push (cons name (xml-substitute-special string)) attlist)) (goto-char start-pos) (forward-sexp) ; we have string syntax - (skip-syntax-forward " ")) (nreverse attlist))) --- 424,433 ---- (let ((string (match-string 1)) (pos 0)) (replace-regexp-in-string "\\s-\\{2,\\}" " " string) ! (push (cons qname (xml-substitute-special string)) attlist)) (goto-char start-pos) (forward-sexp) ; we have string syntax (skip-syntax-forward " ")) (nreverse attlist))) *************** *** 488,494 **** ;; Translation of rule [45] of XML specifications ((looking-at ! "]+\\)>") (setq element (intern (match-string 1)) type (match-string-no-properties 2)) --- 506,512 ---- ;; Translation of rule [45] of XML specifications ((looking-at ! "]+\\)>") (setq element (intern (match-string 1)) type (match-string-no-properties 2)) *************** *** 517,528 **** (goto-char end-pos)) ((looking-at "")) ! (t ! (error "XML: Invalid DTD item"))) ;; Skip the end of the DTD ! (search-forward ">")))) (nreverse dtd))) --- 535,554 ---- (goto-char end-pos)) ((looking-at "")) ! ((looking-at "]+\\)>") ! ; Put the ENTITY in ! (goto-char (match-end 0))) ! ((looking-at "]+\\)>") ! ; Put in the ATTLIST ! (goto-char (match-end 0))) ! ((looking-at "]+\\)>") ! ; Put in the NOTATION ! (goto-char (match-end 0))) (t ! (error "XML: Invalid DTD item")))) ;; Skip the end of the DTD ! (search-forward ">"))) (nreverse dtd))) -- Peace is only better than war if peace isn't hell, too. -- Walker Percy, "The Second Coming" From: Mark A. Hershberger Subject: More XML Parsing: Is anyone using CVS xml.el namespace processing? Date: Sun, 28 Sep 2003 12:31:48 -0500 Recently I had a short conversation with James Clark about the structures xml.el produces. I asked him what he thought about the current (CVS) namespace-aware processing. Based on his feedback, I plan to submit changes that will return an incompatible structure to the one currently in CVS. Currently, when xml.el encounters a bit of XML like: it produces: (({uri:namespace}xml (({http://www\.w3\.org/2000/xmlns/}ns . "uri:namespace") ({uri:namespace}attr . "value")))) At the time that I wrote this, I saw some W3 docs where this style was used and copied it. Some people here asked me why I did this instead of something like (uri:namespace . "xml"), but I forged ahead. Now, after my conversation with Mr. Clark, I've been persuaded that I was wrong. At his suggestion, I'd like to change the above xml representation produce the following: (((uri:namespace . "xml") ((((http://www\.w3\.org/2000/xmlns/ . "ns") . "uri:namespace") ((uri:namespace . "attr") . "value"))))) As Mr. Clark said: ... there are typically not very many different namespace URIs, so keeping them in Emacs symbol table is not a problem; in the returned representation of the XML, the namespaces would be shared, but strings are mutable in Emacs, which is kind of ugly. Where there is no namespace given: It would produce the following: (("xml" (("attr" . "value")))) Unless there are major objections, I'd like to repent of my previous code and submit changes to produce the above. Mark.