From: mah@everybody.org (Mark A. Hershberger)
Subject: Refactoring xml.el namespace handling
Date: Mon, 01 Mar 2004 09:59:10 -0600 [thread overview]
Message-ID: <87u117ew6g.fsf@weblog.localhost> (raw)
(Sent this earlier to just emacs-devel...)
In late September, I proposed changing the way xml.el handles
namespaces. That message is appended below.
Because of a variety of things that happened between then and now, I
never submitted a patch to make these changes.
This patch differs slightly from the proposal in that namespaces are
interned after being concatenated with `:'. Thus,
<top xmlns="urn"/>
Is parsed into:
(((:urn . "top")
(((:http://www\.w3\.org/2000/xmlns/ . "") . "urn"))))
Or, without namespace aware parsing:
((top ((xmlns . "urn"))))
2004-02-29 Mark A. Hershberger <mah@everybody.org>
* xml.el (xml-maybe-do-ns): new function to handle namespace
expansion of qnames.
(xml-parse-region): Update to work with namespace parsing.
(xml-parse-tag): Change namespace parsing so that it produces
(:uri . "lname") instead of {uri}lname. Avoid unneccessy
interning.
(xml-parse-attlist): Update to work with namespace parsing.
*** xml.el 14 Jul 2003 20:41:12 -0000 1.23
--- xml.el 1 Mar 2004 15:53:50 -0000
***************
*** 52,66 ****
;;; LIST FORMAT
! ;; The functions `xml-parse-file' and `xml-parse-tag' return a list with
! ;; the following format:
;;
;; xml-list ::= (node node ...)
! ;; node ::= (tag_name attribute-list . child_node_list)
;; child_node_list ::= child_node child_node ...
;; child_node ::= node | string
! ;; tag_name ::= string
! ;; attribute_list ::= (("attribute" . "value") ("attribute" . "value") ...)
;; | nil
;; string ::= "..."
;;
--- 52,66 ----
;;; LIST FORMAT
! ;; The functions `xml-parse-file', `xml-parse-region' and
! ;; `xml-parse-tag' return a list with the following format:
;;
;; xml-list ::= (node node ...)
! ;; node ::= (qname attribute-list . child_node_list)
;; child_node_list ::= child_node child_node ...
;; child_node ::= node | string
! ;; qname ::= (:namespace-uri . "name") | "name"
! ;; attribute_list ::= ((qname . "value") (qname . "value") ...)
;; | nil
;; string ::= "..."
;;
***************
*** 68,73 ****
--- 68,78 ----
;; Whitespace is preserved. Fixme: There should be a tree-walker that
;; can remove it.
+ ;; TODO:
+ ;; * xml:base, xml:space support
+ ;; * more complete DOCTYPE parsing
+ ;; * pi support
+
;;; Code:
;; Note that {buffer-substring,match-string}-no-properties were
***************
*** 214,220 ****
(setq result (xml-parse-tag parse-dtd parse-ns))
(cond
((null result))
! ((listp (car result))
(setq dtd (car result))
(if (cdr result) ; possible leading comment
(add-to-list 'xml (cdr result))))
--- 226,233 ----
(setq result (xml-parse-tag parse-dtd parse-ns))
(cond
((null result))
! ((and (listp (car result))
! parse-dtd)
(setq dtd (car result))
(if (cdr result) ; possible leading comment
(add-to-list 'xml (cdr result))))
***************
*** 225,230 ****
--- 238,274 ----
(cons dtd (nreverse xml))
(nreverse xml)))))))
+ (defun xml-maybe-do-ns (name default xml-ns)
+ "Perform any namespace expansion. NAME is the name to perform the expansion on.
+ DEFAULT is the default namespace. XML-NS is a cons of namespace
+ names to uris. When namespace-aware parsing is off, then XML-NS
+ is nil.
+
+ During namespace-aware parsing, any name without a namespace is
+ put into the namespace identified by DEFAULT. nil is used to
+ specify that the name shouldn't be given a namespace."
+ (if (consp xml-ns)
+ (let* ((splitup (split-string name ":"))
+ (lname (or (nth 1 splitup)
+ (nth 0 splitup)))
+ (prefix (if (nth 1 splitup)
+ (nth 0 splitup)
+ default))
+ (ns (progn
+ (if (and
+ (string-equal lname "xmlns")
+ (not prefix))
+ (cdr (assoc "xmlns" xml-ns))
+ (cdr (assoc prefix xml-ns))))))
+ (if ns
+ (cons ns
+ (if (and
+ (string-equal lname "xmlns")
+ (not prefix))
+ ""
+ lname))
+ lname))
+ (intern name)))
(defun xml-parse-tag (&optional parse-dtd parse-ns)
"Parse the tag at point.
***************
*** 237,248 ****
- a pair : the first element is the DTD, the second is the node."
(let ((xml-ns (if (consp parse-ns)
parse-ns
! (if parse-ns
(list
! ;; Default no namespace
! (cons "" "")
;; We need to seed the xmlns namespace
! (cons "xmlns" "http://www.w3.org/2000/xmlns/"))))))
(cond
;; Processing instructions (like the <?xml version="1.0"?> tag at the
;; beginning of a document).
--- 281,292 ----
- a pair : the first element is the DTD, the second is the node."
(let ((xml-ns (if (consp parse-ns)
parse-ns
! (when parse-ns
(list
! ;; "xml" namespace
! (cons "xml" :http://www.w3.org/XML/1998/namespace)
;; We need to seed the xmlns namespace
! (cons "xmlns" :http://www.w3.org/2000/xmlns/))))))
(cond
;; Processing instructions (like the <?xml version="1.0"?> tag at the
;; beginning of a document).
***************
*** 278,338 ****
(goto-char (match-end 1))
(let* ((node-name (match-string 1))
;; Parse the attribute list.
! (children (list (xml-parse-attlist) (intern node-name)))
pos)
;; add the xmlns:* attrs to our cache
(when (consp xml-ns)
(mapcar
(lambda (attr)
! (let* ((splitup (split-string (symbol-name (car attr)) ":"))
! (prefix (nth 0 splitup))
! (lname (nth 1 splitup)))
! (when (string= "xmlns" prefix)
! (setq xml-ns (append (list (cons (if lname
! lname
! "")
! (cdr attr)))
! xml-ns)))))
! (car children))
!
! ;; expand element names
! (let* ((splitup (split-string (symbol-name (cadr children)) ":"))
! (lname (or (nth 1 splitup)
! (nth 0 splitup)))
! (prefix (if (nth 1 splitup)
! (nth 0 splitup)
! "")))
! (setcdr children (list
! (intern (concat "{"
! (cdr (assoc-string prefix xml-ns))
! "}" lname)))))
!
! ;; expand attribute names
! (mapcar
! (lambda (attr)
! (let* ((splitup (split-string (symbol-name (car attr)) ":"))
! (lname (or (nth 1 splitup)
! (nth 0 splitup)))
! (prefix (if (nth 1 splitup)
! (nth 0 splitup)
! (caar xml-ns))))
!
! (setcar attr (intern (concat "{"
! (cdr (assoc-string prefix xml-ns))
! "}" lname)))))
(car children)))
! ;; is this an empty element ?
! (if (looking-at "/>")
! (progn
! (forward-char 2)
! (nreverse children))
! ;; is this a valid start tag ?
! (if (eq (char-after) ?>)
! (progn
! (forward-char 1)
;; Now check that we have the right end-tag. Note that this
;; one might contain spaces after the tag name
(let ((end (concat "</" node-name "\\s-*>")))
--- 322,354 ----
(goto-char (match-end 1))
(let* ((node-name (match-string 1))
;; Parse the attribute list.
! (children (list (xml-parse-attlist xml-ns) node-name))
pos)
;; add the xmlns:* attrs to our cache
(when (consp xml-ns)
(mapcar
(lambda (attr)
! (when (and (listp (car attr))
! (eq :http://www.w3.org/2000/xmlns/
! (caar attr)))
! (setq xml-ns (append (list (cons (cdar attr)
! (intern (concat ":" (cdr attr)))))
! xml-ns))))
(car children)))
! ;; expand element names
! (setcdr children (list (xml-maybe-do-ns (cadr children) "" xml-ns)))
! ;; is this an empty element ?
! (if (looking-at "/>")
! (progn
! (forward-char 2)
! (nreverse children))
! ;; is this a valid start tag ?
! (if (eq (char-after) ?>)
! (progn
! (forward-char 1)
;; Now check that we have the right end-tag. Note that this
;; one might contain spaces after the tag name
(let ((end (concat "</" node-name "\\s-*>")))
***************
*** 349,355 ****
(setq pos (point))
(search-forward "<")
(forward-char -1)
! (let ((string (buffer-substring pos (point)))
(pos 0))
;; Clean up the string. As per XML
--- 365,372 ----
(setq pos (point))
(search-forward "<")
(forward-char -1)
! (let ((string (buffer-substring
! pos (point)))
(pos 0))
;; Clean up the string. As per XML
***************
*** 369,390 ****
(cdr children))
(cons string children))))))))
! (goto-char (match-end 0))
(nreverse children))
! ;; This was an invalid start tag
! (error "XML: Invalid attribute list")))))
! (t ;; This is not a tag.
(error "XML: Invalid character")))))
! (defun xml-parse-attlist ()
! "Return the attribute-list after point.Leave point at the first non-blank character after the tag."
(let ((attlist ())
start-pos name)
(skip-syntax-forward " ")
(while (looking-at (eval-when-compile
(concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*")))
! (setq name (intern (match-string 1)))
! (goto-char (match-end 0))
;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
--- 386,410 ----
(cdr children))
(cons string children))))))))
! (goto-char (match-end 0))
(nreverse children))
! ;; This was an invalid Start tag
! (error "XML: Invalid attribute list")))))
! (t ;; This is not a tag
(error "XML: Invalid character")))))
!
! (defun xml-parse-attlist (&optional xml-ns)
! "Return the attribute-list after point. Leave point at the first non-blank character after the tag."
(let ((attlist ())
start-pos name)
(skip-syntax-forward " ")
(while (looking-at (eval-when-compile
(concat "\\(" xml-name-regexp "\\)\\s-*=\\s-*")))
! (let ((name (match-string 1))
! (end (match-end 0)))
! (setq qname (xml-maybe-do-ns name nil xml-ns))
! (goto-char end))
;; See also: http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
***************
*** 395,403 ****
(if (looking-at "'\\([^']*\\)'")
(setq start-pos (match-beginning 0))
(error "XML: Attribute values must be given between quotes")))
-
;; Each attribute must be unique within a given element
! (if (assoc name attlist)
(error "XML: each attribute must be unique within an element"))
;; Multiple whitespace characters should be replaced with a single one
--- 415,422 ----
(if (looking-at "'\\([^']*\\)'")
(setq start-pos (match-beginning 0))
(error "XML: Attribute values must be given between quotes")))
;; Each attribute must be unique within a given element
! (if (assoc qname attlist)
(error "XML: each attribute must be unique within an element"))
;; Multiple whitespace characters should be replaced with a single one
***************
*** 405,415 ****
(let ((string (match-string 1))
(pos 0))
(replace-regexp-in-string "\\s-\\{2,\\}" " " string)
! (push (cons name (xml-substitute-special string)) attlist))
(goto-char start-pos)
(forward-sexp) ; we have string syntax
-
(skip-syntax-forward " "))
(nreverse attlist)))
--- 424,433 ----
(let ((string (match-string 1))
(pos 0))
(replace-regexp-in-string "\\s-\\{2,\\}" " " string)
! (push (cons qname (xml-substitute-special string)) attlist))
(goto-char start-pos)
(forward-sexp) ; we have string syntax
(skip-syntax-forward " "))
(nreverse attlist)))
***************
*** 488,494 ****
;; Translation of rule [45] of XML specifications
((looking-at
! "<!ELEMENT\\s-+\\([[:alnum:].%;]+\\)\\s-+\\([^>]+\\)>")
(setq element (intern (match-string 1))
type (match-string-no-properties 2))
--- 506,512 ----
;; Translation of rule [45] of XML specifications
((looking-at
! "<!ELEMENT\\s-+\\([[:alnum:].%;-]+\\)\\s-+\\([^>]+\\)>")
(setq element (intern (match-string 1))
type (match-string-no-properties 2))
***************
*** 517,528 ****
(goto-char end-pos))
((looking-at "<!--")
(search-forward "-->"))
!
(t
! (error "XML: Invalid DTD item")))
;; Skip the end of the DTD
! (search-forward ">"))))
(nreverse dtd)))
--- 535,554 ----
(goto-char end-pos))
((looking-at "<!--")
(search-forward "-->"))
! ((looking-at "<!ENTITY\\s-+\\([[:alnum:].%;-]+\\)\\s-+\\([^>]+\\)>")
! ; Put the ENTITY in
! (goto-char (match-end 0)))
! ((looking-at "<!ATTLIST\\s+\\([[:alnum:].%;-]+\\)\\s+\\([^>]+\\)>")
! ; Put in the ATTLIST
! (goto-char (match-end 0)))
! ((looking-at "<!NOTATION\\s+\\([[:alnum:].%;-]+\\)\\s+\\([^>]+\\)>")
! ; Put in the NOTATION
! (goto-char (match-end 0)))
(t
! (error "XML: Invalid DTD item"))))
;; Skip the end of the DTD
! (search-forward ">")))
(nreverse dtd)))
--
Peace is only better than war if peace isn't hell, too.
-- Walker Percy, "The Second Coming"
From: Mark A. Hershberger
Subject: More XML Parsing: Is anyone using CVS xml.el namespace processing?
Date: Sun, 28 Sep 2003 12:31:48 -0500
Recently I had a short conversation with James Clark about the
structures xml.el produces.
I asked him what he thought about the current (CVS) namespace-aware
processing. Based on his feedback, I plan to submit changes that
will return an incompatible structure to the one currently in CVS.
Currently, when xml.el encounters a bit of XML like:
<ns:xml xmlns:ns="uri:namespace" ns:attr="value"/>
it produces:
(({uri:namespace}xml
(({http://www\.w3\.org/2000/xmlns/}ns . "uri:namespace")
({uri:namespace}attr . "value"))))
At the time that I wrote this, I saw some W3 docs where this style was
used and copied it. Some people here asked me why I did this instead
of something like (uri:namespace . "xml"), but I forged ahead.
Now, after my conversation with Mr. Clark, I've been persuaded that I
was wrong. At his suggestion, I'd like to change the above xml
representation produce the following:
(((uri:namespace . "xml")
((((http://www\.w3\.org/2000/xmlns/ . "ns") . "uri:namespace")
((uri:namespace . "attr") . "value")))))
As Mr. Clark said:
... there are typically not very many different namespace URIs, so
keeping them in Emacs symbol table is not a problem; in the
returned representation of the XML, the namespaces would be
shared, but strings are mutable in Emacs, which is kind of ugly.
Where there is no namespace given:
<xml attr="value">
It would produce the following:
(("xml"
(("attr" . "value"))))
Unless there are major objections, I'd like to repent of my previous
code and submit changes to produce the above.
Mark.
next reply other threads:[~2004-03-01 15:59 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-03-01 15:59 Mark A. Hershberger [this message]
2004-03-02 21:52 ` Refactoring xml.el namespace handling Stefan Monnier
2004-03-03 6:35 ` Mark A. Hershberger
2004-03-04 17:50 ` Stefan Monnier
2004-03-04 17:52 ` Stefan Monnier
2004-03-08 1:33 ` Kim F. Storm
2004-03-08 4:06 ` Mark A. Hershberger
2004-03-12 9:57 ` Richard Stallman
-- strict thread matches above, loose matches on Subject: below --
2004-03-05 19:03 Mark A. Hershberger
2004-03-05 19:29 ` Stefan Monnier
2004-04-14 18:36 ` Stefan Monnier
2004-04-16 3:14 ` Katsumi Yamaoka
2004-04-16 20:05 ` Mark A. Hershberger
2004-04-16 22:27 ` Stefan Monnier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87u117ew6g.fsf@weblog.localhost \
--to=mah@everybody.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).