From mboxrd@z Thu Jan 1 00:00:00 1970
From: MON KEY
Newsgroups: gmane.emacs.bugs
Subject: bug#4950: `xml-parse-file' returns incorrect results strings after `>' before `<' when CR\LF TAB+
Date: Tue, 17 Nov 2009 17:12:37 -0500
Reply-To: MON KEY , 4950@emacsbugs.donarmstrong.com
To: bug-gnu-emacs@gnu.org
X-Emacs-PR-Message: report 4950
X-Emacs-PR-Package: emacs

`xml-parse-file' returns incorrect results strings after `>' before `<'
when CR\LF TAB+

`xml-parse-file' fails to return correct results when a newline (C-j,
written CR\LF here) followed by one or more tabs (\t+, i.e. TAB+) appears
after a tag's trailing `>' and before the next tag's leading `<'.

IOW the following:

,----
|  CR\LF
| TAB TAB TAB
`----

Returns (:NOTE with my pp-ing to help clarify the problem):

,----
| (ELEMENT nil
|  ((attr1 . "a1")
|   (attr2 . "a2")
|   (attr3 . "a3")
|   (attr4 . "a4")
|   (attr5 . "a5") "
| " ;; <-i.e. (mapconcat #'char-to-string '(32 10 9 9 9) "")
|  (NEXT-NODE nil (...
`----

Is it fair/safe to assume that where these types of sequences occur they
are not part of the XML and can be removed with a regexp? E.g.:

,----
| (while (search-forward-regexp "\"\)\n[\[:blank:]]+\"\)" nil t)
|   (replace-match ""))
`----

or perhaps:

,----
| (defun cln-xml<-parsed (fname &optional insertp intrp)
|   "Strip nonsensical strings created by `xml-parse-file' because of
| CR\\LF TAB+ following tags/elements.
| FNAME is an XML filename path to parse and clean.
| When INSERTP is non-nil or when called interactively (INTRP), insert the
| pretty-printed Lisp representation of the XML file at point.
| Does not move point."
|   (interactive "fXML file to parse: \ni\np")
|   (let (get-xml)
|     (setq get-xml
|           (with-temp-buffer
|             (prin1 (xml-parse-file fname) (current-buffer))
|             (goto-char (point-min))
|             (while (search-forward-regexp
|                     "\\( \"\n[\[:blank:]]+\\)\"\\(\\(\\()\\)\\|\\( (\\)\\)\\)" nil t)
|                     ;;^^1^^^^^^^^^^^^^^^^^^^^^^^^^2^^3^^^^^^^^^^^^4^^^^^^^^^^^^
|               (replace-match "\\2"))
|             (pp-buffer)
|             (buffer-substring-no-properties (point-min) (point-max))))
|     (if (or insertp intrp)
|         (save-excursion
|           (newline)
|           (princ get-xml (current-buffer)))
|       get-xml)))
`----

:SEE-ALSO
(URL `http://lists.gnu.org/archive/html/bug-gnu-emacs/2001-11/msg00052.html')

s_P
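
On the question above of whether the whitespace-only strings can simply be dropped: one alternative to running a regexp over the printed form is to walk the parsed tree and remove the offending strings directly. The following is only a minimal sketch, not part of xml.el. The names `my-xml-blank-string-p' and `my-xml-strip-blank-strings' are hypothetical, and the code assumes each node has the shape (NAME ATTRIBUTES CHILD...) with every CHILD either a string or another node.

```elisp
;; Hypothetical helpers, not part of xml.el.
;; Assumed node shape: (NAME ATTRIBUTES CHILD...), CHILD = string or node.

(defun my-xml-blank-string-p (obj)
  "Return non-nil if OBJ is a string made up only of whitespace."
  (and (stringp obj)
       (string-match-p "\\`[ \t\n\r]*\\'" obj)))

(defun my-xml-strip-blank-strings (node)
  "Return a copy of NODE without whitespace-only string children.
Strings that contain any non-whitespace character are kept unchanged."
  (if (not (consp node))
      node                              ; a text child: return it as is
    (let (children)
      (dolist (child (cddr node))       ; everything after NAME and ATTRIBUTES
        (unless (my-xml-blank-string-p child)
          (push (my-xml-strip-blank-strings child) children)))
      (append (list (car node) (cadr node))
              (nreverse children)))))

(my-xml-strip-blank-strings
 '(ELEMENT ((attr1 . "a1")) " \n\t\t\t" (NEXT-NODE nil "text")))
;; => (ELEMENT ((attr1 . "a1")) (NEXT-NODE nil "text"))
```

Because `xml-parse-file' returns a list of top-level nodes, the helper would be applied with something like (mapcar #'my-xml-strip-blank-strings (xml-parse-file fname)). Note that this is only safe for data-oriented XML: in mixed content, whitespace between two inline elements can be significant, so whether these strings are "not part of the XML" depends on the document being parsed.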