From: Ricardo Wurmus
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20339: sxml simple: sxml->xml mishandles namespaces?
Date: Tue, 12 Feb 2019 21:30:04 +0100
Message-ID: <87wom4iwc3.fsf@elephly.net>
References: <20150415194714.GA30295@tuxteam.de> <87y45vln0f.fsf@pobox.com>
 <20160713132403.GA2349@tuxteam.de> <87furc1qeu.fsf@pobox.com>
 <87a7jbi8rx.fsf@elephly.net> <20190212095602.GD13448@tuxteam.de>
In-reply-to: <20190212095602.GD13448@tuxteam.de>
Cc: 20339@debbugs.gnu.org
To: tomas@tuxteam.de

tomas@tuxteam.de writes:

> As John has noted, the namespace mappings (i.e. the prefix -> namespace
> URI binding) are kind of lexically scoped (I'd call it subtree scoped,
> but structurally it is the same). While parsing is "easy" (assuming
> well-formed XML), serializing is not unambiguous.

The “fup” handler of the parser visits every element and has a list of
namespaces that are in scope at this point. Its purpose is to return
the SXML representation of that element. At this point we can record
the namespaces as attributes. (That’s what the patch does.)
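
To make the recording step concrete, here is a rough sketch in Python
(not Guile, and not the patch itself). It assumes the in-scope namespaces
arrive as a prefix -> URI mapping and that an element is a plain
(tag, attributes, children) tuple; the function names and the example
URI are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def element_with_namespaces(tag, attrs, children, in_scope):
    """Build an element and record every in-scope binding as an attribute.

    `in_scope` maps prefix -> namespace URI; the empty prefix "" stands
    for the default namespace. Hypothetical helper, for illustration only.
    """
    recorded = dict(attrs)
    for prefix, uri in in_scope.items():
        key = f"xmlns:{prefix}" if prefix else "xmlns"
        recorded[key] = uri
    return (tag, recorded, list(children))


def to_xml(node):
    """Serialize by plain text conversion; namespaces get no special case."""
    if isinstance(node, str):
        return escape(node)
    tag, attrs, children = node
    attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
    body = "".join(to_xml(child) for child in children)
    return f"<{tag}{attr_text}>{body}</{tag}>"


scope = {"a": "http://example.org/a"}
inner = element_with_namespaces("a:item", {}, ["x"], scope)
outer = element_with_namespaces("a:list", {}, [inner], scope)
print(to_xml(outer))
```

In this sketch the same declaration is repeated on both the outer and the
inner element, which is redundant but loses no namespace information.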

When baking XML from SXML we don’t need to do anything special: we only
need to convert everything to text, including the recorded namespace
attributes.

This isn’t pretty SXML (nor is it pretty XML), but it appears to be
correct, as none of the namespace information is lost. To get a better
serialized representation, the parser needs to do a better job of
identifying “new” namespaces.

> In a way, the library might want to be prepared to take hints from the
> application (as far as the XML is to be read by humans, there might be
> "better" and "worse" serializations).

The XML produced when this patch is applied will not be pretty. To
generate minimal/pretty XML, knowledge of the parent elements’
namespaces is required, and the parser’s “fup” handler does not have
that knowledge.

We could try to alter the parser so that it not only passes the list of
namespaces that are currently in scope, but also a list of namespaces
that are in scope for the parent node. This would allow us to determine
the list of *new* namespaces that absolutely must be declared for the
current node. If there are no new namespaces, we can simply omit the
declarations and produce minimal SXML (and thus minimal XML later when
the SXML is serialized).

-- 
Ricardo
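
The “new namespaces” computation proposed above could look roughly like
this. It is a Python sketch under assumptions, not Guile code: both
scopes are taken to be prefix -> URI mappings, and the function name is
hypothetical. A prefix that the parent binds to a different URI is
treated as new, since it has to be redeclared.

```python
def new_namespaces(in_scope, parent_scope):
    """Return the bindings that must be declared on the current node.

    A binding counts as new if its prefix is unbound in the parent scope
    or bound there to a different URI.
    """
    return {
        prefix: uri
        for prefix, uri in in_scope.items()
        if parent_scope.get(prefix) != uri
    }


root_scope = {"a": "http://example.org/a"}
child_scope = {"a": "http://example.org/a", "b": "http://example.org/b"}

print(new_namespaces(root_scope, {}))           # the root declares everything
print(new_namespaces(child_scope, root_scope))  # only the "b" binding is new
print(new_namespaces(root_scope, root_scope))   # nothing new to declare
```

An empty result corresponds to the case where no declarations need to be
recorded for the node at all.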