From: Jostein Kjønigsen
Newsgroups: gmane.emacs.devel
Subject: Adding new schemas to nxml-mode. Am I doing it right?
Date: Sun, 18 Feb 2024 21:12:51 +0100

rghilhhfrhhomhepjhhoshhtvghinhesshgvtghurhgvrdhkjhhonhhighhsvghnrdhnvg ht X-ME-Proxy: Feedback-ID: ib2f84088:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Sun, 18 Feb 2024 15:13:03 -0500 (EST) X-Mailer: Apple Mail (2.3774.200.91.1.1) Received-SPF: pass client-ip=103.168.172.155; envelope-from=jostein@secure.kjonigsen.net; helo=fhigh4-smtp.messagingengine.com X-Spam_score_int: -26 X-Spam_score: -2.7 X-Spam_bar: -- X-Spam_report: (-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:316331 Archived-At: --Apple-Mail=_9E700C54-2AFA-4F39-B4E7-5B0A3C153BA3 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hey everyone! I recently discovered that nxml-mode in Emacs supports validating XML = content against XML schemas, which I had never noticed before. It turns = out that none of the files I've edited using nxml-mode had a supported = schema. Looking into this I learned a few things about nxml-mode: XML is not validated against XSD schemas, which are the most common = schema format today. Emacs relies on RNC schemas[1], which are less common and have fewer = tools, but are simpler. Emacs cannot automatically obtain the schemas which are declared in XML = documents. The last point is the most impactful in terms of usability, but it might = require more effort to fix than we can currently manage. 
As an end-user, you effectively only have access to the RNC schemas shipped with Emacs. To make matters worse, this list of supported schemas seemingly has not changed since 2007. Someone correct me if I'm wrong.

I would like Emacs to support XSD and to fetch schemas automatically at runtime, but I also want a better nxml-mode experience, today, for the file formats I use daily.

To address this, I have created RNC schemas for the formats I depend on. I have attached an abbreviated diff of the changes. The main changes are:

  • Updating schemas.xml with typeIDs and conditions for applying them in the correct order.
  • Generating new RNC schemas based on those typeIDs.

I tried to find tooling to convert XSD schemas to RNC, but I couldn't find anything that actually worked. After a few days of getting nowhere, I instead used a tool called "jing-trang"[2] to infer the schemas from existing documents in my possession (200+ software projects, 50+ GB).

This method does not guarantee accurate schemas, but because they are inferred from a large number of files, the most common elements and attributes should be present and specified. It is less rigorous than an actual XSD-to-RNC translation, but it works for my purposes.

Accurate schema support is a small yet significant feature that can make a noticeable difference when working with XML. Ideally, Emacs should ship schemas for all commonly used XML-based file types.

I would therefore like to contribute these patches to core Emacs to help improve the current situation, but I want to make sure I'm doing it correctly:

  • How can I create accurate, high-quality RNC schemas with the current tooling?
  • What are the criteria, if any, for accepting new schemas into Emacs core?

I would appreciate any comments or feedback on this matter.

Thanks!
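The schema inference step described above can be scripted. This is a minimal sketch, assuming Trang (the schema converter in the jing-trang project) is available as a local trang.jar, a java executable is on PATH, and the sample documents share a file extension; the function names, paths and the *.csproj pattern are hypothetical. Trang's -I xml option tells it the inputs are example documents rather than schemas, and -O rnc selects compact-syntax output.

```python
import shutil
import subprocess
from pathlib import Path


def find_samples(root, pattern):
    """Collect sample documents under root (e.g. every *.csproj), sorted for reproducible runs."""
    return sorted(Path(root).rglob(pattern))


def trang_infer_command(trang_jar, samples, output_rnc):
    """Build a Trang command line that infers one RNC schema from many XML samples.

    -I xml: treat the inputs as example XML documents, whatever their extension.
    -O rnc: write RELAX NG compact syntax.
    The final argument is the output schema file.
    """
    if not samples:
        raise ValueError("Trang needs at least one sample document")
    return ["java", "-jar", str(trang_jar), "-I", "xml", "-O", "rnc",
            *(str(p) for p in samples), str(output_rnc)]


def infer_schema(trang_jar, root, pattern, output_rnc):
    """Find the samples and run Trang; raises if java is missing or Trang fails."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH")
    cmd = trang_infer_command(trang_jar, find_samples(root, pattern), output_rnc)
    subprocess.run(cmd, check=True)
    return cmd
```

Usage, with hypothetical paths: infer_schema("trang.jar", Path.home() / "src", "*.csproj", "msbuild.rnc"). Passing all samples in one invocation lets Trang generalize across every document at once; with very large corpora, the operating system's command-line length limit may force you to pass a representative subset instead.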
[1] https://relaxng.org/
[2] https://github.com/relaxng/jing-trang

--
Kind Regards
Jostein Kjønigsen

Attachment: schemas.patch

diff --git a/etc/schema/schemas.xml b/etc/schema/schemas.xml
index f04bba849b4..dd1e23a5a8e 100644
--- a/etc/schema/schemas.xml
+++ b/etc/schema/schemas.xml
@@ -66,4 +66,30 @@
[added lines not preserved in the archive]