unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#46764: Extra ">" sails right past XML validator
@ 2021-02-24 23:43 積丹尼 Dan Jacobson
  2021-02-25 15:48 ` Lars Ingebrigtsen
  2021-02-26  9:21 ` Mattias Engdegård
  0 siblings, 2 replies; 7+ messages in thread
From: 積丹尼 Dan Jacobson @ 2021-02-24 23:43 UTC (permalink / raw)
  To: 46764

$ cat e.xml
<?xml version="1.0" encoding="utf-8" ?>
<M>></M>
$ emacs e.xml
says at the bottom: (nXML Valid)
$ xmllint e.xml
<?xml version="1.0" encoding="utf-8"?>
<M>&gt;</M>

* bug#46764: Extra ">" sails right past XML validator
  2021-02-24 23:43 bug#46764: Extra ">" sails right past XML validator 積丹尼 Dan Jacobson
@ 2021-02-25 15:48 ` Lars Ingebrigtsen
  2021-02-26  9:21 ` Mattias Engdegård
  1 sibling, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-25 15:48 UTC (permalink / raw)
  To: 積丹尼 Dan Jacobson; +Cc: 46764

積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

> $ cat e.xml
> <?xml version="1.0" encoding="utf-8" ?>
> <M>></M>
> $ emacs e.xml
> says at the bottom: (nXML Valid)

I can confirm that this problem still exists in Emacs 28.

It seems to stem from this bit of code:

(defun xmltok-forward ()
  (setq xmltok-start (point))
  (let* ((case-fold-search nil)
	 (space-count (skip-chars-forward " \t\r\n"))
	 (ch (char-after)))
    (cond ((eq ch ?\<)
	   (cond ((> space-count 0)
		  (setq xmltok-type 'space))
		 (t
		  (forward-char 1)
		  (xmltok-scan-after-lt))))
	  ((eq ch ?\&)
	   (cond ((> space-count 0)
		  (setq xmltok-type 'space))
		 (t
		  (forward-char 1)
		  (xmltok-scan-after-amp 'xmltok-handle-entity))))
	  ((re-search-forward "[<&]\\|\\(]]>\\)" nil t)
	   (cond ((not (match-beginning 1))

So (xmltok-forward) on the ">" will just return `data'.  Is it checking
just < and & for validity on purpose?  Anybody remember what the thought
process might have been here?
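
To make the behaviour concrete, here is a minimal sketch of just the character-data branch, assuming the excerpt above is representative. It is written in Python for illustration only: the pattern is a translation of the Emacs regexp, and `scan_data` is a hypothetical helper, not anything in xmltok.el.

```python
import re

# Python translation of the Emacs regexp "[<&]\\|\\(]]>\\)" from the excerpt.
DATA_END = re.compile(r"[<&]|(\]\]>)")

def scan_data(text, pos=0):
    """Return (kind, end) describing where a character-data scan stops.

    Hypothetical helper: it mimics only the regexp search, not the
    rest of xmltok-forward.
    """
    m = DATA_END.search(text, pos)
    if m is None:
        return ("data", len(text))      # nothing special before end of input
    if m.group(1) is None:
        return ("data", m.start())      # stopped just before "<" or "&"
    return ("cdata-close", m.start())   # found a literal "]]>"

print(scan_data("></M>"))      # → ('data', 1)
print(scan_data("a]]>b</M>"))  # → ('cdata-close', 1)
```

The search only ever stops at "<", "&" or "]]>", so a lone ">" is swallowed as ordinary data.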

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* bug#46764: Extra ">" sails right past XML validator
  2021-02-24 23:43 bug#46764: Extra ">" sails right past XML validator 積丹尼 Dan Jacobson
  2021-02-25 15:48 ` Lars Ingebrigtsen
@ 2021-02-26  9:21 ` Mattias Engdegård
  2021-02-26  9:30   ` Lars Ingebrigtsen
  2021-02-26 16:00   ` bug#46764: [External] : " Drew Adams
  1 sibling, 2 replies; 7+ messages in thread
From: Mattias Engdegård @ 2021-02-26  9:21 UTC (permalink / raw)
  To: 積丹尼 Dan Jacobson; +Cc: Lars Ingebrigtsen, 46764

">" is not a special character at top level in XML; <M>></M> is well-formed.

I agree that it is an easy mistake to make and overlook. Perhaps an optional warning would be helpful?
Note that nxml-mode is carefully written for correctness and performance, which matter because XML is a lot more complex than people think and files can be large. Any tinkering with it has to be done with prudence.
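
As an independent check of the well-formedness claim, here is a small sketch using Python's standard-library parser (an outside tool chosen for illustration, not part of nxml-mode; `is_well_formed` is a hypothetical helper name). It accepts a literal ">" in character data and, like xmllint, writes it back out escaped.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring("<M>></M>")
print(repr(root.text))                        # → '>'
print(ET.tostring(root, encoding="unicode"))  # → <M>&gt;</M>
print(is_well_formed("<M><</M>"))             # → False
```

The escaped "&gt;" in the output is the serializer's choice; the literal ">" in the input was accepted without complaint, whereas a stray "<" is rejected.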


* bug#46764: Extra ">" sails right past XML validator
  2021-02-26  9:21 ` Mattias Engdegård
@ 2021-02-26  9:30   ` Lars Ingebrigtsen
  2021-02-26 10:28     ` Mattias Engdegård
  2021-02-27 10:44     ` 積丹尼 Dan Jacobson
  2021-02-26 16:00   ` bug#46764: [External] : " Drew Adams
  1 sibling, 2 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2021-02-26  9:30 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 46764, 積丹尼 Dan Jacobson

Mattias Engdegård <mattiase@acm.org> writes:

> ">" is not a special character at top level in XML; <M>></M> is well-formed.
>
> I agree that it is an easy mistake to make and overlook. Perhaps an
> optional warning would be helpful?

Well, if it is valid (and it is), then I don't really see how adding an
optional warning here would be all that helpful, either -- it seems
kinda beyond the remit of the validator here to teach XML syntax?

So I'm closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* bug#46764: Extra ">" sails right past XML validator
  2021-02-26  9:30   ` Lars Ingebrigtsen
@ 2021-02-26 10:28     ` Mattias Engdegård
  2021-02-27 10:44     ` 積丹尼 Dan Jacobson
  1 sibling, 0 replies; 7+ messages in thread
From: Mattias Engdegård @ 2021-02-26 10:28 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 46764, 積丹尼 Dan Jacobson

On 26 Feb 2021, at 10:30, Lars Ingebrigtsen <larsi@gnus.org> wrote:

>> ">" is not a special character at top level in XML; <M>></M> is well-formed.
>> 
>> I agree that it is an easy mistake to make and overlook. Perhaps an
>> optional warning would be helpful?
> 
> Well, if it is valid (and it is), then I don't really see how adding an
> optional warning here would be all that helpful, either -- it seems
> kinda beyond the remit of the validator here to teach XML syntax?

It's useful to prevent mistakes, not just to follow the standard to the letter. Given that the XML file is in an Emacs buffer, there is a fair chance that it was hand-written, and then the extra ">" is likely to be unintended, especially since it can be hard for a human to spot.

Many other modes do similar things. For example, emacs-lisp-mode warns about useless backslashes in strings even though there is no actual syntax error. Think of it as a useful compiler warning.

> So I'm closing this bug report.

That's fine. Dan can reopen if he thinks nxml-mode really needs improvement.


* bug#46764: [External] : bug#46764: Extra ">" sails right past XML validator
  2021-02-26  9:21 ` Mattias Engdegård
  2021-02-26  9:30   ` Lars Ingebrigtsen
@ 2021-02-26 16:00   ` Drew Adams
  1 sibling, 0 replies; 7+ messages in thread
From: Drew Adams @ 2021-02-26 16:00 UTC (permalink / raw)
  To: Mattias Engdegård, 積丹尼 Dan Jacobson
  Cc: Lars Ingebrigtsen, 46764@debbugs.gnu.org

> Note that nxml-mode is carefully written for correctness and
> performance, which matter because XML is a lot more complex than people
> think and files can be large. Any tinkering with it has to be done with
> prudence.

+1

* bug#46764: Extra ">" sails right past XML validator
  2021-02-26  9:30   ` Lars Ingebrigtsen
  2021-02-26 10:28     ` Mattias Engdegård
@ 2021-02-27 10:44     ` 積丹尼 Dan Jacobson
  1 sibling, 0 replies; 7+ messages in thread
From: 積丹尼 Dan Jacobson @ 2021-02-27 10:44 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, 46764

Fine. Better to let the Space Shuttle engines warn about it than catch it
earlier in Emacs, valid or not. P.S.,
$ echo '<M>></M>'|xmllint -
<?xml version="1.0"?>
<M>&gt;</M>
So please file a bug against xmllint.


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git