>>>>> Lars Ingebrigtsen writes:
>>>>> Nic Ferrier writes:

 >> It's certainly the case that definite ending is easier to process.

 > I don't really know what to say.  "HTML parsing is a solved problem"?

Granted, my Libxml2 installation may be out of date, but for the
HTML5 document MIMEd (valid per http://validator.w3.org/check),
libxml-parse-html-region (surprisingly) produces the following:

(html ((lang . "en") (dir . "ltr"))
      (head nil (title nil "HTML parsing"))
      (body nil
            (dl nil
                (dt nil "This\n")
                (dd nil "is\n"
                    (dd nil "a\n"
                        (dd nil "perfectly\n"
                            (dd nil "valid\n"
                                (dd nil "HTML5\n"
                                    (dd nil "document.\n")))))))))

Naturally, the SHR rendering of the document would be just as
unreasonable as the tree above.

By contrast, using Lynx to render the very same document results in:

$ lynx --dump --stdin --force-html < example.html
   This is a perfectly valid HTML5 document.
$

The relevant part of the specification [1] is as follows.

    A dt element’s end tag may be omitted if the dt element is
    immediately followed by another dt element or a dd element.

    A dd element’s end tag may be omitted if the dd element is
    immediately followed by another dd element or a dt element, or if
    there is no more content in the parent element.

[1] http://www.w3.org/TR/html5/syntax.html#optional-tags

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A
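
To make the quoted optional-tags rule concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is not how libxml2, SHR or Lynx work internally; it only shows the tree the rule implies, with each dd as a sibling of the dt instead of nested inside the previous dd. The class DlTreeBuilder, the function parse and the SAMPLE markup are all hypothetical: the attached document is not reproduced in this message, so SAMPLE is an assumed reconstruction from the parse tree above. Only dt and dd are handled; void elements and the other optional-tag rules are ignored.

```python
from html.parser import HTMLParser

# Tags whose open element is implicitly closed by a following dt or dd.
IMPLIED_END = {"dt", "dd"}


class DlTreeBuilder(HTMLParser):
    """Hypothetical helper: builds nested (tag, children) tuples."""

    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # A new dt or dd ends a dt/dd that is still open.
        if tag in IMPLIED_END and self.stack[-1][0] in IMPLIED_END:
            self.stack.pop()
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Closing the parent also ends a trailing dt/dd
        # ("no more content in the parent element").
        if tag not in IMPLIED_END and self.stack[-1][0] in IMPLIED_END:
            self.stack.pop()
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1][1].append(text)


def parse(html):
    """Return the list of top-level nodes parsed from html."""
    builder = DlTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root[1]


# Assumed input: a dl whose dt/dd end tags are all omitted.
SAMPLE = """<dl>
<dt>This
<dd>is
<dd>a
<dd>perfectly
<dd>valid
<dd>HTML5
<dd>document.
</dl>"""

if __name__ == "__main__":
    for tag, children in parse(SAMPLE)[0][1]:
        print(tag, children)
```

Under these assumptions, the dl ends up with seven direct children (one dt followed by six dd elements), which is the flat structure the specification describes.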