From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Ivan Shmakov
Newsgroups: gmane.emacs.devel
Subject: Re: HTML-Info design
Date: Mon, 29 Dec 2014 14:24:44 +0000
Message-ID: <87iogua5zn.fsf@violet.siamics.net>
References: <83ioh2nlow.fsf@gnu.org> <87sig6xech.fsf@ferrier.me.uk>
 <83fvc5ni0u.fsf@gnu.org> <87k31fwwyv.fsf@ferrier.me.uk>
 <87bnmq9ibf.fsf@ferrier.me.uk> <87lhlrx5fc.fsf@building.gnus.org>
 <877fxb9821.fsf@ferrier.me.uk> <878uhrg6uu.fsf@building.gnus.org>
 <871tnj90lt.fsf@ferrier.me.uk> <87mw67elgf.fsf@building.gnus.org>
 <86bnmn1rwk.fsf@dod.no> <87tx0fczoj.fsf@building.gnus.org>
 <87ppb28z1q.fsf@building.gnus.org> <87sify64dz.fsf@ferrier.me.uk>
 <87lhlq8wn8.fsf@building.gnus.org>
In-Reply-To: <87lhlq8wn8.fsf@building.gnus.org> (Lars Ingebrigtsen's message of "Mon, 29 Dec 2014 13:31:55 +0100")
To: emacs-devel@gnu.org
Mail-Followup-To: emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:180828

--=-=-=
Content-Type: text/plain; charset=utf-8

>>>>> Lars Ingebrigtsen writes:
>>>>> Nic Ferrier writes:

 >> It's certainly the case that definite ending is easier to process.

 > I don't really know what to say.  "HTML parsing is a solved problem"?

Granted, my Libxml2 installation may be out of date, but for the
HTML5 document MIMEd (valid per http://validator.w3.org/check),
libxml-parse-html-region (surprisingly) produces the following:

(html ((lang . "en") (dir . "ltr"))
 (head nil (title nil "HTML parsing"))
 (body nil
  (dl nil
   (dt nil "This\n")
   (dd nil "is\n"
    (dd nil "a\n"
     (dd nil "perfectly\n"
      (dd nil "valid\n"
       (dd nil "HTML5\n"
        (dd nil "document.\n"))))))))) 

Naturally, SHR rendition of the document would be just as
unreasonable as is the tree above.

On the contrary, using Lynx to render the very same document
results in:

$ lynx --dump --stdin --force-html < example.html
   This
          is
          a
          perfectly
          valid
          HTML5
          document.
$

The relevant part of the specification [1] is as follows.

    A dt element’s end tag may be omitted if the dt element is
    immediately followed by another dt element or a dd element.

    A dd element’s end tag may be omitted if the dd element is
    immediately followed by another dd element or a dt element, or
    if there is no more content in the parent element.

[1] http://www.w3.org/TR/html5/syntax.html#optional-tags

-- 
FSF associate member #7257  http://boycottsystemd.org/  … 3013 B6A0 230E 334A

--=-=-=
Content-Type: text/html
Content-Disposition: inline

HTML parsing
This
is
a
perfectly
valid
HTML5
document.
--=-=-=--
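
The optional-tag rule quoted from the specification above can be
illustrated with a small sketch.  This is not how Libxml2, SHR, or
Lynx work internally; it is a minimal tree builder on top of Python's
standard html.parser module, and the class name DlTreeBuilder, the
helper parse, and the SAMPLE markup are all hypothetical (the markup
of the attached document did not survive in this archive, so SAMPLE
is only a guess at its dl part).  Void elements such as br or meta
are deliberately not handled.

```python
from html.parser import HTMLParser

# For an open element (key), the start tags that implicitly end it.
# This encodes only the dt/dd rule quoted from the specification.
IMPLIED_END = {"dt": {"dt", "dd"}, "dd": {"dt", "dd"}}


class DlTreeBuilder(HTMLParser):
    """Build a nested-list tree, honouring omitted </dt> and </dd> tags."""

    def __init__(self):
        super().__init__()
        self.root = ["#root"]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # An open dt or dd ends as soon as another dt or dd starts.
        current = self.stack[-1][0]
        if tag in IMPLIED_END.get(current, ()):
            self.stack.pop()
        node = [tag]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element.  This also
        # closes a trailing dd when its parent dl ends ("no more content
        # in the parent element").
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].append(text)


def parse(html):
    """Return the list of top-level nodes parsed from html."""
    builder = DlTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root[1:]


# Hypothetical stand-in for the dl part of the attached document.
SAMPLE = "<dl>\n<dt>This\n<dd>is\n<dd>a\n<dd>perfectly\n<dd>valid\n</dl>"

if __name__ == "__main__":
    print(parse(SAMPLE))
```

With the implied end tags applied, every dd ends up as a sibling of
the dt directly under dl, rather than nested inside the previous dd
as in the libxml-parse-html-region output shown in the message.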