From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Katsumi Yamaoka Newsgroups: gmane.emacs.bugs Subject: bug#24831: shr mangling messages Date: Fri, 04 Nov 2016 16:19:12 +0900 Organization: Emacsen advocacy group Message-ID: References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: blaine.gmane.org 1478244021 26575 195.159.176.226 (4 Nov 2016 07:20:21 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 4 Nov 2016 07:20:21 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin) Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org To: Lars Ingebrigtsen Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Fri Nov 04 08:20:17 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2YnJ-0005W3-Vj for geb-bug-gnu-emacs@m.gmane.org; Fri, 04 Nov 2016 08:20:10 +0100 Original-Received: from localhost ([::1]:36839 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c2YnM-00046Q-Nq for geb-bug-gnu-emacs@m.gmane.org; Fri, 04 Nov 2016 03:20:12 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:36485) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c2YnF-000450-U6 for bug-gnu-emacs@gnu.org; Fri, 04 Nov 2016 03:20:06 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c2YnC-0007Yq-PV for bug-gnu-emacs@gnu.org; Fri, 04 Nov 2016 03:20:05 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:55635) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c2YnC-0007YV-MW for bug-gnu-emacs@gnu.org; Fri, 04 Nov 2016 03:20:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org 
with local (Exim 4.84_2) (envelope-from ) id 1c2YnB-0007Lp-RC for bug-gnu-emacs@gnu.org; Fri, 04 Nov 2016 03:20:02 -0400 X-Loop: help-debbugs@gnu.org In-Reply-To: <87shrd6xsp.fsf_-_@jidanni.org> Resent-From: Katsumi Yamaoka Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 04 Nov 2016 07:20:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24831 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 24831-submit@debbugs.gnu.org id=B24831.147824396428192 (code B ref 24831); Fri, 04 Nov 2016 07:20:01 +0000 Original-Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 07:19:24 +0000 Original-Received: from localhost ([127.0.0.1]:42796 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2Yma-0007Ke-IQ for submit@debbugs.gnu.org; Fri, 04 Nov 2016 03:19:24 -0400 Original-Received: from mail-hampton.hostforweb.net ([205.234.186.191]:50462 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2YmZ-0007KQ-Jk for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 03:19:23 -0400 Original-Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c2YmR-000fRq-76; Fri, 04 Nov 2016 02:19:16 -0500 X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&( Cancel-Lock: sha1:T98tGkD7IYAVyQ3D/A0ZMuw+wqM= X-OutGoing-Spam-Status: No, score=-2.9 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - hampton.hostforweb.net X-AntiAbuse: Original Domain - debbugs.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - jpl.org X-Get-Message-Sender-Via: 
hampton.hostforweb.net: authenticated_id: yamaoka/from_h X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org X-Source: X-Source-Args: X-Source-Dir: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:125320 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, 02 Nov 2016 18:49:58 +0900, Katsumi Yamaoka wrote: > On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote: >> And thinking about it a bit more, I think that would perhaps be the most >> likely solution for shr, too. That is, `shr-tag-table' could, at the >> end there, go through and find all non-blank non-td/th elements and >> insert them at the end. > Thanks. I'm trying it but not succeeded yet though,... I did it. A patch is below. Bad things in this version I know at least are: =E3=83=BBIt does not support styles -- font, color, etc. =E3=83=BBNo way to exclude text existing outside of .... Thers is no such problems in the first version I posted. ;-) --=-=-= Content-Type: text/x-patch Content-Disposition: inline --- shr.el~ 2016-11-01 02:35:57.788777000 +0000 +++ shr.el 2016-11-04 07:17:19.789855000 +0000 @@ -1897,11 +1897,48 @@ (when (zerop shr-table-depth) (save-excursion (shr-expand-alignments start (point))) + ;; Insert also non-td/th strings excluding comments and styles. 
+      (save-restriction
+        (narrow-to-region (point) (point))
+        (insert (mapconcat #'identity
+                           (shr-collect-extra-strings-in-table dom)
+                           "\n"))
+        (shr-fill-lines (point-min) (point-max)))
       (dolist (elem (dom-by-tag dom 'object))
         (shr-tag-object elem))
       (dolist (elem (dom-by-tag dom 'img))
         (shr-tag-img elem)))))
 
+(defun shr-collect-extra-strings-in-table (dom &optional flags)
+  "Return extra strings in DOM of which the root is a table clause.
+FLAGS is a cons of two flags that control whether to collect strings."
+  ;; If and only if the cdr is not set, the car will be set to t when
+  ;; a <td> or a <th> clause is found in the children of DOM, and reset
+  ;; to nil when a <table> clause is found in the children of DOM.
+  ;; The cdr will be set to t when a <table> clause is found if the car
+  ;; is not set then, and will never be reset.
+  ;; This function collects strings if the car of FLAGS is not set.
+  (unless flags (setq flags (cons nil nil)))
+  (cl-loop for child in (dom-children dom)
+           if (stringp child)
+             when (and (not (car flags))
+                       (string-match "\\(?:[^\t\n\r ]+[\t\n\r ]+\\)*[^\t\n\r ]+"
+                                     child))
+               collect (match-string 0 child)
+             end
+           else
+             unless (let ((tag (dom-tag child)))
+                      (or (memq tag '(comment style))
+                          (progn
+                            (cond ((memq tag '(td th))
+                                   (unless (cdr flags) (setcar flags t)))
+                                  ((eq tag 'table)
+                                   (if (car flags)
+                                       (unless (cdr flags) (setcar flags nil))
+                                     (setcdr flags t))))
+                            nil)))
+               append (shr-collect-extra-strings-in-table child flags)))
+
 (defun shr-insert-table (table widths)
   (let* ((collapse (equal (cdr (assq 'border-collapse shr-stylesheet))
                           "collapse"))
--=-=-=--