all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Katsumi Yamaoka <yamaoka@jpl.org>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org
Subject: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 16:19:12 +0900	[thread overview]
Message-ID: <b4mtwbniufj.fsf@jpl.org> (raw)
In-Reply-To: <87shrd6xsp.fsf_-_@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 703 bytes --]

On Wed, 02 Nov 2016 18:49:58 +0900, Katsumi Yamaoka wrote:
> On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote:
>> And thinking about it a bit more, I think that would perhaps be the most
>> likely solution for shr, too.  That is, `shr-tag-table' could, at the
>> end there, go through and find all non-blank non-td/th elements and
>> insert them at the end.

> Thanks.  I'm trying it but not succeeded yet though,...

I did it.  A patch is below.  Bad things in this version I know
at least are:

・It does not support styles -- font, color, etc.
・No way to exclude text existing outside of <html>...</html>.

Thers is no such problems in the first version I posted. ;-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 2060 bytes --]

--- shr.el~	2016-11-01 02:35:57.788777000 +0000
+++ shr.el	2016-11-04 07:17:19.789855000 +0000
@@ -1897,11 +1897,48 @@
     (when (zerop shr-table-depth)
       (save-excursion
 	(shr-expand-alignments start (point)))
+      ;; Insert also non-td/th strings excluding comments and styles.
+      (save-restriction
+	(narrow-to-region (point) (point))
+	(insert (mapconcat #'identity
+			   (shr-collect-extra-strings-in-table dom)
+			   "\n"))
+	(shr-fill-lines (point-min) (point-max)))
       (dolist (elem (dom-by-tag dom 'object))
 	(shr-tag-object elem))
       (dolist (elem (dom-by-tag dom 'img))
 	(shr-tag-img elem)))))
 
+(defun shr-collect-extra-strings-in-table (dom &optional flags)
+  "Return extra strings in DOM of which the root is a table clause.
+FLAGS is a cons of two flags that control whether to collect strings."
+  ;; If and only if the cdr is not set, the car will be set to t when
+  ;; a <td> or a <th> clause is found in the children of DOM, and reset
+  ;; to nil when a <table> clause is found in the children of DOM.
+  ;; The cdr will be set to t when a <table> clause is found if the car
+  ;; is not set then, and will never be reset.
+  ;; This function collects strings if the car of FLAGS is not set.
+  (unless flags (setq flags (cons nil nil)))
+  (cl-loop for child in (dom-children dom)
+	   if (stringp child)
+	     when (and (not (car flags))
+		       (string-match "\\(?:[^\t\n\r ]+[\t\n\r ]+\\)*[^\t\n\r ]+"
+				     child))
+	       collect (match-string 0 child)
+	     end
+	   else
+	     unless (let ((tag (dom-tag child)))
+		      (or (memq tag '(comment style))
+			  (progn
+			    (cond ((memq tag '(td th))
+				   (unless (cdr flags) (setcar flags t)))
+				  ((eq tag 'table)
+				   (if (car flags)
+				       (unless (cdr flags) (setcar flags nil))
+				     (setcdr flags t))))
+			    nil)))
+	       append (shr-collect-extra-strings-in-table child flags)))
+
 (defun shr-insert-table (table widths)
   (let* ((collapse (equal (cdr (assq 'border-collapse shr-stylesheet))
 			  "collapse"))

  reply	other threads:[~2016-11-04  7:19 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87shrgvt8y.fsf@jidanni.org>
2016-10-31  2:45 ` bug#24831: shr mangling messages 積丹尼 Dan Jacobson
2016-11-01  1:39   ` Katsumi Yamaoka
2016-11-01  9:59     ` Katsumi Yamaoka
2016-11-01 10:06       ` Lars Ingebrigtsen
2016-11-01 10:12         ` Lars Ingebrigtsen
2016-11-01 18:43         ` Lars Ingebrigtsen
2016-11-02  9:49           ` Katsumi Yamaoka
2016-11-04  7:19             ` Katsumi Yamaoka [this message]
2016-11-04  8:51               ` Lars Ingebrigtsen
2016-11-04 10:28                 ` Katsumi Yamaoka
2016-11-04 11:17                   ` Lars Ingebrigtsen
2016-11-06 23:32                     ` Katsumi Yamaoka
2016-11-01 11:22   ` 積丹尼 Dan Jacobson
2016-11-01 17:16     ` Richard Stallman
2016-11-04 18:18       ` Ted Zlatanov
2016-11-01 11:24   ` 積丹尼 Dan Jacobson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4mtwbniufj.fsf@jpl.org \
    --to=yamaoka@jpl.org \
    --cc=24831@debbugs.gnu.org \
    --cc=jidanni@jidanni.org \
    --cc=larsi@gnus.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.