unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH v1 0/3] Improve the acquisition of text parts.
@ 2016-03-08 17:12 David Edmondson
  2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
  To: notmuch


Improve the acquisition of text parts.

This affects the new "reply" behaviour and the rendering of
application/octet-stream parts that are treated as text.


David Edmondson (3):
  emacs: `notmuch-show-insert-part-multipart/encrypted' should not
    assume     the presence of a button.
  emacs: Neaten `notmuch-show-insert-bodypart-internal'.
  emacs: Improve the acquisition of text parts.

 emacs/notmuch-lib.el  | 73 ++++++++++++++++++++++-----------------------------
 emacs/notmuch-show.el | 28 +++++++++-----------
 2 files changed, 43 insertions(+), 58 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button.
  2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
  2016-03-08 17:12 ` [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal' David Edmondson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
  To: notmuch

Missed in c802d12a1e43fe69f2fcf7a2f7d44018a55bfb65.
---
 emacs/notmuch-show.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 371e62d..6e268f9 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -678,7 +678,8 @@ will return nil if the CID is unknown or cannot be retrieved."
 	      (notmuch-crypto-insert-sigstatus-button sigstatus from))))
     ;; If we're not adding the encryption status, tell the user how
     ;; they can get it.
-    (button-put button 'help-echo "Set notmuch-crypto-process-mime to process cryptographic MIME parts."))
+    (when button
+      (button-put button 'help-echo "Set notmuch-crypto-process-mime to process cryptographic MIME parts.")))
 
   (let ((inner-parts (plist-get part :content))
 	(start (point)))
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal'.
  2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
  2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
  2016-03-08 17:12 ` [PATCH v1 3/3] emacs: Improve the acquisition of text parts David Edmondson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
  To: notmuch

---
 emacs/notmuch-show.el | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 6e268f9..f739987 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -848,21 +848,16 @@ will return nil if the CID is unknown or cannot be retrieved."
 ;; \f
 
 (defun notmuch-show-insert-bodypart-internal (msg part content-type nth depth button)
-  (let ((handlers (notmuch-show-handlers-for content-type)))
-    ;; Run the content handlers until one of them returns a non-nil
-    ;; value.
-    (while (and handlers
-		(not (condition-case err
-			 (funcall (car handlers) msg part content-type nth depth button)
-		       ;; Specifying `debug' here lets the debugger
-		       ;; run if `debug-on-error' is non-nil.
-		       ((debug error)
-			(progn
-				(insert "!!! Bodypart insert error: ")
-				(insert (error-message-string err))
-				(insert " !!!\n") nil)))))
-      (setq handlers (cdr handlers))))
-  t)
+  ;; Run the handlers until one of them succeeds.
+  (loop for handler in (notmuch-show-handlers-for content-type)
+	until (condition-case err
+		  (funcall handler msg part content-type nth depth button)
+		;; Specifying `debug' here lets the debugger run if
+		;; `debug-on-error' is non-nil.
+		((debug error)
+		 (insert "!!! Bodypart handler `" (prin1-to-string handler) "' threw an error:\n"
+			 "!!! " (error-message-string err) "\n")
+		 nil))))
 
 (defun notmuch-show-create-part-overlays (button beg end)
   "Add an overlay to the part between BEG and END"
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v1 3/3] emacs: Improve the acquisition of text parts.
  2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
  2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
  2016-03-08 17:12 ` [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal' David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
  2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
  2016-03-27 20:48 ` David Bremner
  4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
  To: notmuch

`notmuch-get-bodypart-text' assumed that it is always possible to
acquire text/* parts via the sexp output format. This is not true if the
part in question has a content type of application/octet-stream but is
being interpreted as text/* based on the extension of the part filename.

Rework `notmuch-get-bodypart-text' to use the raw output format to
address this and make the implementation common with that of
`notmuch-get-bodypart-binary'.
---
 emacs/notmuch-lib.el | 73 ++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 42 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 89c01a5..75a3706 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -537,6 +537,34 @@ the given type."
    (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
    parts))
 
+(defun notmuch--get-bodypart-raw (msg part process-crypto binaryp cache)
+  (let* ((plist-elem (if binaryp :content-binary :content))
+	 (data (or (plist-get part plist-elem)
+		   (with-temp-buffer
+		     ;; Emacs internally uses a UTF-8-like multibyte string
+		     ;; representation by default (regardless of the coding
+		     ;; system, which only affects how it goes from outside data
+		     ;; to this internal representation).  This *almost* never
+		     ;; matters.  Annoyingly, it does matter if we use this data
+		     ;; in an image descriptor, since Emacs will use its internal
+		     ;; data buffer directly and this multibyte representation
+		     ;; corrupts binary image formats.  Since the caller is
+		     ;; asking for binary data, a unibyte string is a more
+		     ;; appropriate representation anyway.
+		     (when binaryp
+		       (set-buffer-multibyte nil))
+		     (let ((args `("show" "--format=raw"
+				   ,(format "--part=%s" (plist-get part :id))
+				   ,@(when process-crypto '("--decrypt"))
+				   ,(notmuch-id-to-query (plist-get msg :id))))
+			   (coding-system-for-read
+			    (if binaryp 'no-conversion 'utf-8)))
+		       (apply #'call-process notmuch-command nil '(t nil) nil args)
+		       (buffer-string))))))
+    (when (and cache data)
+      (plist-put part plist-elem data))
+    data))
+
 (defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
   "Return the unprocessed content of PART in MSG as a unibyte string.
 
@@ -547,57 +575,18 @@ this does no charset conversion.
 
 If CACHE is non-nil, the content of this part will be saved in
 MSG (if it isn't already)."
-  (let ((data (plist-get part :binary-content)))
-    (when (not data)
-      (let ((args `("show" "--format=raw"
-		    ,(format "--part=%d" (plist-get part :id))
-		    ,@(when process-crypto '("--decrypt"))
-		    ,(notmuch-id-to-query (plist-get msg :id)))))
-	(with-temp-buffer
-	  ;; Emacs internally uses a UTF-8-like multibyte string
-	  ;; representation by default (regardless of the coding
-	  ;; system, which only affects how it goes from outside data
-	  ;; to this internal representation).  This *almost* never
-	  ;; matters.  Annoyingly, it does matter if we use this data
-	  ;; in an image descriptor, since Emacs will use its internal
-	  ;; data buffer directly and this multibyte representation
-	  ;; corrupts binary image formats.  Since the caller is
-	  ;; asking for binary data, a unibyte string is a more
-	  ;; appropriate representation anyway.
-	  (set-buffer-multibyte nil)
-	  (let ((coding-system-for-read 'no-conversion))
-	    (apply #'call-process notmuch-command nil '(t nil) nil args)
-	    (setq data (buffer-string)))))
-      (when cache
-	;; Cheat.  part is non-nil, and `plist-put' always modifies
-	;; the list in place if it's non-nil.
-	(plist-put part :binary-content data)))
-    data))
+  (notmuch--get-bodypart-raw msg part process-crypto t cache))
 
 (defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
   "Return the text content of PART in MSG.
 
 This returns the content of the given part as a multibyte Lisp
 string after performing content transfer decoding and any
-necessary charset decoding.  It is an error to use this for
-non-text/* parts.
+necessary charset decoding.
 
 If CACHE is non-nil, the content of this part will be saved in
 MSG (if it isn't already)."
-  (let ((content (plist-get part :content)))
-    (when (not content)
-      ;; Use show --format=sexp to fetch decoded content
-      (let* ((args `("show" "--format=sexp" "--include-html"
-		     ,(format "--part=%s" (plist-get part :id))
-		     ,@(when process-crypto '("--decrypt"))
-		     ,(notmuch-id-to-query (plist-get msg :id))))
-	     (npart (apply #'notmuch-call-notmuch-sexp args)))
-	(setq content (plist-get npart :content))
-	(when (not content)
-	  (error "Internal error: No :content from %S" args)))
-      (when cache
-	(plist-put part :content content)))
-    content))
+  (notmuch--get-bodypart-raw msg part process-crypto nil cache))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
  2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
                   ` (2 preceding siblings ...)
  2016-03-08 17:12 ` [PATCH v1 3/3] emacs: Improve the acquisition of text parts David Edmondson
@ 2016-03-13 15:22 ` Mark Walters
  2016-03-14  8:31   ` David Edmondson
  2016-03-27 20:48 ` David Bremner
  4 siblings, 1 reply; 9+ messages in thread
From: Mark Walters @ 2016-03-13 15:22 UTC (permalink / raw)
  To: David Edmondson, notmuch


This looks good to me +1. However, it would be sensible to get testing 
in a greater variety of charsets/encodings

Best wishes

Mark


On Tue, 08 Mar 2016, David Edmondson <dme@dme.org> wrote:
> Improve the acquisition of text parts.
>
> This affects the new "reply" behaviour and the rendering of
> application/octet-stream parts that are treated as text.
>
>
> David Edmondson (3):
>   emacs: `notmuch-show-insert-part-multipart/encrypted' should not
>     assume     the presence of a button.
>   emacs: Neaten `notmuch-show-insert-bodypart-internal'.
>   emacs: Improve the acquisition of text parts.
>
>  emacs/notmuch-lib.el  | 73 ++++++++++++++++++++++-----------------------------
>  emacs/notmuch-show.el | 28 +++++++++-----------
>  2 files changed, 43 insertions(+), 58 deletions(-)
>
> -- 
> 2.1.4
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> https://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
  2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
@ 2016-03-14  8:31   ` David Edmondson
  2016-03-14 11:49     ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: David Edmondson @ 2016-03-14  8:31 UTC (permalink / raw)
  To: Mark Walters, notmuch

On Sun, Mar 13 2016, Mark Walters wrote:
> However, it would be sensible to get testing in a greater variety of
> charsets/encodings

Agreed. Does anyone have suggestions on how we might achieve this? A
corpus of mail that we could use?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
  2016-03-14  8:31   ` David Edmondson
@ 2016-03-14 11:49     ` David Bremner
  2016-03-26  9:18       ` Mark Walters
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2016-03-14 11:49 UTC (permalink / raw)
  To: David Edmondson, Mark Walters, notmuch

David Edmondson <dme@dme.org> writes:

> On Sun, Mar 13 2016, Mark Walters wrote:
>> However, it would be sensible to get testing in a greater variety of
>> charsets/encodings
>
> Agreed. Does anyone have suggestions on how we might achieve this? A
> corpus of mail that we could use?

Maybe the notmuch performance corpus, particularly the lkml sample.

grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u

gives

euc-kr
gb2312
iso-2022-jp
iso-2022-jp-2
iso-8859-1
iso-8859-14
iso 8859-15
iso-8859-15
iso-8859-1
iso-8859-2
iso-8859-6
iso-8859-7
iso-8859-9
koi8-r
koi8-u
ks_c_5601-1987
shift_jis
unknown
unknown-8bit
us-ascii
utf8
utf-8
windows-1250
windows-1251
windows-1252
windows-1255


to unpack the corpus

cd performance-test
make download-corpus
./T00-new.sh --large

probably interrupt the test once notmuch-new starts running.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
  2016-03-14 11:49     ` David Bremner
@ 2016-03-26  9:18       ` Mark Walters
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Walters @ 2016-03-26  9:18 UTC (permalink / raw)
  To: David Bremner, David Edmondson, notmuch


Hi

Sorry this email ended up rather long:

Summary: I have run a test (see below) on all of the lkml part of the
performance-corpus, and all the changes look expected. So this series
looks good to me.

First note how we do the bodypart-insertion: for a mime type of
text/plain we first try the text/plain handler, then a text/* handler,
and finally a */* handler until one succeeds. Before this series, when
the part is application/octet-stream but is detected as text/plain, 
text/plain handler fails with a "bodypart insertion error" because
notmuch-get-bodypart-text fails can't get the text (because it's not
officially text). Thus we fall back on the */* handler and that inserts
the part. 

With this series notmuch-get-bodypart-text succeeds and we stop.

Thus in most cases the only change is that we don't get a "bodypart
insertion error", but all the text looks the same. In a couple of cases
the text/plain handler wraps lines/replaces ^M by unix newlines, whereas
as the */* handler does not. This is an improvement.

There is one more "difference" but I think this is actually something
random. Sometimes when the part is application/tar or application/zip I
get "Bodypart insert error: Symbol's function definition is void:
gnus-recursive-directory-files". If I load gnus this goes away. In my
first batch of tests this only occurred when using this series, but
since then I have reproduced it on mainline. I think something else I
did when setting up the test on mainline caused gnus to be loaded, but i
have not worked out what is going on there.

Finally, the test was as follows. I downloaded the performance corpus,
configured a separate notmuch config file to use the
performance-test/corpus/mail/lkml as the mailstore, went into
notmuch-emacs and to the inbox (which contained all messages) and ran
the following lisp function


(defun my-save-all-show ()
  (interactive)
  (goto-char (point-min))
  (let ((count 0))
    (while (notmuch-search-find-thread-id)
      (let ((thread-id (notmuch-search-find-thread-id)))
        (setq count (1+ count))
        (message "Thread %s: %s" count thread-id)
        (notmuch-show thread-id)
        (let ((text (buffer-string))
              (coding-system-for-write 'no-conversion))
          (with-temp-file (concat "OUTPUT-" thread-id) (insert text)))
        (kill-buffer))
      (notmuch-search-next-thread))))

I moved the OUTPUT files elsewhere and repeated with this series applied
and then ran diff on the output. This gave 7 threads with a change (each
an individual message) from the 16000 threads/ 100000 messages which I
looked at individually as above.

Best wishes

Mark






On Mon, 14 Mar 2016, David Bremner <david@tethera.net> wrote:
> David Edmondson <dme@dme.org> writes:
>
>> On Sun, Mar 13 2016, Mark Walters wrote:
>>> However, it would be sensible to get testing in a greater variety of
>>> charsets/encodings
>>
>> Agreed. Does anyone have suggestions on how we might achieve this? A
>> corpus of mail that we could use?
>
> Maybe the notmuch performance corpus, particularly the lkml sample.
>
> grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
>
> gives
>
> euc-kr
> gb2312
> iso-2022-jp
> iso-2022-jp-2
> iso-8859-1
> iso-8859-14
> iso 8859-15
> iso-8859-15
> iso-8859-1
> iso-8859-2
> iso-8859-6
> iso-8859-7
> iso-8859-9
> koi8-r
> koi8-u
> ks_c_5601-1987
> shift_jis
> unknown
> unknown-8bit
> us-ascii
> utf8
> utf-8
> windows-1250
> windows-1251
> windows-1252
> windows-1255
>
>
> to unpack the corpus
>
> cd performance-test
> make download-corpus
> ./T00-new.sh --large
>
> probably interrupt the test once notmuch-new starts running.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
  2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
                   ` (3 preceding siblings ...)
  2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
@ 2016-03-27 20:48 ` David Bremner
  4 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2016-03-27 20:48 UTC (permalink / raw)
  To: David Edmondson, notmuch

David Edmondson <dme@dme.org> writes:

> Improve the acquisition of text parts.
>
> This affects the new "reply" behaviour and the rendering of
> application/octet-stream parts that are treated as text.
>

pushed.

d

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-03-27 20:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
2016-03-08 17:12 ` [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal' David Edmondson
2016-03-08 17:12 ` [PATCH v1 3/3] emacs: Improve the acquisition of text parts David Edmondson
2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
2016-03-14  8:31   ` David Edmondson
2016-03-14 11:49     ` David Bremner
2016-03-26  9:18       ` Mark Walters
2016-03-27 20:48 ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).