* [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button.
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
2016-03-08 17:12 ` [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal' David Edmondson
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
To: notmuch
Missed in c802d12a1e43fe69f2fcf7a2f7d44018a55bfb65.
---
emacs/notmuch-show.el | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 371e62d..6e268f9 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -678,7 +678,8 @@ will return nil if the CID is unknown or cannot be retrieved."
(notmuch-crypto-insert-sigstatus-button sigstatus from))))
;; If we're not adding the encryption status, tell the user how
;; they can get it.
- (button-put button 'help-echo "Set notmuch-crypto-process-mime to process cryptographic MIME parts."))
+ (when button
+ (button-put button 'help-echo "Set notmuch-crypto-process-mime to process cryptographic MIME parts.")))
(let ((inner-parts (plist-get part :content))
(start (point)))
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal'.
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
2016-03-08 17:12 ` [PATCH v1 3/3] emacs: Improve the acquisition of text parts David Edmondson
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
To: notmuch
---
emacs/notmuch-show.el | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 6e268f9..f739987 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -848,21 +848,16 @@ will return nil if the CID is unknown or cannot be retrieved."
;; \f
(defun notmuch-show-insert-bodypart-internal (msg part content-type nth depth button)
- (let ((handlers (notmuch-show-handlers-for content-type)))
- ;; Run the content handlers until one of them returns a non-nil
- ;; value.
- (while (and handlers
- (not (condition-case err
- (funcall (car handlers) msg part content-type nth depth button)
- ;; Specifying `debug' here lets the debugger
- ;; run if `debug-on-error' is non-nil.
- ((debug error)
- (progn
- (insert "!!! Bodypart insert error: ")
- (insert (error-message-string err))
- (insert " !!!\n") nil)))))
- (setq handlers (cdr handlers))))
- t)
+ ;; Run the handlers until one of them succeeds.
+ (loop for handler in (notmuch-show-handlers-for content-type)
+ until (condition-case err
+ (funcall handler msg part content-type nth depth button)
+ ;; Specifying `debug' here lets the debugger run if
+ ;; `debug-on-error' is non-nil.
+ ((debug error)
+ (insert "!!! Bodypart handler `" (prin1-to-string handler) "' threw an error:\n"
+ "!!! " (error-message-string err) "\n")
+ nil))))
(defun notmuch-show-create-part-overlays (button beg end)
"Add an overlay to the part between BEG and END"
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 3/3] emacs: Improve the acquisition of text parts.
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
2016-03-08 17:12 ` [PATCH v1 1/3] emacs: `notmuch-show-insert-part-multipart/encrypted' should not assume the presence of a button David Edmondson
2016-03-08 17:12 ` [PATCH v1 2/3] emacs: Neaten `notmuch-show-insert-bodypart-internal' David Edmondson
@ 2016-03-08 17:12 ` David Edmondson
2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
2016-03-27 20:48 ` David Bremner
4 siblings, 0 replies; 9+ messages in thread
From: David Edmondson @ 2016-03-08 17:12 UTC (permalink / raw)
To: notmuch
`notmuch-get-bodypart-text' assumed that it is always possible to
acquire text/* parts via the sexp output format. This is not true if the
part in question has a content type of application/octet-stream but is
being interpreted as text/* based on the extension of the part filename.
Rework `notmuch-get-bodypart-text' to use the raw output format to
address this and make the implementation common with that of
`notmuch-get-bodypart-binary'.
---
emacs/notmuch-lib.el | 73 ++++++++++++++++++++++------------------------------
1 file changed, 31 insertions(+), 42 deletions(-)
diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 89c01a5..75a3706 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -537,6 +537,34 @@ the given type."
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
parts))
+(defun notmuch--get-bodypart-raw (msg part process-crypto binaryp cache)
+ (let* ((plist-elem (if binaryp :content-binary :content))
+ (data (or (plist-get part plist-elem)
+ (with-temp-buffer
+ ;; Emacs internally uses a UTF-8-like multibyte string
+ ;; representation by default (regardless of the coding
+ ;; system, which only affects how it goes from outside data
+ ;; to this internal representation). This *almost* never
+ ;; matters. Annoyingly, it does matter if we use this data
+ ;; in an image descriptor, since Emacs will use its internal
+ ;; data buffer directly and this multibyte representation
+ ;; corrupts binary image formats. Since the caller is
+ ;; asking for binary data, a unibyte string is a more
+ ;; appropriate representation anyway.
+ (when binaryp
+ (set-buffer-multibyte nil))
+ (let ((args `("show" "--format=raw"
+ ,(format "--part=%s" (plist-get part :id))
+ ,@(when process-crypto '("--decrypt"))
+ ,(notmuch-id-to-query (plist-get msg :id))))
+ (coding-system-for-read
+ (if binaryp 'no-conversion 'utf-8)))
+ (apply #'call-process notmuch-command nil '(t nil) nil args)
+ (buffer-string))))))
+ (when (and cache data)
+ (plist-put part plist-elem data))
+ data))
+
(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
"Return the unprocessed content of PART in MSG as a unibyte string.
@@ -547,57 +575,18 @@ this does no charset conversion.
If CACHE is non-nil, the content of this part will be saved in
MSG (if it isn't already)."
- (let ((data (plist-get part :binary-content)))
- (when (not data)
- (let ((args `("show" "--format=raw"
- ,(format "--part=%d" (plist-get part :id))
- ,@(when process-crypto '("--decrypt"))
- ,(notmuch-id-to-query (plist-get msg :id)))))
- (with-temp-buffer
- ;; Emacs internally uses a UTF-8-like multibyte string
- ;; representation by default (regardless of the coding
- ;; system, which only affects how it goes from outside data
- ;; to this internal representation). This *almost* never
- ;; matters. Annoyingly, it does matter if we use this data
- ;; in an image descriptor, since Emacs will use its internal
- ;; data buffer directly and this multibyte representation
- ;; corrupts binary image formats. Since the caller is
- ;; asking for binary data, a unibyte string is a more
- ;; appropriate representation anyway.
- (set-buffer-multibyte nil)
- (let ((coding-system-for-read 'no-conversion))
- (apply #'call-process notmuch-command nil '(t nil) nil args)
- (setq data (buffer-string)))))
- (when cache
- ;; Cheat. part is non-nil, and `plist-put' always modifies
- ;; the list in place if it's non-nil.
- (plist-put part :binary-content data)))
- data))
+ (notmuch--get-bodypart-raw msg part process-crypto t cache))
(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
"Return the text content of PART in MSG.
This returns the content of the given part as a multibyte Lisp
string after performing content transfer decoding and any
-necessary charset decoding. It is an error to use this for
-non-text/* parts.
+necessary charset decoding.
If CACHE is non-nil, the content of this part will be saved in
MSG (if it isn't already)."
- (let ((content (plist-get part :content)))
- (when (not content)
- ;; Use show --format=sexp to fetch decoded content
- (let* ((args `("show" "--format=sexp" "--include-html"
- ,(format "--part=%s" (plist-get part :id))
- ,@(when process-crypto '("--decrypt"))
- ,(notmuch-id-to-query (plist-get msg :id))))
- (npart (apply #'notmuch-call-notmuch-sexp args)))
- (setq content (plist-get npart :content))
- (when (not content)
- (error "Internal error: No :content from %S" args)))
- (when cache
- (plist-put part :content content)))
- content))
+ (notmuch--get-bodypart-raw msg part process-crypto nil cache))
;; Workaround: The call to `mm-display-part' below triggers a bug in
;; Emacs 24 if it attempts to use the shr renderer to display an HTML
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
` (2 preceding siblings ...)
2016-03-08 17:12 ` [PATCH v1 3/3] emacs: Improve the acquisition of text parts David Edmondson
@ 2016-03-13 15:22 ` Mark Walters
2016-03-14 8:31 ` David Edmondson
2016-03-27 20:48 ` David Bremner
4 siblings, 1 reply; 9+ messages in thread
From: Mark Walters @ 2016-03-13 15:22 UTC (permalink / raw)
To: David Edmondson, notmuch
This looks good to me +1. However, it would be sensible to get testing
in a greater variety of charsets/encodings
Best wishes
Mark
On Tue, 08 Mar 2016, David Edmondson <dme@dme.org> wrote:
> Improve the acquisition of text parts.
>
> This affects the new "reply" behaviour and the rendering of
> application/octet-stream parts that are treated as text.
>
>
> David Edmondson (3):
> emacs: `notmuch-show-insert-part-multipart/encrypted' should not
> assume the presence of a button.
> emacs: Neaten `notmuch-show-insert-bodypart-internal'.
> emacs: Improve the acquisition of text parts.
>
> emacs/notmuch-lib.el | 73 ++++++++++++++++++++++-----------------------------
> emacs/notmuch-show.el | 28 +++++++++-----------
> 2 files changed, 43 insertions(+), 58 deletions(-)
>
> --
> 2.1.4
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> https://notmuchmail.org/mailman/listinfo/notmuch
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
@ 2016-03-14 8:31 ` David Edmondson
2016-03-14 11:49 ` David Bremner
0 siblings, 1 reply; 9+ messages in thread
From: David Edmondson @ 2016-03-14 8:31 UTC (permalink / raw)
To: Mark Walters, notmuch
On Sun, Mar 13 2016, Mark Walters wrote:
> However, it would be sensible to get testing in a greater variety of
> charsets/encodings
Agreed. Does anyone have suggestions on how we might achieve this? A
corpus of mail that we could use?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
2016-03-14 8:31 ` David Edmondson
@ 2016-03-14 11:49 ` David Bremner
2016-03-26 9:18 ` Mark Walters
0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2016-03-14 11:49 UTC (permalink / raw)
To: David Edmondson, Mark Walters, notmuch
David Edmondson <dme@dme.org> writes:
> On Sun, Mar 13 2016, Mark Walters wrote:
>> However, it would be sensible to get testing in a greater variety of
>> charsets/encodings
>
> Agreed. Does anyone have suggestions on how we might achieve this? A
> corpus of mail that we could use?
Maybe the notmuch performance corpus, particularly the lkml sample.
grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
gives
euc-kr
gb2312
iso-2022-jp
iso-2022-jp-2
iso-8859-1
iso-8859-14
iso 8859-15
iso-8859-15
iso-8859-1
iso-8859-2
iso-8859-6
iso-8859-7
iso-8859-9
koi8-r
koi8-u
ks_c_5601-1987
shift_jis
unknown
unknown-8bit
us-ascii
utf8
utf-8
windows-1250
windows-1251
windows-1252
windows-1255
to unpack the corpus
cd performance-test
make download-corpus
./T00-new.sh --large
probably interrupt the test once notmuch-new starts running.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
2016-03-14 11:49 ` David Bremner
@ 2016-03-26 9:18 ` Mark Walters
0 siblings, 0 replies; 9+ messages in thread
From: Mark Walters @ 2016-03-26 9:18 UTC (permalink / raw)
To: David Bremner, David Edmondson, notmuch
Hi
Sorry this email ended up rather long:
Summary: I have run a test (see below) on all of the lkml part of the
performance-corpus, and all the changes look expected. So this series
looks good to me.
First note how we do the bodypart-insertion: for a mime type of
text/plain we first try the text/plain handler, then a text/* handler,
and finally a */* handler until one succeeds. Before this series, when
the part is application/octet-stream but is detected as text/plain,
text/plain handler fails with a "bodypart insertion error" because
notmuch-get-bodypart-text fails can't get the text (because it's not
officially text). Thus we fall back on the */* handler and that inserts
the part.
With this series notmuch-get-bodypart-text succeeds and we stop.
Thus in most cases the only change is that we don't get a "bodypart
insertion error", but all the text looks the same. In a couple of cases
the text/plain handler wraps lines/replaces ^M by unix newlines, whereas
as the */* handler does not. This is an improvement.
There is one more "difference" but I think this is actually something
random. Sometimes when the part is application/tar or application/zip I
get "Bodypart insert error: Symbol's function definition is void:
gnus-recursive-directory-files". If I load gnus this goes away. In my
first batch of tests this only occurred when using this series, but
since then I have reproduced it on mainline. I think something else I
did when setting up the test on mainline caused gnus to be loaded, but i
have not worked out what is going on there.
Finally, the test was as follows. I downloaded the performance corpus,
configured a separate notmuch config file to use the
performance-test/corpus/mail/lkml as the mailstore, went into
notmuch-emacs and to the inbox (which contained all messages) and ran
the following lisp function
(defun my-save-all-show ()
(interactive)
(goto-char (point-min))
(let ((count 0))
(while (notmuch-search-find-thread-id)
(let ((thread-id (notmuch-search-find-thread-id)))
(setq count (1+ count))
(message "Thread %s: %s" count thread-id)
(notmuch-show thread-id)
(let ((text (buffer-string))
(coding-system-for-write 'no-conversion))
(with-temp-file (concat "OUTPUT-" thread-id) (insert text)))
(kill-buffer))
(notmuch-search-next-thread))))
I moved the OUTPUT files elsewhere and repeated with this series applied
and then ran diff on the output. This gave 7 threads with a change (each
an individual message) from the 16000 threads/ 100000 messages which I
looked at individually as above.
Best wishes
Mark
On Mon, 14 Mar 2016, David Bremner <david@tethera.net> wrote:
> David Edmondson <dme@dme.org> writes:
>
>> On Sun, Mar 13 2016, Mark Walters wrote:
>>> However, it would be sensible to get testing in a greater variety of
>>> charsets/encodings
>>
>> Agreed. Does anyone have suggestions on how we might achieve this? A
>> corpus of mail that we could use?
>
> Maybe the notmuch performance corpus, particularly the lkml sample.
>
> grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
>
> gives
>
> euc-kr
> gb2312
> iso-2022-jp
> iso-2022-jp-2
> iso-8859-1
> iso-8859-14
> iso 8859-15
> iso-8859-15
> iso-8859-1
> iso-8859-2
> iso-8859-6
> iso-8859-7
> iso-8859-9
> koi8-r
> koi8-u
> ks_c_5601-1987
> shift_jis
> unknown
> unknown-8bit
> us-ascii
> utf8
> utf-8
> windows-1250
> windows-1251
> windows-1252
> windows-1255
>
>
> to unpack the corpus
>
> cd performance-test
> make download-corpus
> ./T00-new.sh --large
>
> probably interrupt the test once notmuch-new starts running.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/3] Improve the acquisition of text parts.
2016-03-08 17:12 [PATCH v1 0/3] Improve the acquisition of text parts David Edmondson
` (3 preceding siblings ...)
2016-03-13 15:22 ` [PATCH v1 0/3] " Mark Walters
@ 2016-03-27 20:48 ` David Bremner
4 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2016-03-27 20:48 UTC (permalink / raw)
To: David Edmondson, notmuch
David Edmondson <dme@dme.org> writes:
> Improve the acquisition of text parts.
>
> This affects the new "reply" behaviour and the rendering of
> application/octet-stream parts that are treated as text.
>
pushed.
d
^ permalink raw reply [flat|nested] 9+ messages in thread