unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH 00/11] Improve charset and cid: handling
@ 2014-04-21 18:37 Austin Clements
  2014-04-21 18:37 ` [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content' Austin Clements
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

I set out to quickly add support for cid: links in the shr renderer
and wound up making our charset handling more robust and rewriting our
content-ID handling.  The test introduced in patch 2 passes in all but
one really obscure case, but only because of many unwritten and
potentially fragile assumptions that Emacs and the CLI make about each
other.

The first three patches could reasonably go in to 0.18.  The rest of
this series is certainly post-0.18, but I didn't want to lose track of
it.

This series comes in three stages.  Each depends on the earlier ones,
but each prefix makes sense on its own and could be pushed without the
later stages.

Patch 1 is a simple clean up patch.

Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
by splitting the broken `notmuch-get-bodypart-content' API into
`notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
caller can explicitly convey their requirements.

The remaining patches improve our content-ID handling and add support
for cid: links for shr.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content'.
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-07-10  9:46   ` David Bremner
  2014-04-21 18:37 ` [PATCH 02/11] test: New tests for Emacs charset handling Austin Clements
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

This can be derived from the PART argument (which is arguably
canonical), so there's no sense in giving the caller an extra foot
gun.
---
 emacs/notmuch-lib.el  | 9 +++++----
 emacs/notmuch-show.el | 6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 2941da3..35b9065 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -518,9 +518,10 @@ (defun notmuch-get-bodypart-internal (query part-number process-crypto)
 	  (apply 'call-process (append (list notmuch-command nil (list t nil) nil) args))
 	  (buffer-string))))))
 
-(defun notmuch-get-bodypart-content (msg part nth process-crypto)
+(defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-      (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id)) nth process-crypto)))
+      (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
+				     (plist-get part :id) process-crypto)))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -536,7 +537,7 @@ (defun notmuch-get-bodypart-content (msg part nth process-crypto)
       (ad-disable-advice 'mm-shr 'before 'load-gnus-arts)
       (ad-activate 'mm-shr)))
 
-(defun notmuch-mm-display-part-inline (msg part nth content-type process-crypto)
+(defun notmuch-mm-display-part-inline (msg part content-type process-crypto)
   "Use the mm-decode/mm-view functions to display a part in the
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
@@ -552,7 +553,7 @@ (defun notmuch-mm-display-part-inline (msg part nth content-type process-crypto)
 	;; test whether we are able to inline it (which includes both
 	;; capability and suitability tests).
 	(when (mm-inlined-p handle)
-	  (insert (notmuch-get-bodypart-content msg part nth process-crypto))
+	  (insert (notmuch-get-bodypart-content msg part process-crypto))
 	  (when (mm-inlinable-p handle)
 	    (set-buffer display-buffer)
 	    (mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index df10d4b..949ac09 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -695,7 +695,7 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt
   (let ((start (if button
 		   (button-start button)
 		 (point))))
-    (insert (notmuch-get-bodypart-content msg part nth notmuch-show-process-crypto))
+    (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto))
     (save-excursion
       (save-restriction
 	(narrow-to-region start (point-max))
@@ -704,7 +704,7 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth button)
   (insert (with-temp-buffer
-	    (insert (notmuch-get-bodypart-content msg part nth notmuch-show-process-crypto))
+	    (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto))
 	    ;; notmuch-get-bodypart-content provides "raw", non-converted
 	    ;; data. Replace CRLF with LF before icalendar can use it.
 	    (goto-char (point-min))
@@ -756,7 +756,7 @@ (defun notmuch-show-insert-part-text/html (msg part content-type nth depth butto
 
 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-  (notmuch-mm-display-part-inline msg part nth content-type notmuch-show-process-crypto)
+  (notmuch-mm-display-part-inline msg part content-type notmuch-show-process-crypto)
   t)
 
 ;; Functions for determining how to handle MIME parts.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/11] test: New tests for Emacs charset handling
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
  2014-04-21 18:37 ` [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content' Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-24 14:38   ` Mark Walters
  2014-04-21 18:37 ` [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message' Austin Clements
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

The test of viewing 8bit messages is known-broken.  The rest pass, but
for very fragile reasons.  The next several commits will fix the
known-broken test and make our charset handling robust.
---
 test/T455-emacs-charsets.sh | 141 ++++++++++++++++++++++++++++++++++++++++++++
 test/test-lib.el            |   4 +-
 2 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100755 test/T455-emacs-charsets.sh

diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
new file mode 100755
index 0000000..a42a1d2
--- /dev/null
+++ b/test/T455-emacs-charsets.sh
@@ -0,0 +1,141 @@
+#!/usr/bin/env bash
+
+test_description="emacs notmuch-show charset handling"
+. ./test-lib.sh
+
+
+UTF8_YEN=$'\xef\xbf\xa5'
+BIG5_YEN=$'\xa2\x44'
+
+# Add four messages with unusual encoding requirements:
+#
+# 1) text/plain in quoted-printable big5
+generate_message \
+    [id]=test-plain@example.com \
+    '[content-type]="text/plain; charset=big5"' \
+    '[content-transfer-encoding]=quoted-printable' \
+    '[body]="Yen: =A2=44"'
+
+# 2) text/plain in 8bit big5
+generate_message \
+    [id]=test-plain-8bit@example.com \
+    '[content-type]="text/plain; charset=big5"' \
+    '[content-transfer-encoding]=8bit' \
+    '[body]="Yen: '$BIG5_YEN'"'
+
+# 3) text/html in quoted-printable big5
+generate_message \
+    [id]=test-html@example.com \
+    '[content-type]="text/html; charset=big5"' \
+    '[content-transfer-encoding]=quoted-printable' \
+    '[body]="<html><body>Yen: =A2=44</body></html>"'
+
+# 4) application/octet-stream in quoted-printable of big5 text
+generate_message \
+    [id]=test-binary@example.com \
+    '[content-type]="application/octet-stream"' \
+    '[content-transfer-encoding]=quoted-printable' \
+    '[body]="Yen: =A2=44"'
+
+notmuch new > /dev/null
+
+# Test rendering
+
+test_begin_subtest "Text parts are decoded when rendering"
+test_emacs '(notmuch-show "id:test-plain@example.com")
+	    (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat <<EOF >EXPECTED
+Yen: $UTF8_YEN
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "8bit text parts are decoded when rendering"
+test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
+	    (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat <<EOF >EXPECTED
+Yen: $UTF8_YEN
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "HTML parts are decoded when rendering"
+test_emacs '(notmuch-show "id:test-html@example.com")
+	    (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat <<EOF >EXPECTED
+[ text/html ]
+Yen: $UTF8_YEN
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+# Test saving
+
+test_begin_subtest "Text parts are not decoded when saving"
+rm -f part
+test_emacs '(notmuch-show "id:test-plain@example.com")
+	    (search-forward "Yen")
+	    (let ((standard-input "\"part\""))
+	       (notmuch-show-save-part))'
+cat <<EOF >EXPECTED
+Yen: $BIG5_YEN
+EOF
+test_expect_equal_file part EXPECTED
+
+test_begin_subtest "8bit text parts are not decoded when saving"
+rm -f part
+test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
+	    (search-forward "Yen")
+	    (let ((standard-input "\"part\""))
+	       (notmuch-show-save-part))'
+cat <<EOF >EXPECTED
+Yen: $BIG5_YEN
+EOF
+test_expect_equal_file part EXPECTED
+
+test_begin_subtest "HTML parts are not decoded when saving"
+rm -f part
+test_emacs '(notmuch-show "id:test-html@example.com")
+	    (search-forward "Yen")
+	    (let ((standard-input "\"part\""))
+	       (notmuch-show-save-part))'
+cat <<EOF >EXPECTED
+<html><body>Yen: $BIG5_YEN</body></html>
+EOF
+test_expect_equal_file part EXPECTED
+
+test_begin_subtest "Binary parts are not decoded when saving"
+rm -f part
+test_emacs '(notmuch-show "id:test-binary@example.com")
+	    (search-forward "application/")
+	    (let ((standard-input "\"part\""))
+	       (notmuch-show-save-part))'
+cat <<EOF >EXPECTED
+Yen: $BIG5_YEN
+EOF
+test_expect_equal_file part EXPECTED
+
+# Test message viewing
+
+test_begin_subtest "Text message are not decoded when viewing"
+test_emacs '(notmuch-show "id:test-plain@example.com")
+	    (notmuch-show-view-raw-message)
+	    (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat <<EOF >EXPECTED
+Yen: =A2=44
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "8bit text message are not decoded when viewing"
+test_subtest_known_broken
+test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
+	    (notmuch-show-view-raw-message)
+	    (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat <<EOF >EXPECTED
+Yen: $BIG5_YEN
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
diff --git a/test/test-lib.el b/test/test-lib.el
index 437f83f..dd3446e 100644
--- a/test/test-lib.el
+++ b/test/test-lib.el
@@ -57,7 +57,9 @@ (defun test-output (&optional filename)
 (defun test-visible-output (&optional filename)
   "Save visible text in current buffer to file FILENAME.  Default
 FILENAME is OUTPUT."
-  (let ((text (visible-buffer-string)))
+  (let ((text (visible-buffer-string))
+	;; Tests expect output in UTF-8 encoding
+	(coding-system-for-write 'utf-8))
     (with-temp-file (or filename "OUTPUT") (insert text))))
 
 (defun visible-buffer-string ()
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message'
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
  2014-04-21 18:37 ` [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content' Austin Clements
  2014-04-21 18:37 ` [PATCH 02/11] test: New tests for Emacs charset handling Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-07-10  9:48   ` David Bremner
  2014-09-23 17:59   ` David Bremner
  2014-04-21 18:37 ` [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store Austin Clements
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

This fixes the known-broken test of viewing 8bit messages added by the
previous commit.
---
 emacs/notmuch-show.el       | 5 +++--
 test/T455-emacs-charsets.sh | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 949ac09..2b225df 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1748,11 +1748,12 @@ (defun notmuch-show-previous-open-message ()
   (notmuch-show-message-adjust))
 
 (defun notmuch-show-view-raw-message ()
-  "View the file holding the current message."
+  "View the original source of the current message."
   (interactive)
   (let* ((id (notmuch-show-get-message-id))
 	 (buf (get-buffer-create (concat "*notmuch-raw-" id "*"))))
-    (call-process notmuch-command nil buf nil "show" "--format=raw" id)
+    (let ((coding-system-for-read 'no-conversion))
+      (call-process notmuch-command nil buf nil "show" "--format=raw" id))
     (switch-to-buffer buf)
     (goto-char (point-min))
     (set-buffer-modified-p nil)
diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
index a42a1d2..3078f9c 100755
--- a/test/T455-emacs-charsets.sh
+++ b/test/T455-emacs-charsets.sh
@@ -128,7 +128,6 @@ EOF
 test_expect_equal_file OUTPUT EXPECTED
 
 test_begin_subtest "8bit text message are not decoded when viewing"
-test_subtest_known_broken
 test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
 	    (notmuch-show-view-raw-message)
 	    (test-visible-output "OUTPUT.raw")'
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (2 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message' Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-07-10  9:52   ` David Bremner
  2014-04-21 18:37 ` [PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary Austin Clements
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

This will simplify later changes.
---
 emacs/notmuch-show.el | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 2b225df..455cfee 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,35 +540,26 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id
-					    message-id
-					    part-number
-					    content-type
-					    content)
-  (push (list content-id
-	      message-id
-	      part-number
-	      content-type
-	      content)
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
+  (push (list content-id msg part content)
 	notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
     (when content-id
       (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-					   (plist-get msg :id)
-					   (plist-get part :id)
-					   (plist-get part :content-type)
-					   nil))))
+					   msg part nil))))
 
 (defun notmuch-show-w3m-cid-retrieve (url &rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
 			 (assoc url notmuch-show-w3m-cid-store))))
     (if matching-part
-	(let ((message-id (nth 1 matching-part))
-	      (part-number (nth 2 matching-part))
-	      (content-type (nth 3 matching-part))
-	      (content (nth 4 matching-part)))
+	(let* ((msg (nth 1 matching-part))
+	       (part (nth 2 matching-part))
+	       (content (nth 3 matching-part))
+	       (message-id (plist-get msg :id))
+	       (part-number (plist-get part :id))
+	       (content-type (plist-get part :content-type)))
 	  ;; If we don't already have the content, get it and cache
 	  ;; it, as some messages reference the same cid: part many
 	  ;; times (hundreds!), which results in many calls to
@@ -577,11 +568,7 @@ (defun notmuch-show-w3m-cid-retrieve (url &rest args)
 	    (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query message-id)
 							      part-number notmuch-show-process-crypto))
 	    (with-current-buffer w3m-current-buffer
-	      (notmuch-show-w3m-cid-store-internal url
-						   message-id
-						   part-number
-						   content-type
-						   content)))
+	      (notmuch-show-w3m-cid-store-internal url msg part content)))
 	  (insert content)
 	  content-type)
       nil)))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (3 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-21 18:37 ` [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API Austin Clements
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

The new function, `notmuch-get-bodypart-binary', replaces
`notmuch-get-bodypart-internal'.  Whereas the old function was really
meant for internal use in `notmuch-get-bodypart-content', it was used
in a few other places.  Since the difference between
`notmuch-get-bodypart-content' and `notmuch-get-bodypart-internal' was
unclear, these other uses were always confusing and potentially
inconsistent.  The new call clearly requests the part as undecoded
binary.

This is step 1 of 2 in separating `notmuch-get-bodypart-content' into
two APIs for retrieving either undecoded binary or decoded text.
---
 emacs/notmuch-lib.el  | 28 ++++++++++++++--------------
 emacs/notmuch-show.el | 22 +++++++++-------------
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 35b9065..4711098 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,25 +503,25 @@ (defun notmuch-parts-filter-by-type (parts type)
    (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
    parts))
 
-;; Helper for parts which are generally not included in the default
-;; SEXP output.
-(defun notmuch-get-bodypart-internal (query part-number process-crypto)
-  (let ((args '("show" "--format=raw"))
-	(part-arg (format "--part=%s" part-number)))
-    (setq args (append args (list part-arg)))
-    (if process-crypto
-	(setq args (append args '("--decrypt"))))
-    (setq args (append args (list query)))
+(defun notmuch-get-bodypart-binary (msg part process-crypto)
+  "Return the unprocessed content of PART in MSG.
+
+This returns the \"raw\" content of the given part after content
+transfer decoding, but with no further processing (see the
+discussion of --format=raw in man notmuch-show).  In particular,
+this does no charset conversion."
+  (let ((args `("show" "--format=raw"
+		,(format "--part=%d" (plist-get part :id))
+		,@(when process-crypto '("--decrypt"))
+		,(notmuch-id-to-query (plist-get msg :id)))))
     (with-temp-buffer
       (let ((coding-system-for-read 'no-conversion))
-	(progn
-	  (apply 'call-process (append (list notmuch-command nil (list t nil) nil) args))
-	  (buffer-string))))))
+	(apply #'call-process notmuch-command nil '(t nil) nil args)
+	(buffer-string)))))
 
 (defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-      (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
-				     (plist-get part :id) process-crypto)))
+      (notmuch-get-bodypart-binary msg part process-crypto)))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 455cfee..e9867fd 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -557,16 +557,14 @@ (defun notmuch-show-w3m-cid-retrieve (url &rest args)
 	(let* ((msg (nth 1 matching-part))
 	       (part (nth 2 matching-part))
 	       (content (nth 3 matching-part))
-	       (message-id (plist-get msg :id))
-	       (part-number (plist-get part :id))
 	       (content-type (plist-get part :content-type)))
 	  ;; If we don't already have the content, get it and cache
 	  ;; it, as some messages reference the same cid: part many
 	  ;; times (hundreds!), which results in many calls to
 	  ;; `notmuch part'.
 	  (unless content
-	    (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query message-id)
-							      part-number notmuch-show-process-crypto))
+	    (setq content (notmuch-get-bodypart-binary
+			   msg part notmuch-show-process-crypto))
 	    (with-current-buffer w3m-current-buffer
 	      (notmuch-show-w3m-cid-store-internal url msg part content)))
 	  (insert content)
@@ -2067,15 +2065,14 @@ (defun notmuch-show-stash-mlarchive-link-and-go (&optional mla)
 
 ;; Interactive part functions and their helpers
 
-(defun notmuch-show-generate-part-buffer (message-id nth)
+(defun notmuch-show-generate-part-buffer (msg part)
   "Return a temporary buffer containing the specified part's content."
   (let ((buf (generate-new-buffer " *notmuch-part*"))
 	(process-crypto notmuch-show-process-crypto))
     (with-current-buffer buf
-      (setq notmuch-show-process-crypto process-crypto)
-      ;; Always acquires the part via `notmuch part', even if it is
-      ;; available in the SEXP output.
-      (insert (notmuch-get-bodypart-internal message-id nth notmuch-show-process-crypto)))
+      ;; This is always used in the content of mm handles, which
+      ;; expect undecoded, binary part content.
+      (insert (notmuch-get-bodypart-binary msg part process-crypto)))
     buf))
 
 (defun notmuch-show-current-part-handle ()
@@ -2083,10 +2080,9 @@ (defun notmuch-show-current-part-handle ()
 
 This creates a temporary buffer for the part's content; the
 caller is responsible for killing this buffer as appropriate."
-  (let* ((part (notmuch-show-get-part-properties))
-	 (message-id (notmuch-show-get-message-id))
-	 (nth (plist-get part :id))
-	 (buf (notmuch-show-generate-part-buffer message-id nth))
+  (let* ((msg (notmuch-show-get-message-properties))
+	 (part (notmuch-show-get-part-properties))
+	 (buf (notmuch-show-generate-part-buffer msg part))
 	 (computed-type (plist-get part :computed-type))
 	 (filename (plist-get part :filename))
 	 (disposition (if filename `(attachment (filename . ,filename)))))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (4 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-07-11 11:48   ` David Bremner
  2014-04-21 18:37 ` [PATCH 07/11] emacs: Return unibyte strings for binary part data Austin Clements
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 ++++++++++++++++++++++++++++---------
 emacs/notmuch-show.el |  8 ++++----
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 4711098..ece29c3 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto)
 	(apply #'call-process notmuch-command nil '(t nil) nil args)
 	(buffer-string)))))
 
-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-      (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  "Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts."
+  (let ((content (plist-get part :content)))
+    (when (not content)
+      ;; Use show --format=sexp to fetch decoded content
+      (let* ((args `("show" "--format=sexp" "--include-html"
+		     ,(format "--part=%s" (plist-get part :id))
+		     ,@(when process-crypto '("--decrypt"))
+		     ,(notmuch-id-to-query (plist-get msg :id))))
+	     (npart (apply #'notmuch-call-notmuch-sexp args)))
+	(setq content (plist-get npart :content))
+	(when (not content)
+	  (error "Internal error: No :content from %S" args))))
+    content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part content-type process-crypto)
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
     (with-temp-buffer
-      ;; In case there is :content, the content string is already converted
-      ;; into emacs internal format. `gnus-decoded' is a fake charset,
-      ;; which means no further decoding (to be done by mm- functions).
-      (let* ((charset (if (plist-member part :content)
-			  'gnus-decoded
+      ;; In case we already have :content, use it and tell mm-* that
+      ;; it's already been charset-decoded by using the fake
+      ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+      ;; part content and let mm-* decode it.
+      (let* ((have-content (plist-member part :content))
+	     (charset (if have-content 'gnus-decoded
 			(plist-get part :content-charset)))
 	     (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
 	;; If the user wants the part inlined, insert the content and
 	;; test whether we are able to inline it (which includes both
 	;; capability and suitability tests).
 	(when (mm-inlined-p handle)
-	  (insert (notmuch-get-bodypart-content msg part process-crypto))
+	  (if have-content
+	      (insert (notmuch-get-bodypart-text msg part process-crypto))
+	    (insert (notmuch-get-bodypart-binary msg part process-crypto)))
 	  (when (mm-inlinable-p handle)
 	    (set-buffer display-buffer)
 	    (mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e9867fd..537c558 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt
   (let ((start (if button
 		   (button-start button)
 		 (point))))
-    (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto))
+    (insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
     (save-excursion
       (save-restriction
 	(narrow-to-region start (point-max))
@@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth button)
   (insert (with-temp-buffer
-	    (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto))
-	    ;; notmuch-get-bodypart-content provides "raw", non-converted
-	    ;; data. Replace CRLF with LF before icalendar can use it.
+	    (insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
+	    ;; notmuch-get-bodypart-text does no newline conversion.
+	    ;; Replace CRLF with LF before icalendar can use it.
 	    (goto-char (point-min))
 	    (while (re-search-forward "\r\n" nil t)
 	      (replace-match "\n" nil nil))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/11] emacs: Return unibyte strings for binary part data
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (5 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-21 18:37 ` [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text} Austin Clements
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

Unibyte strings are meant for representing binary data.  In practice,
using unibyte versus multibyte strings affects *almost* nothing.  It
does happen to matter if we use the binary data in an image descriptor
(which is, helpfully, not documented anywhere and getting it wrong
results in opaque errors like "Not a PNG image: <giant binary spew
that is, in fact, a PNG image>").
---
 emacs/notmuch-lib.el | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index ece29c3..fc67b14 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -504,7 +504,7 @@ (defun notmuch-parts-filter-by-type (parts type)
    parts))
 
 (defun notmuch-get-bodypart-binary (msg part process-crypto)
-  "Return the unprocessed content of PART in MSG.
+  "Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
@@ -515,6 +515,16 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto)
 		,@(when process-crypto '("--decrypt"))
 		,(notmuch-id-to-query (plist-get msg :id)))))
     (with-temp-buffer
+      ;; Emacs internally uses a UTF-8-like multibyte string
+      ;; representation by default (regardless of the coding system,
+      ;; which only affects how it goes from outside data to this
+      ;; internal representation).  This *almost* never matters.
+      ;; Annoyingly, it does matter if we use this data in an image
+      ;; descriptor, since Emacs will use its internal data buffer
+      ;; directly and this multibyte representation corrupts binary
+      ;; image formats.  Since the caller is asking for binary data, a
+      ;; unibyte string is a more appropriate representation anyway.
+      (set-buffer-multibyte nil)
       (let ((coding-system-for-read 'no-conversion))
 	(apply #'call-process notmuch-command nil '(t nil) nil args)
 	(buffer-string)))))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (6 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 07/11] emacs: Return unibyte strings for binary part data Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-24 10:46   ` Mark Walters
  2014-04-21 18:37 ` [PATCH 09/11] emacs: Use generalized content caching in w3m CID code Austin Clements
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

(The actual code change here is small, but requires re-indenting
existing code.)
---
 emacs/notmuch-lib.el | 52 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index fc67b14..fee8512 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
    (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
    parts))
 
-(defun notmuch-get-bodypart-binary (msg part process-crypto)
+(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
   "Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
 this does no charset conversion."
-  (let ((args `("show" "--format=raw"
-		,(format "--part=%d" (plist-get part :id))
-		,@(when process-crypto '("--decrypt"))
-		,(notmuch-id-to-query (plist-get msg :id)))))
-    (with-temp-buffer
-      ;; Emacs internally uses a UTF-8-like multibyte string
-      ;; representation by default (regardless of the coding system,
-      ;; which only affects how it goes from outside data to this
-      ;; internal representation).  This *almost* never matters.
-      ;; Annoyingly, it does matter if we use this data in an image
-      ;; descriptor, since Emacs will use its internal data buffer
-      ;; directly and this multibyte representation corrupts binary
-      ;; image formats.  Since the caller is asking for binary data, a
-      ;; unibyte string is a more appropriate representation anyway.
-      (set-buffer-multibyte nil)
-      (let ((coding-system-for-read 'no-conversion))
-	(apply #'call-process notmuch-command nil '(t nil) nil args)
-	(buffer-string)))))
-
-(defun notmuch-get-bodypart-text (msg part process-crypto)
+  (let ((data (plist-get part :binary-content)))
+    (when (not data)
+      (let ((args `("show" "--format=raw"
+		    ,(format "--part=%d" (plist-get part :id))
+		    ,@(when process-crypto '("--decrypt"))
+		    ,(notmuch-id-to-query (plist-get msg :id)))))
+	(with-temp-buffer
+	  ;; Emacs internally uses a UTF-8-like multibyte string
+	  ;; representation by default (regardless of the coding
+	  ;; system, which only affects how it goes from outside data
+	  ;; to this internal representation).  This *almost* never
+	  ;; matters.  Annoyingly, it does matter if we use this data
+	  ;; in an image descriptor, since Emacs will use its internal
+	  ;; data buffer directly and this multibyte representation
+	  ;; corrupts binary image formats.  Since the caller is
+	  ;; asking for binary data, a unibyte string is a more
+	  ;; appropriate representation anyway.
+	  (set-buffer-multibyte nil)
+	  (let ((coding-system-for-read 'no-conversion))
+	    (apply #'call-process notmuch-command nil '(t nil) nil args)
+	    (setq data (buffer-string)))))
+      (when cache
+	(plist-put part :binary-content data)))
+    data))
+
+(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
   "Return the text content of PART in MSG.
 
 This returns the content of the given part as a multibyte Lisp
@@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
 	     (npart (apply #'notmuch-call-notmuch-sexp args)))
 	(setq content (plist-get npart :content))
 	(when (not content)
-	  (error "Internal error: No :content from %S" args))))
+	  (error "Internal error: No :content from %S" args)))
+      (when cache
+	(plist-put part :content content)))
     content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/11] emacs: Use generalized content caching in w3m CID code
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (7 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text} Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-21 18:37 ` [PATCH 10/11] emacs: Rewrite content ID handling Austin Clements
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

Previously this did its own caching, but this is now supported by more
generally by `notmuch-get-bodypart-binary'.
---
 emacs/notmuch-show.el | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 537c558..9411c9a 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,15 +540,14 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
-  (push (list content-id msg part content)
-	notmuch-show-w3m-cid-store))
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
+  (push (list content-id msg part) notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
     (when content-id
       (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-					   msg part nil))))
+					   msg part))))
 
 (defun notmuch-show-w3m-cid-retrieve (url &rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
@@ -556,18 +555,12 @@ (defun notmuch-show-w3m-cid-retrieve (url &rest args)
     (if matching-part
 	(let* ((msg (nth 1 matching-part))
 	       (part (nth 2 matching-part))
-	       (content (nth 3 matching-part))
 	       (content-type (plist-get part :content-type)))
-	  ;; If we don't already have the content, get it and cache
-	  ;; it, as some messages reference the same cid: part many
-	  ;; times (hundreds!), which results in many calls to
-	  ;; `notmuch part'.
-	  (unless content
-	    (setq content (notmuch-get-bodypart-binary
-			   msg part notmuch-show-process-crypto))
-	    (with-current-buffer w3m-current-buffer
-	      (notmuch-show-w3m-cid-store-internal url msg part content)))
-	  (insert content)
+	  ;; Request content caching, as some messages reference the
+	  ;; same cid: part many times (hundreds!), which results in
+	  ;; many calls to `notmuch show'.
+	  (insert (notmuch-get-bodypart-binary
+		   msg part notmuch-show-process-crypto 'cache))
 	  content-type)
       nil)))
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/11] emacs: Rewrite content ID handling
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (8 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 09/11] emacs: Use generalized content caching in w3m CID code Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-04-21 18:37 ` [PATCH 11/11] emacs: Support cid: references with shr renderer Austin Clements
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

Besides generally cleaning up the code and separating the general
content ID handling from the w3m-specific code, this fixes several
problems.

Foremost is that, previously, the code roughly assumed that referenced
parts would be in the same multipart/related as the reference.
According to RFC 2392, nothing could be further from the truth:
content IDs are supposed to be globally unique and globally
addressable.  This is nonsense, but this patch at least fixes things
so content IDs can be anywhere in the same message.

As a side-effect of the above, this handles multipart/alternate
content-IDs more in line with RFC 2046 section 5.1.2 (not that I've
ever seen this in the wild).  This also properly URL-decodes cid:
URLs, as per RFC 2392 (the previous code did not), and applies crypto
settings from the show buffer (the previous code used the global
crypto settings).
---
 emacs/notmuch-show.el | 120 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 74 insertions(+), 46 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 9411c9a..f758091 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -503,6 +503,73 @@ (defun notmuch-show-toggle-part-invisibility (&optional button)
 	  (overlay-put overlay 'invisible (not show))
 	  t)))))
 
+;; Part content ID handling
+
+(defvar notmuch-show--cids nil
+  "Alist from raw content ID to (MSG PART).")
+(make-variable-buffer-local 'notmuch-show--cids)
+
+(defun notmuch-show--register-cids (msg part)
+  "Register content-IDs in PART and all of PART's sub-parts."
+  (let ((content-id (plist-get part :content-id)))
+    (when content-id
+      ;; Note that content-IDs are globally unique, except when they
+      ;; aren't: RFC 2046 section 5.1.4 permits children of a
+      ;; multipart/alternative to have the same content-ID, in which
+      ;; case the MUA is supposed to pick the best one it can render.
+      ;; We simply add the content-ID to the beginning of our alist;
+      ;; so if this happens, we'll take the last (and "best")
+      ;; alternative (even if we can't render it).
+      (push (list content-id msg part) notmuch-show--cids)))
+  ;; Recurse on sub-parts
+  (let ((ctype (notmuch-split-content-type
+		(downcase (plist-get part :content-type)))))
+    (cond ((equal (first ctype) "multipart")
+	   (mapc (apply-partially #'notmuch-show--register-cids msg)
+		 (plist-get part :content)))
+	  ((equal ctype '("message" "rfc822"))
+	   (notmuch-show--register-cids
+	    msg
+	    (first (plist-get (first (plist-get part :content)) :body)))))))
+
+(defun notmuch-show--get-cid-content (cid)
+  "Return a list (CID-content content-type) or nil.
+
+This will only find parts from messages that have been inserted
+into the current buffer.  CID must be a raw content ID, without
+enclosing angle brackets, a cid: prefix, or URL encoding.  This
+will return nil if the CID is unknown or cannot be retrieved."
+  (let ((descriptor (cdr (assoc cid notmuch-show--cids))))
+    (when descriptor
+      (let* ((msg (first descriptor))
+	     (part (second descriptor))
+	     ;; Request caching for this content, as some messages
+	     ;; reference the same cid: part many times (hundreds!).
+	     (content (notmuch-get-bodypart-binary
+		       msg part notmuch-show-process-crypto 'cache))
+	     (content-type (plist-get part :content-type)))
+	(list content content-type)))))
+
+(defun notmuch-show-setup-w3m ()
+  "Instruct w3m how to retrieve content from a \"related\" part of a message."
+  (interactive)
+  (if (boundp 'w3m-cid-retrieve-function-alist)
+    (unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
+      (push (cons 'notmuch-show-mode #'notmuch-show--cid-w3m-retrieve)
+	    w3m-cid-retrieve-function-alist)))
+  (setq mm-inline-text-html-with-images t))
+
+(defvar w3m-current-buffer) ;; From `w3m.el'.
+(defun notmuch-show--cid-w3m-retrieve (url &rest args)
+  ;; url includes the cid: prefix and is URL encoded (see RFC 2392).
+  (let* ((cid (url-unhex-string (substring url 4)))
+	 (content-and-type
+	  (with-current-buffer w3m-current-buffer
+	    (notmuch-show--get-cid-content cid))))
+    (when content-and-type
+      (insert (first content-and-type))
+      (second content-and-type))))
+
 ;; MIME part renderers
 
 (defun notmuch-show-multipart/*-to-list (part)
@@ -527,56 +594,11 @@ (defun notmuch-show-insert-part-multipart/alternative (msg part content-type nth
       (indent-rigidly start (point) 1)))
   t)
 
-(defun notmuch-show-setup-w3m ()
-  "Instruct w3m how to retrieve content from a \"related\" part of a message."
-  (interactive)
-  (if (boundp 'w3m-cid-retrieve-function-alist)
-    (unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
-      (push (cons 'notmuch-show-mode 'notmuch-show-w3m-cid-retrieve)
-	    w3m-cid-retrieve-function-alist)))
-  (setq mm-inline-text-html-with-images t))
-
-(defvar w3m-current-buffer) ;; From `w3m.el'.
-(defvar notmuch-show-w3m-cid-store nil)
-(make-variable-buffer-local 'notmuch-show-w3m-cid-store)
-
-(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
-  (push (list content-id msg part) notmuch-show-w3m-cid-store))
-
-(defun notmuch-show-w3m-cid-store (msg part)
-  (let ((content-id (plist-get part :content-id)))
-    (when content-id
-      (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-					   msg part))))
-
-(defun notmuch-show-w3m-cid-retrieve (url &rest args)
-  (let ((matching-part (with-current-buffer w3m-current-buffer
-			 (assoc url notmuch-show-w3m-cid-store))))
-    (if matching-part
-	(let* ((msg (nth 1 matching-part))
-	       (part (nth 2 matching-part))
-	       (content-type (plist-get part :content-type)))
-	  ;; Request content caching, as some messages reference the
-	  ;; same cid: part many times (hundreds!), which results in
-	  ;; many calls to `notmuch show'.
-	  (insert (notmuch-get-bodypart-binary
-		   msg part notmuch-show-process-crypto 'cache))
-	  content-type)
-      nil)))
-
 (defun notmuch-show-insert-part-multipart/related (msg part content-type nth depth button)
   (let ((inner-parts (plist-get part :content))
 	(start (point)))
 
-    ;; We assume that the first part is text/html and the remainder
-    ;; things that it references.
-
-    ;; Stash the non-primary parts.
-    (mapc (lambda (part)
-	    (notmuch-show-w3m-cid-store msg part))
-	  (cdr inner-parts))
-
-    ;; Render the primary part.
+    ;; Render the primary part.  FIXME: Support RFC 2387 Start header.
     (notmuch-show-insert-bodypart msg (car inner-parts) depth)
     ;; Add hidden buttons for the rest
     (mapc (lambda (inner-part)
@@ -888,6 +910,12 @@ (defun notmuch-show-insert-bodypart (msg part depth &optional hide)
 
 (defun notmuch-show-insert-body (msg body depth)
   "Insert the body BODY at depth DEPTH in the current thread."
+
+  ;; Register all content IDs for this message.  According to RFC
+  ;; 2392, content IDs are *global*, but it's okay if an MUA treats
+  ;; them as only global within a message.
+  (notmuch-show--register-cids msg (first body))
+
   (mapc (lambda (part) (notmuch-show-insert-bodypart msg part depth)) body))
 
 (defun notmuch-show-make-symbol (type)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/11] emacs: Support cid: references with shr renderer
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (9 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 10/11] emacs: Rewrite content ID handling Austin Clements
@ 2014-04-21 18:37 ` Austin Clements
  2014-05-01  9:20   ` David Edmondson
  2014-04-21 20:26 ` [PATCH 00/11] Improve charset and cid: handling Tomi Ollila
  2014-04-26 10:50 ` Mark Walters
  12 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-21 18:37 UTC (permalink / raw)
  To: notmuch

shr has really nice support for inline image rendering, but previously
we only had the hooks for w3m cid: references.
---
 emacs/notmuch-show.el | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f758091..a489279 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -745,14 +745,39 @@ (defun notmuch-show-get-mime-type-of-application/octet-stream (part)
 	  nil))))
 
 (defun notmuch-show-insert-part-text/html (msg part content-type nth depth button)
-  ;; text/html handler to work around bugs in renderers and our
-  ;; invisibile parts code. In particular w3m sets up a keymap which
-  ;; "leaks" outside the invisible region and causes strange effects
-  ;; in notmuch. We set mm-inline-text-html-with-w3m-keymap to nil to
-  ;; tell w3m not to set a keymap (so the normal notmuch-show-mode-map
-  ;; remains).
-  (let ((mm-inline-text-html-with-w3m-keymap nil))
-    (notmuch-show-insert-part-*/* msg part content-type nth depth button)))
+  (if (eq mm-text-html-renderer 'shr)
+      ;; It's easier to drive shr ourselves than to work around the
+      ;; goofy things `mm-shr' does (like irreversibly taking over
+      ;; content ID handling).
+      (notmuch-show--insert-part-text/html-shr msg part)
+    ;; Otherwise, let message-mode do the heavy lifting
+    ;;
+    ;; w3m sets up a keymap which "leaks" outside the invisible region
+    ;; and causes strange effects in notmuch. We set
+    ;; mm-inline-text-html-with-w3m-keymap to nil to tell w3m not to
+    ;; set a keymap (so the normal notmuch-show-mode-map remains).
+    (let ((mm-inline-text-html-with-w3m-keymap nil))
+      (notmuch-show-insert-part-*/* msg part content-type nth depth button))))
+
+(defun notmuch-show--insert-part-text/html-shr (msg part)
+  ;; Make sure shr is loaded before we start let-binding its globals
+  (require 'shr)
+  (let ((dom (let (process-crypto notmuch-show-process-crypto)
+	       (with-temp-buffer
+		 (insert (notmuch-get-bodypart-text msg part process-crypto))
+		 (libxml-parse-html-region (point-min) (point-max)))))
+	(shr-content-function
+	 (lambda (url)
+	   ;; shr strips the "cid:" part of URL, but doesn't
+	   ;; URL-decode it (see RFC 2392).
+	   (let ((cid (url-unhex-string url)))
+	     (first (notmuch-show--get-cid-content cid)))))
+	;; Block all external images to prevent privacy leaks and
+	;; potential attacks.  FIXME: If we block an image, offer a
+	;; button to load external images.
+	(shr-blocked-images "."))
+    (shr-insert-document dom)
+    t))
 
 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] Improve charset and cid: handling
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (10 preceding siblings ...)
  2014-04-21 18:37 ` [PATCH 11/11] emacs: Support cid: references with shr renderer Austin Clements
@ 2014-04-21 20:26 ` Tomi Ollila
  2014-04-26 10:50 ` Mark Walters
  12 siblings, 0 replies; 32+ messages in thread
From: Tomi Ollila @ 2014-04-21 20:26 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Mon, Apr 21 2014, Austin Clements <amdragon@MIT.EDU> wrote:

> I set out to quickly add support for cid: links in the shr renderer
> and wound up making our charset handling more robust and rewriting our
> content-ID handling.  The test introduced in patch 2 passes in all but
> one really obscure case, but only because of many unwritten and
> potentially fragile assumptions that Emacs and the CLI make about each
> other.
>
> The first three patches could reasonably go in to 0.18.  The rest of
> this series is certainly post-0.18, but I didn't want to lose track of
> it.

Patches 1-3 looks good and tests pass. I've also (suffessfully) used
this 'no-conversion (although I don't remember why ;/ -- this conversion
stuff is not something simple to comprehend...)

So, +1 for getting 1-3 to 0.18.

Tomi


>
> This series comes in three stages.  Each depends on the earlier ones,
> but each prefix makes sense on its own and could be pushed without the
> later stages.
>
> Patch 1 is a simple clean up patch.
>
> Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
> by splitting the broken `notmuch-get-bodypart-content' API into
> `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
> caller can explicitly convey their requirements.
>
> The remaining patches improve our content-ID handling and add support
> for cid: links for shr.
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
  2014-04-21 18:37 ` [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text} Austin Clements
@ 2014-04-24 10:46   ` Mark Walters
  2014-04-24 18:12     ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Walters @ 2014-04-24 10:46 UTC (permalink / raw)
  To: Austin Clements, notmuch


On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> (The actual code change here is small, but requires re-indenting
> existing code.)
> ---
>  emacs/notmuch-lib.el | 52 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
> index fc67b14..fee8512 100644
> --- a/emacs/notmuch-lib.el
> +++ b/emacs/notmuch-lib.el
> @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
>     (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
>     parts))
>  
> -(defun notmuch-get-bodypart-binary (msg part process-crypto)
> +(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
>    "Return the unprocessed content of PART in MSG as a unibyte string.
>  
>  This returns the \"raw\" content of the given part after content
>  transfer decoding, but with no further processing (see the
>  discussion of --format=raw in man notmuch-show).  In particular,
>  this does no charset conversion."
> -  (let ((args `("show" "--format=raw"
> -		,(format "--part=%d" (plist-get part :id))
> -		,@(when process-crypto '("--decrypt"))
> -		,(notmuch-id-to-query (plist-get msg :id)))))
> -    (with-temp-buffer
> -      ;; Emacs internally uses a UTF-8-like multibyte string
> -      ;; representation by default (regardless of the coding system,
> -      ;; which only affects how it goes from outside data to this
> -      ;; internal representation).  This *almost* never matters.
> -      ;; Annoyingly, it does matter if we use this data in an image
> -      ;; descriptor, since Emacs will use its internal data buffer
> -      ;; directly and this multibyte representation corrupts binary
> -      ;; image formats.  Since the caller is asking for binary data, a
> -      ;; unibyte string is a more appropriate representation anyway.
> -      (set-buffer-multibyte nil)
> -      (let ((coding-system-for-read 'no-conversion))
> -	(apply #'call-process notmuch-command nil '(t nil) nil args)
> -	(buffer-string)))))
> -
> -(defun notmuch-get-bodypart-text (msg part process-crypto)
> +  (let ((data (plist-get part :binary-content)))
> +    (when (not data)
> +      (let ((args `("show" "--format=raw"
> +		    ,(format "--part=%d" (plist-get part :id))
> +		    ,@(when process-crypto '("--decrypt"))
> +		    ,(notmuch-id-to-query (plist-get msg :id)))))
> +	(with-temp-buffer
> +	  ;; Emacs internally uses a UTF-8-like multibyte string
> +	  ;; representation by default (regardless of the coding
> +	  ;; system, which only affects how it goes from outside data
> +	  ;; to this internal representation).  This *almost* never
> +	  ;; matters.  Annoyingly, it does matter if we use this data
> +	  ;; in an image descriptor, since Emacs will use its internal
> +	  ;; data buffer directly and this multibyte representation
> +	  ;; corrupts binary image formats.  Since the caller is
> +	  ;; asking for binary data, a unibyte string is a more
> +	  ;; appropriate representation anyway.
> +	  (set-buffer-multibyte nil)
> +	  (let ((coding-system-for-read 'no-conversion))
> +	    (apply #'call-process notmuch-command nil '(t nil) nil args)
> +	    (setq data (buffer-string)))))
> +      (when cache
> +	(plist-put part :binary-content data)))
> +    data))

I am a little puzzled by this but that could be lack of familiarity with
elisp. As far as I can see plist-put will sometimes modify the original
plist and sometimes return a new plist. If the latter happens then I
think it works out as if we hadn't cached anything as the part passed to
the function is unmodified. That might not matter in this case (though I
find the lack of determinism disturbing).

Or is something else going on?

Best wishes

Mark



> +
> +(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
>    "Return the text content of PART in MSG.
>  
>  This returns the content of the given part as a multibyte Lisp
> @@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
>  	     (npart (apply #'notmuch-call-notmuch-sexp args)))
>  	(setq content (plist-get npart :content))
>  	(when (not content)
> -	  (error "Internal error: No :content from %S" args))))
> +	  (error "Internal error: No :content from %S" args)))
> +      (when cache
> +	(plist-put part :content content)))
>      content))
>  
>  ;; Workaround: The call to `mm-display-part' below triggers a bug in
> -- 
> 1.9.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] test: New tests for Emacs charset handling
  2014-04-21 18:37 ` [PATCH 02/11] test: New tests for Emacs charset handling Austin Clements
@ 2014-04-24 14:38   ` Mark Walters
  2014-04-24 18:29     ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Walters @ 2014-04-24 14:38 UTC (permalink / raw)
  To: Austin Clements, notmuch


On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> The test of viewing 8bit messages is known-broken.  The rest pass, but
> for very fragile reasons.  The next several commits will fix the
> known-broken test and make our charset handling robust.

Hi 

On one of my systems one of these (non-broken) tests fails. I am not
sure whether I messed up my emacs/environment when doing stuff remotely
recently so it could just be my system


> ---
>  test/T455-emacs-charsets.sh | 141 ++++++++++++++++++++++++++++++++++++++++++++
>  test/test-lib.el            |   4 +-
>  2 files changed, 144 insertions(+), 1 deletion(-)
>  create mode 100755 test/T455-emacs-charsets.sh
>
> diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
> new file mode 100755
> index 0000000..a42a1d2
> --- /dev/null
> +++ b/test/T455-emacs-charsets.sh
> @@ -0,0 +1,141 @@
> +#!/usr/bin/env bash
> +
> +test_description="emacs notmuch-show charset handling"
> +. ./test-lib.sh
> +
> +
> +UTF8_YEN=$'\xef\xbf\xa5'
> +BIG5_YEN=$'\xa2\x44'
> +
> +# Add four messages with unusual encoding requirements:
> +#
> +# 1) text/plain in quoted-printable big5
> +generate_message \
> +    [id]=test-plain@example.com \
> +    '[content-type]="text/plain; charset=big5"' \
> +    '[content-transfer-encoding]=quoted-printable' \
> +    '[body]="Yen: =A2=44"'
> +
> +# 2) text/plain in 8bit big5
> +generate_message \
> +    [id]=test-plain-8bit@example.com \
> +    '[content-type]="text/plain; charset=big5"' \
> +    '[content-transfer-encoding]=8bit' \
> +    '[body]="Yen: '$BIG5_YEN'"'
> +
> +# 3) text/html in quoted-printable big5
> +generate_message \
> +    [id]=test-html@example.com \
> +    '[content-type]="text/html; charset=big5"' \
> +    '[content-transfer-encoding]=quoted-printable' \
> +    '[body]="<html><body>Yen: =A2=44</body></html>"'
> +
> +# 4) application/octet-stream in quoted-printable of big5 text
> +generate_message \
> +    [id]=test-binary@example.com \
> +    '[content-type]="application/octet-stream"' \
> +    '[content-transfer-encoding]=quoted-printable' \
> +    '[body]="Yen: =A2=44"'
> +
> +notmuch new > /dev/null
> +
> +# Test rendering
> +
> +test_begin_subtest "Text parts are decoded when rendering"
> +test_emacs '(notmuch-show "id:test-plain@example.com")
> +	    (test-visible-output "OUTPUT.raw")'
> +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> +cat <<EOF >EXPECTED
> +Yen: $UTF8_YEN
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "8bit text parts are decoded when rendering"
> +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> +	    (test-visible-output "OUTPUT.raw")'
> +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> +cat <<EOF >EXPECTED
> +Yen: $UTF8_YEN
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "HTML parts are decoded when rendering"
> +test_emacs '(notmuch-show "id:test-html@example.com")
> +	    (test-visible-output "OUTPUT.raw")'
> +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> +cat <<EOF >EXPECTED
> +[ text/html ]
> +Yen: $UTF8_YEN
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED

It's this test: I get an extra newline after the UFT8_YEN. 

This is with emacs 23.4.1 on a Debian wheezy(ish) system.

But as said it could be my fault: I think some odd things happened when
I tried to install emacs 24 and 23 simultaneously to help testing
things.

Best wishes

Mark


> +
> +# Test saving
> +
> +test_begin_subtest "Text parts are not decoded when saving"
> +rm -f part
> +test_emacs '(notmuch-show "id:test-plain@example.com")
> +	    (search-forward "Yen")
> +	    (let ((standard-input "\"part\""))
> +	       (notmuch-show-save-part))'
> +cat <<EOF >EXPECTED
> +Yen: $BIG5_YEN
> +EOF
> +test_expect_equal_file part EXPECTED
> +
> +test_begin_subtest "8bit text parts are not decoded when saving"
> +rm -f part
> +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> +	    (search-forward "Yen")
> +	    (let ((standard-input "\"part\""))
> +	       (notmuch-show-save-part))'
> +cat <<EOF >EXPECTED
> +Yen: $BIG5_YEN
> +EOF
> +test_expect_equal_file part EXPECTED
> +
> +test_begin_subtest "HTML parts are not decoded when saving"
> +rm -f part
> +test_emacs '(notmuch-show "id:test-html@example.com")
> +	    (search-forward "Yen")
> +	    (let ((standard-input "\"part\""))
> +	       (notmuch-show-save-part))'
> +cat <<EOF >EXPECTED
> +<html><body>Yen: $BIG5_YEN</body></html>
> +EOF
> +test_expect_equal_file part EXPECTED
> +
> +test_begin_subtest "Binary parts are not decoded when saving"
> +rm -f part
> +test_emacs '(notmuch-show "id:test-binary@example.com")
> +	    (search-forward "application/")
> +	    (let ((standard-input "\"part\""))
> +	       (notmuch-show-save-part))'
> +cat <<EOF >EXPECTED
> +Yen: $BIG5_YEN
> +EOF
> +test_expect_equal_file part EXPECTED
> +
> +# Test message viewing
> +
> +test_begin_subtest "Text message are not decoded when viewing"
> +test_emacs '(notmuch-show "id:test-plain@example.com")
> +	    (notmuch-show-view-raw-message)
> +	    (test-visible-output "OUTPUT.raw")'
> +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> +cat <<EOF >EXPECTED
> +Yen: =A2=44
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "8bit text message are not decoded when viewing"
> +test_subtest_known_broken
> +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> +	    (notmuch-show-view-raw-message)
> +	    (test-visible-output "OUTPUT.raw")'
> +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> +cat <<EOF >EXPECTED
> +Yen: $BIG5_YEN
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_done
> diff --git a/test/test-lib.el b/test/test-lib.el
> index 437f83f..dd3446e 100644
> --- a/test/test-lib.el
> +++ b/test/test-lib.el
> @@ -57,7 +57,9 @@ (defun test-output (&optional filename)
>  (defun test-visible-output (&optional filename)
>    "Save visible text in current buffer to file FILENAME.  Default
>  FILENAME is OUTPUT."
> -  (let ((text (visible-buffer-string)))
> +  (let ((text (visible-buffer-string))
> +	;; Tests expect output in UTF-8 encoding
> +	(coding-system-for-write 'utf-8))
>      (with-temp-file (or filename "OUTPUT") (insert text))))
>  
>  (defun visible-buffer-string ()
> -- 
> 1.9.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
  2014-04-24 10:46   ` Mark Walters
@ 2014-04-24 18:12     ` Austin Clements
  2014-04-25  6:42       ` Mark Walters
  0 siblings, 1 reply; 32+ messages in thread
From: Austin Clements @ 2014-04-24 18:12 UTC (permalink / raw)
  To: Mark Walters; +Cc: notmuch

Quoth Mark Walters on Apr 24 at 11:46 am:
> 
> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> > (The actual code change here is small, but requires re-indenting
> > existing code.)
> > ---
> >  emacs/notmuch-lib.el | 52 ++++++++++++++++++++++++++++++----------------------
> >  1 file changed, 30 insertions(+), 22 deletions(-)
> >
> > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
> > index fc67b14..fee8512 100644
> > --- a/emacs/notmuch-lib.el
> > +++ b/emacs/notmuch-lib.el
> > @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
> >     (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
> >     parts))
> >  
> > -(defun notmuch-get-bodypart-binary (msg part process-crypto)
> > +(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
> >    "Return the unprocessed content of PART in MSG as a unibyte string.
> >  
> >  This returns the \"raw\" content of the given part after content
> >  transfer decoding, but with no further processing (see the
> >  discussion of --format=raw in man notmuch-show).  In particular,
> >  this does no charset conversion."
> > -  (let ((args `("show" "--format=raw"
> > -		,(format "--part=%d" (plist-get part :id))
> > -		,@(when process-crypto '("--decrypt"))
> > -		,(notmuch-id-to-query (plist-get msg :id)))))
> > -    (with-temp-buffer
> > -      ;; Emacs internally uses a UTF-8-like multibyte string
> > -      ;; representation by default (regardless of the coding system,
> > -      ;; which only affects how it goes from outside data to this
> > -      ;; internal representation).  This *almost* never matters.
> > -      ;; Annoyingly, it does matter if we use this data in an image
> > -      ;; descriptor, since Emacs will use its internal data buffer
> > -      ;; directly and this multibyte representation corrupts binary
> > -      ;; image formats.  Since the caller is asking for binary data, a
> > -      ;; unibyte string is a more appropriate representation anyway.
> > -      (set-buffer-multibyte nil)
> > -      (let ((coding-system-for-read 'no-conversion))
> > -	(apply #'call-process notmuch-command nil '(t nil) nil args)
> > -	(buffer-string)))))
> > -
> > -(defun notmuch-get-bodypart-text (msg part process-crypto)
> > +  (let ((data (plist-get part :binary-content)))
> > +    (when (not data)
> > +      (let ((args `("show" "--format=raw"
> > +		    ,(format "--part=%d" (plist-get part :id))
> > +		    ,@(when process-crypto '("--decrypt"))
> > +		    ,(notmuch-id-to-query (plist-get msg :id)))))
> > +	(with-temp-buffer
> > +	  ;; Emacs internally uses a UTF-8-like multibyte string
> > +	  ;; representation by default (regardless of the coding
> > +	  ;; system, which only affects how it goes from outside data
> > +	  ;; to this internal representation).  This *almost* never
> > +	  ;; matters.  Annoyingly, it does matter if we use this data
> > +	  ;; in an image descriptor, since Emacs will use its internal
> > +	  ;; data buffer directly and this multibyte representation
> > +	  ;; corrupts binary image formats.  Since the caller is
> > +	  ;; asking for binary data, a unibyte string is a more
> > +	  ;; appropriate representation anyway.
> > +	  (set-buffer-multibyte nil)
> > +	  (let ((coding-system-for-read 'no-conversion))
> > +	    (apply #'call-process notmuch-command nil '(t nil) nil args)
> > +	    (setq data (buffer-string)))))
> > +      (when cache
> > +	(plist-put part :binary-content data)))
> > +    data))
> 
> I am a little puzzled by this but that could be lack of familiarity with
> elisp. As far as I can see plist-put will sometimes modify the original
> plist and sometimes return a new plist. If the latter happens then I
> think it works out as if we hadn't cached anything as the part passed to
> the function is unmodified. That might not matter in this case (though I
> find the lack of determinism disturbing).
> 
> Or is something else going on?

No, your familiarity with Elisp serves you well.  I'm completely
cheating here.  According to the specification of plist-put, it's
allowed to return a new list but in reality this only happens when the
original plist is nil.  We lean on this already all over the
notmuch-emacs code, but maybe that doesn't excuse me adding one more
cheat.

I could add a comment here explaining what's going on, I could
manually do the list insertion in a way that's guaranteed to mutate it
in place, or I could add a nil :binary-content property when parts are
created (since plist-put is guaranteed to mutate existing keys in
place).

> Best wishes
> 
> Mark
> 
> 
> 
> > +
> > +(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
> >    "Return the text content of PART in MSG.
> >  
> >  This returns the content of the given part as a multibyte Lisp
> > @@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
> >  	     (npart (apply #'notmuch-call-notmuch-sexp args)))
> >  	(setq content (plist-get npart :content))
> >  	(when (not content)
> > -	  (error "Internal error: No :content from %S" args))))
> > +	  (error "Internal error: No :content from %S" args)))
> > +      (when cache
> > +	(plist-put part :content content)))
> >      content))
> >  
> >  ;; Workaround: The call to `mm-display-part' below triggers a bug in

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] test: New tests for Emacs charset handling
  2014-04-24 14:38   ` Mark Walters
@ 2014-04-24 18:29     ` Austin Clements
  2014-04-25  6:18       ` Mark Walters
  2014-04-25 13:57       ` Tomi Ollila
  0 siblings, 2 replies; 32+ messages in thread
From: Austin Clements @ 2014-04-24 18:29 UTC (permalink / raw)
  To: Mark Walters; +Cc: notmuch

Quoth Mark Walters on Apr 24 at  3:38 pm:
> 
> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> > The test of viewing 8bit messages is known-broken.  The rest pass, but
> > for very fragile reasons.  The next several commits will fix the
> > known-broken test and make our charset handling robust.
> 
> Hi 
> 
> On one of my systems one of these (non-broken) tests fails. I am not
> sure whether I messed up my emacs/environment when doing stuff remotely
> recently so it could just be my system
> 
> 
> > ---
> >  test/T455-emacs-charsets.sh | 141 ++++++++++++++++++++++++++++++++++++++++++++
> >  test/test-lib.el            |   4 +-
> >  2 files changed, 144 insertions(+), 1 deletion(-)
> >  create mode 100755 test/T455-emacs-charsets.sh
> >
> > diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
> > new file mode 100755
> > index 0000000..a42a1d2
> > --- /dev/null
> > +++ b/test/T455-emacs-charsets.sh
> > @@ -0,0 +1,141 @@
> > +#!/usr/bin/env bash
> > +
> > +test_description="emacs notmuch-show charset handling"
> > +. ./test-lib.sh
> > +
> > +
> > +UTF8_YEN=$'\xef\xbf\xa5'
> > +BIG5_YEN=$'\xa2\x44'
> > +
> > +# Add four messages with unusual encoding requirements:
> > +#
> > +# 1) text/plain in quoted-printable big5
> > +generate_message \
> > +    [id]=test-plain@example.com \
> > +    '[content-type]="text/plain; charset=big5"' \
> > +    '[content-transfer-encoding]=quoted-printable' \
> > +    '[body]="Yen: =A2=44"'
> > +
> > +# 2) text/plain in 8bit big5
> > +generate_message \
> > +    [id]=test-plain-8bit@example.com \
> > +    '[content-type]="text/plain; charset=big5"' \
> > +    '[content-transfer-encoding]=8bit' \
> > +    '[body]="Yen: '$BIG5_YEN'"'
> > +
> > +# 3) text/html in quoted-printable big5
> > +generate_message \
> > +    [id]=test-html@example.com \
> > +    '[content-type]="text/html; charset=big5"' \
> > +    '[content-transfer-encoding]=quoted-printable' \
> > +    '[body]="<html><body>Yen: =A2=44</body></html>"'
> > +
> > +# 4) application/octet-stream in quoted-printable of big5 text
> > +generate_message \
> > +    [id]=test-binary@example.com \
> > +    '[content-type]="application/octet-stream"' \
> > +    '[content-transfer-encoding]=quoted-printable' \
> > +    '[body]="Yen: =A2=44"'
> > +
> > +notmuch new > /dev/null
> > +
> > +# Test rendering
> > +
> > +test_begin_subtest "Text parts are decoded when rendering"
> > +test_emacs '(notmuch-show "id:test-plain@example.com")
> > +	    (test-visible-output "OUTPUT.raw")'
> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> > +cat <<EOF >EXPECTED
> > +Yen: $UTF8_YEN
> > +EOF
> > +test_expect_equal_file OUTPUT EXPECTED
> > +
> > +test_begin_subtest "8bit text parts are decoded when rendering"
> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> > +	    (test-visible-output "OUTPUT.raw")'
> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> > +cat <<EOF >EXPECTED
> > +Yen: $UTF8_YEN
> > +EOF
> > +test_expect_equal_file OUTPUT EXPECTED
> > +
> > +test_begin_subtest "HTML parts are decoded when rendering"
> > +test_emacs '(notmuch-show "id:test-html@example.com")
> > +	    (test-visible-output "OUTPUT.raw")'
> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> > +cat <<EOF >EXPECTED
> > +[ text/html ]
> > +Yen: $UTF8_YEN
> > +EOF
> > +test_expect_equal_file OUTPUT EXPECTED
> 
> It's this test: I get an extra newline after the UFT8_YEN. 
> 
> This is with emacs 23.4.1 on a Debian wheezy(ish) system.
> 
> But as said it could be my fault: I think some odd things happened when
> I tried to install emacs 24 and 23 simultaneously to help testing
> things.

I can reproduce this.  Tests that involve HTML rendering are always
finicky like this since there are so many different HTML renderers.
Maybe I could normalize the spacing?  sed '/^$/d;s/  */ /g' or so?

> Best wishes
> 
> Mark
> 
> 
> > +
> > +# Test saving
> > +
> > +test_begin_subtest "Text parts are not decoded when saving"
> > +rm -f part
> > +test_emacs '(notmuch-show "id:test-plain@example.com")
> > +	    (search-forward "Yen")
> > +	    (let ((standard-input "\"part\""))
> > +	       (notmuch-show-save-part))'
> > +cat <<EOF >EXPECTED
> > +Yen: $BIG5_YEN
> > +EOF
> > +test_expect_equal_file part EXPECTED
> > +
> > +test_begin_subtest "8bit text parts are not decoded when saving"
> > +rm -f part
> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> > +	    (search-forward "Yen")
> > +	    (let ((standard-input "\"part\""))
> > +	       (notmuch-show-save-part))'
> > +cat <<EOF >EXPECTED
> > +Yen: $BIG5_YEN
> > +EOF
> > +test_expect_equal_file part EXPECTED
> > +
> > +test_begin_subtest "HTML parts are not decoded when saving"
> > +rm -f part
> > +test_emacs '(notmuch-show "id:test-html@example.com")
> > +	    (search-forward "Yen")
> > +	    (let ((standard-input "\"part\""))
> > +	       (notmuch-show-save-part))'
> > +cat <<EOF >EXPECTED
> > +<html><body>Yen: $BIG5_YEN</body></html>
> > +EOF
> > +test_expect_equal_file part EXPECTED
> > +
> > +test_begin_subtest "Binary parts are not decoded when saving"
> > +rm -f part
> > +test_emacs '(notmuch-show "id:test-binary@example.com")
> > +	    (search-forward "application/")
> > +	    (let ((standard-input "\"part\""))
> > +	       (notmuch-show-save-part))'
> > +cat <<EOF >EXPECTED
> > +Yen: $BIG5_YEN
> > +EOF
> > +test_expect_equal_file part EXPECTED
> > +
> > +# Test message viewing
> > +
> > +test_begin_subtest "Text message are not decoded when viewing"
> > +test_emacs '(notmuch-show "id:test-plain@example.com")
> > +	    (notmuch-show-view-raw-message)
> > +	    (test-visible-output "OUTPUT.raw")'
> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> > +cat <<EOF >EXPECTED
> > +Yen: =A2=44
> > +EOF
> > +test_expect_equal_file OUTPUT EXPECTED
> > +
> > +test_begin_subtest "8bit text message are not decoded when viewing"
> > +test_subtest_known_broken
> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
> > +	    (notmuch-show-view-raw-message)
> > +	    (test-visible-output "OUTPUT.raw")'
> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
> > +cat <<EOF >EXPECTED
> > +Yen: $BIG5_YEN
> > +EOF
> > +test_expect_equal_file OUTPUT EXPECTED
> > +
> > +test_done
> > diff --git a/test/test-lib.el b/test/test-lib.el
> > index 437f83f..dd3446e 100644
> > --- a/test/test-lib.el
> > +++ b/test/test-lib.el
> > @@ -57,7 +57,9 @@ (defun test-output (&optional filename)
> >  (defun test-visible-output (&optional filename)
> >    "Save visible text in current buffer to file FILENAME.  Default
> >  FILENAME is OUTPUT."
> > -  (let ((text (visible-buffer-string)))
> > +  (let ((text (visible-buffer-string))
> > +	;; Tests expect output in UTF-8 encoding
> > +	(coding-system-for-write 'utf-8))
> >      (with-temp-file (or filename "OUTPUT") (insert text))))
> >  
> >  (defun visible-buffer-string ()

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] test: New tests for Emacs charset handling
  2014-04-24 18:29     ` Austin Clements
@ 2014-04-25  6:18       ` Mark Walters
  2014-04-25 13:57       ` Tomi Ollila
  1 sibling, 0 replies; 32+ messages in thread
From: Mark Walters @ 2014-04-25  6:18 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Thu, 24 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Mark Walters on Apr 24 at  3:38 pm:
>> 
>> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>> > The test of viewing 8bit messages is known-broken.  The rest pass, but
>> > for very fragile reasons.  The next several commits will fix the
>> > known-broken test and make our charset handling robust.
>> 
>> Hi 
>> 
>> On one of my systems one of these (non-broken) tests fails. I am not
>> sure whether I messed up my emacs/environment when doing stuff remotely
>> recently so it could just be my system
>> 
>> 
>> > ---
>> >  test/T455-emacs-charsets.sh | 141 ++++++++++++++++++++++++++++++++++++++++++++
>> >  test/test-lib.el            |   4 +-
>> >  2 files changed, 144 insertions(+), 1 deletion(-)
>> >  create mode 100755 test/T455-emacs-charsets.sh
>> >
>> > diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
>> > new file mode 100755
>> > index 0000000..a42a1d2
>> > --- /dev/null
>> > +++ b/test/T455-emacs-charsets.sh
>> > @@ -0,0 +1,141 @@
>> > +#!/usr/bin/env bash
>> > +
>> > +test_description="emacs notmuch-show charset handling"
>> > +. ./test-lib.sh
>> > +
>> > +
>> > +UTF8_YEN=$'\xef\xbf\xa5'
>> > +BIG5_YEN=$'\xa2\x44'
>> > +
>> > +# Add four messages with unusual encoding requirements:
>> > +#
>> > +# 1) text/plain in quoted-printable big5
>> > +generate_message \
>> > +    [id]=test-plain@example.com \
>> > +    '[content-type]="text/plain; charset=big5"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="Yen: =A2=44"'
>> > +
>> > +# 2) text/plain in 8bit big5
>> > +generate_message \
>> > +    [id]=test-plain-8bit@example.com \
>> > +    '[content-type]="text/plain; charset=big5"' \
>> > +    '[content-transfer-encoding]=8bit' \
>> > +    '[body]="Yen: '$BIG5_YEN'"'
>> > +
>> > +# 3) text/html in quoted-printable big5
>> > +generate_message \
>> > +    [id]=test-html@example.com \
>> > +    '[content-type]="text/html; charset=big5"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="<html><body>Yen: =A2=44</body></html>"'
>> > +
>> > +# 4) application/octet-stream in quoted-printable of big5 text
>> > +generate_message \
>> > +    [id]=test-binary@example.com \
>> > +    '[content-type]="application/octet-stream"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="Yen: =A2=44"'
>> > +
>> > +notmuch new > /dev/null
>> > +
>> > +# Test rendering
>> > +
>> > +test_begin_subtest "Text parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-plain@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_begin_subtest "8bit text parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_begin_subtest "HTML parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-html@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +[ text/html ]
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> 
>> It's this test: I get an extra newline after the UFT8_YEN. 
>> 
>> This is with emacs 23.4.1 on a Debian wheezy(ish) system.
>> 
>> But as said it could be my fault: I think some odd things happened when
>> I tried to install emacs 24 and 23 simultaneously to help testing
>> things.
>
> I can reproduce this.  Tests that involve HTML rendering are always
> finicky like this since there are so many different HTML renderers.
> Maybe I could normalize the spacing?  sed '/^$/d;s/  */ /g' or so?

That sounds plausible. What output would you get if the test genuinely
fails? I guess the key thing is to try and make sure the test doesn't
pass in that case.

Best wishes

Mark

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
  2014-04-24 18:12     ` Austin Clements
@ 2014-04-25  6:42       ` Mark Walters
  2015-01-24 18:06         ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: Mark Walters @ 2014-04-25  6:42 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch


On Thu, 24 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Mark Walters on Apr 24 at 11:46 am:
>> 
>> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>> > (The actual code change here is small, but requires re-indenting
>> > existing code.)
>> > ---
>> >  emacs/notmuch-lib.el | 52 ++++++++++++++++++++++++++++++----------------------
>> >  1 file changed, 30 insertions(+), 22 deletions(-)
>> >
>> > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
>> > index fc67b14..fee8512 100644
>> > --- a/emacs/notmuch-lib.el
>> > +++ b/emacs/notmuch-lib.el
>> > @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
>> >     (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
>> >     parts))
>> >  
>> > -(defun notmuch-get-bodypart-binary (msg part process-crypto)
>> > +(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
>> >    "Return the unprocessed content of PART in MSG as a unibyte string.
>> >  
>> >  This returns the \"raw\" content of the given part after content
>> >  transfer decoding, but with no further processing (see the
>> >  discussion of --format=raw in man notmuch-show).  In particular,
>> >  this does no charset conversion."
>> > -  (let ((args `("show" "--format=raw"
>> > -		,(format "--part=%d" (plist-get part :id))
>> > -		,@(when process-crypto '("--decrypt"))
>> > -		,(notmuch-id-to-query (plist-get msg :id)))))
>> > -    (with-temp-buffer
>> > -      ;; Emacs internally uses a UTF-8-like multibyte string
>> > -      ;; representation by default (regardless of the coding system,
>> > -      ;; which only affects how it goes from outside data to this
>> > -      ;; internal representation).  This *almost* never matters.
>> > -      ;; Annoyingly, it does matter if we use this data in an image
>> > -      ;; descriptor, since Emacs will use its internal data buffer
>> > -      ;; directly and this multibyte representation corrupts binary
>> > -      ;; image formats.  Since the caller is asking for binary data, a
>> > -      ;; unibyte string is a more appropriate representation anyway.
>> > -      (set-buffer-multibyte nil)
>> > -      (let ((coding-system-for-read 'no-conversion))
>> > -	(apply #'call-process notmuch-command nil '(t nil) nil args)
>> > -	(buffer-string)))))
>> > -
>> > -(defun notmuch-get-bodypart-text (msg part process-crypto)
>> > +  (let ((data (plist-get part :binary-content)))
>> > +    (when (not data)
>> > +      (let ((args `("show" "--format=raw"
>> > +		    ,(format "--part=%d" (plist-get part :id))
>> > +		    ,@(when process-crypto '("--decrypt"))
>> > +		    ,(notmuch-id-to-query (plist-get msg :id)))))
>> > +	(with-temp-buffer
>> > +	  ;; Emacs internally uses a UTF-8-like multibyte string
>> > +	  ;; representation by default (regardless of the coding
>> > +	  ;; system, which only affects how it goes from outside data
>> > +	  ;; to this internal representation).  This *almost* never
>> > +	  ;; matters.  Annoyingly, it does matter if we use this data
>> > +	  ;; in an image descriptor, since Emacs will use its internal
>> > +	  ;; data buffer directly and this multibyte representation
>> > +	  ;; corrupts binary image formats.  Since the caller is
>> > +	  ;; asking for binary data, a unibyte string is a more
>> > +	  ;; appropriate representation anyway.
>> > +	  (set-buffer-multibyte nil)
>> > +	  (let ((coding-system-for-read 'no-conversion))
>> > +	    (apply #'call-process notmuch-command nil '(t nil) nil args)
>> > +	    (setq data (buffer-string)))))
>> > +      (when cache
>> > +	(plist-put part :binary-content data)))
>> > +    data))
>> 
>> I am a little puzzled by this but that could be lack of familiarity with
>> elisp. As far as I can see plist-put will sometimes modify the original
>> plist and sometimes return a new plist. If the latter happens then I
>> think it works out as if we hadn't cached anything as the part passed to
>> the function is unmodified. That might not matter in this case (though I
>> find the lack of determinism disturbing).
>> 
>> Or is something else going on?
>
> No, your familiarity with Elisp serves you well.  I'm completely
> cheating here.  According to the specification of plist-put, it's
> allowed to return a new list but in reality this only happens when the
> original plist is nil.  We lean on this already all over the
> notmuch-emacs code, but maybe that doesn't excuse me adding one more
> cheat.
>
> I could add a comment here explaining what's going on, I could
> manually do the list insertion in a way that's guaranteed to mutate it
> in place, or I could add a nil :binary-content property when parts are
> created (since plist-put is guaranteed to mutate existing keys in
> place).

I think a comment is fine. 

(Incidentally what is the best way of telling if emacs has changed an
object or returned a new one for other commands? Something like (setq
oldobject object) (setq object (operation-on object)) (if (eq object
oldobject) ... ))

Also, I think the function should have a comment about the lifetime of
the caching. I think in some cases the addition of :binary-content could
occur on load and thus the plist with binary content added would get
saved in the buffer when the msg plist was saved as a
text-property. However, maybe in other cases this gets called after the
initial insertion and thus the cached value is just used during this
operation on msg? Sorry that is a little incoherent as I haven't checked
all callers.

Best wishes

Mark



>> Best wishes
>> 
>> Mark
>> 
>> 
>> 
>> > +
>> > +(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
>> >    "Return the text content of PART in MSG.
>> >  
>> >  This returns the content of the given part as a multibyte Lisp
>> > @@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
>> >  	     (npart (apply #'notmuch-call-notmuch-sexp args)))
>> >  	(setq content (plist-get npart :content))
>> >  	(when (not content)
>> > -	  (error "Internal error: No :content from %S" args))))
>> > +	  (error "Internal error: No :content from %S" args)))
>> > +      (when cache
>> > +	(plist-put part :content content)))
>> >      content))
>> >  
>> >  ;; Workaround: The call to `mm-display-part' below triggers a bug in

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] test: New tests for Emacs charset handling
  2014-04-24 18:29     ` Austin Clements
  2014-04-25  6:18       ` Mark Walters
@ 2014-04-25 13:57       ` Tomi Ollila
  1 sibling, 0 replies; 32+ messages in thread
From: Tomi Ollila @ 2014-04-25 13:57 UTC (permalink / raw)
  To: Austin Clements, Mark Walters; +Cc: notmuch

On Thu, Apr 24 2014, Austin Clements <amdragon@MIT.EDU> wrote:

> Quoth Mark Walters on Apr 24 at  3:38 pm:
>> 
>> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>> > The test of viewing 8bit messages is known-broken.  The rest pass, but
>> > for very fragile reasons.  The next several commits will fix the
>> > known-broken test and make our charset handling robust.
>> 
>> Hi 
>> 
>> On one of my systems one of these (non-broken) tests fails. I am not
>> sure whether I messed up my emacs/environment when doing stuff remotely
>> recently so it could just be my system
>> 
>> 
>> > ---
>> >  test/T455-emacs-charsets.sh | 141 ++++++++++++++++++++++++++++++++++++++++++++
>> >  test/test-lib.el            |   4 +-
>> >  2 files changed, 144 insertions(+), 1 deletion(-)
>> >  create mode 100755 test/T455-emacs-charsets.sh
>> >
>> > diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
>> > new file mode 100755
>> > index 0000000..a42a1d2
>> > --- /dev/null
>> > +++ b/test/T455-emacs-charsets.sh
>> > @@ -0,0 +1,141 @@
>> > +#!/usr/bin/env bash
>> > +
>> > +test_description="emacs notmuch-show charset handling"
>> > +. ./test-lib.sh
>> > +
>> > +
>> > +UTF8_YEN=$'\xef\xbf\xa5'
>> > +BIG5_YEN=$'\xa2\x44'
>> > +
>> > +# Add four messages with unusual encoding requirements:
>> > +#
>> > +# 1) text/plain in quoted-printable big5
>> > +generate_message \
>> > +    [id]=test-plain@example.com \
>> > +    '[content-type]="text/plain; charset=big5"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="Yen: =A2=44"'
>> > +
>> > +# 2) text/plain in 8bit big5
>> > +generate_message \
>> > +    [id]=test-plain-8bit@example.com \
>> > +    '[content-type]="text/plain; charset=big5"' \
>> > +    '[content-transfer-encoding]=8bit' \
>> > +    '[body]="Yen: '$BIG5_YEN'"'
>> > +
>> > +# 3) text/html in quoted-printable big5
>> > +generate_message \
>> > +    [id]=test-html@example.com \
>> > +    '[content-type]="text/html; charset=big5"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="<html><body>Yen: =A2=44</body></html>"'
>> > +
>> > +# 4) application/octet-stream in quoted-printable of big5 text
>> > +generate_message \
>> > +    [id]=test-binary@example.com \
>> > +    '[content-type]="application/octet-stream"' \
>> > +    '[content-transfer-encoding]=quoted-printable' \
>> > +    '[body]="Yen: =A2=44"'
>> > +
>> > +notmuch new > /dev/null
>> > +
>> > +# Test rendering
>> > +
>> > +test_begin_subtest "Text parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-plain@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_begin_subtest "8bit text parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_begin_subtest "HTML parts are decoded when rendering"
>> > +test_emacs '(notmuch-show "id:test-html@example.com")
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +[ text/html ]
>> > +Yen: $UTF8_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> 
>> It's this test: I get an extra newline after the UFT8_YEN. 
>> 
>> This is with emacs 23.4.1 on a Debian wheezy(ish) system.
>> 
>> But as said it could be my fault: I think some odd things happened when
>> I tried to install emacs 24 and 23 simultaneously to help testing
>> things.
>
> I can reproduce this.  Tests that involve HTML rendering are always
> finicky like this since there are so many different HTML renderers.
> Maybe I could normalize the spacing?  sed '/^$/d;s/  */ /g' or so?


IIRC something was done there to unify renderers...

id:1336316175-17852-1-git-send-email-awg+notmuch@xvx.ca

... that (or it's alternative(*) ;) could be made global

Tomi

(*) id:m2fwbc4irf.fsf@guru.guru-group.fi


>
>> Best wishes
>> 
>> Mark
>> 
>> 
>> > +
>> > +# Test saving
>> > +
>> > +test_begin_subtest "Text parts are not decoded when saving"
>> > +rm -f part
>> > +test_emacs '(notmuch-show "id:test-plain@example.com")
>> > +	    (search-forward "Yen")
>> > +	    (let ((standard-input "\"part\""))
>> > +	       (notmuch-show-save-part))'
>> > +cat <<EOF >EXPECTED
>> > +Yen: $BIG5_YEN
>> > +EOF
>> > +test_expect_equal_file part EXPECTED
>> > +
>> > +test_begin_subtest "8bit text parts are not decoded when saving"
>> > +rm -f part
>> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
>> > +	    (search-forward "Yen")
>> > +	    (let ((standard-input "\"part\""))
>> > +	       (notmuch-show-save-part))'
>> > +cat <<EOF >EXPECTED
>> > +Yen: $BIG5_YEN
>> > +EOF
>> > +test_expect_equal_file part EXPECTED
>> > +
>> > +test_begin_subtest "HTML parts are not decoded when saving"
>> > +rm -f part
>> > +test_emacs '(notmuch-show "id:test-html@example.com")
>> > +	    (search-forward "Yen")
>> > +	    (let ((standard-input "\"part\""))
>> > +	       (notmuch-show-save-part))'
>> > +cat <<EOF >EXPECTED
>> > +<html><body>Yen: $BIG5_YEN</body></html>
>> > +EOF
>> > +test_expect_equal_file part EXPECTED
>> > +
>> > +test_begin_subtest "Binary parts are not decoded when saving"
>> > +rm -f part
>> > +test_emacs '(notmuch-show "id:test-binary@example.com")
>> > +	    (search-forward "application/")
>> > +	    (let ((standard-input "\"part\""))
>> > +	       (notmuch-show-save-part))'
>> > +cat <<EOF >EXPECTED
>> > +Yen: $BIG5_YEN
>> > +EOF
>> > +test_expect_equal_file part EXPECTED
>> > +
>> > +# Test message viewing
>> > +
>> > +test_begin_subtest "Text message are not decoded when viewing"
>> > +test_emacs '(notmuch-show "id:test-plain@example.com")
>> > +	    (notmuch-show-view-raw-message)
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: =A2=44
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_begin_subtest "8bit text message are not decoded when viewing"
>> > +test_subtest_known_broken
>> > +test_emacs '(notmuch-show "id:test-plain-8bit@example.com")
>> > +	    (notmuch-show-view-raw-message)
>> > +	    (test-visible-output "OUTPUT.raw")'
>> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
>> > +cat <<EOF >EXPECTED
>> > +Yen: $BIG5_YEN
>> > +EOF
>> > +test_expect_equal_file OUTPUT EXPECTED
>> > +
>> > +test_done
>> > diff --git a/test/test-lib.el b/test/test-lib.el
>> > index 437f83f..dd3446e 100644
>> > --- a/test/test-lib.el
>> > +++ b/test/test-lib.el
>> > @@ -57,7 +57,9 @@ (defun test-output (&optional filename)
>> >  (defun test-visible-output (&optional filename)
>> >    "Save visible text in current buffer to file FILENAME.  Default
>> >  FILENAME is OUTPUT."
>> > -  (let ((text (visible-buffer-string)))
>> > +  (let ((text (visible-buffer-string))
>> > +	;; Tests expect output in UTF-8 encoding
>> > +	(coding-system-for-write 'utf-8))
>> >      (with-temp-file (or filename "OUTPUT") (insert text))))
>> >  
>> >  (defun visible-buffer-string ()
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] Improve charset and cid: handling
  2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
                   ` (11 preceding siblings ...)
  2014-04-21 20:26 ` [PATCH 00/11] Improve charset and cid: handling Tomi Ollila
@ 2014-04-26 10:50 ` Mark Walters
  2015-01-24 17:29   ` Austin Clements
  12 siblings, 1 reply; 32+ messages in thread
From: Mark Walters @ 2014-04-26 10:50 UTC (permalink / raw)
  To: Austin Clements, notmuch


Aside from the minor comments I mentioned in previous emails and one
more comment below this looks good. 

The extra comment is that on emacs23 I get the following when compiling:

In end of data:
notmuch-show.el:2188:1:Warning: the following functions are not known to be
    defined: libxml-parse-html-region, shr-insert-document

Finally, I have not really tested it as I mainly use emacs23

Best wishes

Mark




On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> I set out to quickly add support for cid: links in the shr renderer
> and wound up making our charset handling more robust and rewriting our
> content-ID handling.  The test introduced in patch 2 passes in all but
> one really obscure case, but only because of many unwritten and
> potentially fragile assumptions that Emacs and the CLI make about each
> other.
>
> The first three patches could reasonably go in to 0.18.  The rest of
> this series is certainly post-0.18, but I didn't want to lose track of
> it.
>
> This series comes in three stages.  Each depends on the earlier ones,
> but each prefix makes sense on its own and could be pushed without the
> later stages.
>
> Patch 1 is a simple clean up patch.
>
> Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
> by splitting the broken `notmuch-get-bodypart-content' API into
> `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
> caller can explicitly convey their requirements.
>
> The remaining patches improve our content-ID handling and add support
> for cid: links for shr.
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/11] emacs: Support cid: references with shr renderer
  2014-04-21 18:37 ` [PATCH 11/11] emacs: Support cid: references with shr renderer Austin Clements
@ 2014-05-01  9:20   ` David Edmondson
  2015-01-24 17:12     ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: David Edmondson @ 2014-05-01  9:20 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Mon, Apr 21 2014, Austin Clements wrote:
> +(defun notmuch-show--insert-part-text/html-shr (msg part)
> +  ;; Make sure shr is loaded before we start let-binding its globals
> +  (require 'shr)
> +  (let ((dom (let (process-crypto notmuch-show-process-crypto)

Missing brackets? (You let-bind both `process-crypto' and
`notmuch-show-process-crypto' as nil.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content'.
  2014-04-21 18:37 ` [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content' Austin Clements
@ 2014-07-10  9:46   ` David Bremner
  0 siblings, 0 replies; 32+ messages in thread
From: David Bremner @ 2014-07-10  9:46 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> This can be derived from the PART argument (which is arguably
> canonical), so there's no sense in giving the caller an extra foot
> gun.

This patch seem ready to go, independant of the rest of the series.

d

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message'
  2014-04-21 18:37 ` [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message' Austin Clements
@ 2014-07-10  9:48   ` David Bremner
  2014-09-23 17:59   ` David Bremner
  1 sibling, 0 replies; 32+ messages in thread
From: David Bremner @ 2014-07-10  9:48 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> This fixes the known-broken test of viewing 8bit messages added by the
> previous commit.

this looks OK, but waits on a new version of the test in question?

d

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store
  2014-04-21 18:37 ` [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store Austin Clements
@ 2014-07-10  9:52   ` David Bremner
  2015-01-24 16:45     ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: David Bremner @ 2014-07-10  9:52 UTC (permalink / raw)
  To: Austin Clements; +Cc: Notmuch Mail

Austin Clements <amdragon@MIT.EDU> writes:

> This will simplify later changes.

I'd have preferred the whitespace changes as a seperate patch, but OK.

d

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
  2014-04-21 18:37 ` [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API Austin Clements
@ 2014-07-11 11:48   ` David Bremner
  2015-01-24 17:10     ` Austin Clements
  0 siblings, 1 reply; 32+ messages in thread
From: David Bremner @ 2014-07-11 11:48 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> +This returns the content of the given part as a multibyte Lisp

What does "multibyte" mean here? utf8? current encoding?

> +string after performing content transfer decoding and any
> +necessary charset decoding.  It is an error to use this for
> +non-text/* parts."
> +  (let ((content (plist-get part :content)))
> +    (when (not content)
> +      ;; Use show --format=sexp to fetch decoded content
> +      (let* ((args `("show" "--format=sexp" "--include-html"
> +		     ,(format "--part=%s" (plist-get part :id))
> +		     ,@(when process-crypto '("--decrypt"))
> +		     ,(notmuch-id-to-query (plist-get msg :id))))
> +	     (npart (apply #'notmuch-call-notmuch-sexp args)))
> +	(setq content (plist-get npart :content))
> +	(when (not content)
> +	  (error "Internal error: No :content from %S" args))))
> +    content))

I'm a bit curious at the lack of setting "coding-system-for-read" here.
Are we assuming the user has their environment set up correctly? Not so
much a criticism as being nervous about everything coding-system
related.

I didn't see anything else to object to in this patch or the previous
one.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message'
  2014-04-21 18:37 ` [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message' Austin Clements
  2014-07-10  9:48   ` David Bremner
@ 2014-09-23 17:59   ` David Bremner
  1 sibling, 0 replies; 32+ messages in thread
From: David Bremner @ 2014-09-23 17:59 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> This fixes the known-broken test of viewing 8bit messages added by the
> previous commit.

I pushed the first 3 patches in the series.

d

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store
  2014-07-10  9:52   ` David Bremner
@ 2015-01-24 16:45     ` Austin Clements
  0 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2015-01-24 16:45 UTC (permalink / raw)
  To: David Bremner; +Cc: Notmuch Mail

On Thu, 10 Jul 2014, David Bremner <david@tethera.net> wrote:
> Austin Clements <amdragon@MIT.EDU> writes:
>
>> This will simplify later changes.
>
> I'd have preferred the whitespace changes as a seperate patch, but OK.

Not sure which whitespace changes you're referring to.  Everything in
this diff is actual (minor) code changes.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
  2014-07-11 11:48   ` David Bremner
@ 2015-01-24 17:10     ` Austin Clements
  0 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2015-01-24 17:10 UTC (permalink / raw)
  To: David Bremner, notmuch

On Fri, 11 Jul 2014, David Bremner <david@tethera.net> wrote:
> Austin Clements <amdragon@MIT.EDU> writes:
>
>> +This returns the content of the given part as a multibyte Lisp
>
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of stings: "unibyte strings" and "multibyte
strings".

  https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +    (when (not content)
>> +      ;; Use show --format=sexp to fetch decoded content
>> +      (let* ((args `("show" "--format=sexp" "--include-html"
>> +		     ,(format "--part=%s" (plist-get part :id))
>> +		     ,@(when process-crypto '("--decrypt"))
>> +		     ,(notmuch-id-to-query (plist-get msg :id))))
>> +	     (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +	(setq content (plist-get npart :content))
>> +	(when (not content)
>> +	  (error "Internal error: No :content from %S" args))))
>> +    content))
>
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly at the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to "\uXXXX" escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

> I didn't see anything else to object to in this patch or the previous
> one.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/11] emacs: Support cid: references with shr renderer
  2014-05-01  9:20   ` David Edmondson
@ 2015-01-24 17:12     ` Austin Clements
  0 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2015-01-24 17:12 UTC (permalink / raw)
  To: David Edmondson, notmuch

On Thu, 01 May 2014, David Edmondson <dme@dme.org> wrote:
> On Mon, Apr 21 2014, Austin Clements wrote:
>> +(defun notmuch-show--insert-part-text/html-shr (msg part)
>> +  ;; Make sure shr is loaded before we start let-binding its globals
>> +  (require 'shr)
>> +  (let ((dom (let (process-crypto notmuch-show-process-crypto)
>
> Missing brackets? (You let-bind both `process-crypto' and
> `notmuch-show-process-crypto' as nil.)

'Doh.  Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] Improve charset and cid: handling
  2014-04-26 10:50 ` Mark Walters
@ 2015-01-24 17:29   ` Austin Clements
  0 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2015-01-24 17:29 UTC (permalink / raw)
  To: Mark Walters, notmuch

I added declare-functions for both of these, which should take care of
the Emacs 23 warnings and be more robust on Emacs 24.  We can't reach
the function that calls these unless shr is actually available, but the
byte compiler doesn't know that.

On Sat, 26 Apr 2014, Mark Walters <markwalters1009@gmail.com> wrote:
> Aside from the minor comments I mentioned in previous emails and one
> more comment below this looks good. 
>
> The extra comment is that on emacs23 I get the following when compiling:
>
> In end of data:
> notmuch-show.el:2188:1:Warning: the following functions are not known to be
>     defined: libxml-parse-html-region, shr-insert-document
>
> Finally, I have not really tested it as I mainly use emacs23
>
> Best wishes
>
> Mark
>
>
>
>
> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>> I set out to quickly add support for cid: links in the shr renderer
>> and wound up making our charset handling more robust and rewriting our
>> content-ID handling.  The test introduced in patch 2 passes in all but
>> one really obscure case, but only because of many unwritten and
>> potentially fragile assumptions that Emacs and the CLI make about each
>> other.
>>
>> The first three patches could reasonably go in to 0.18.  The rest of
>> this series is certainly post-0.18, but I didn't want to lose track of
>> it.
>>
>> This series comes in three stages.  Each depends on the earlier ones,
>> but each prefix makes sense on its own and could be pushed without the
>> later stages.
>>
>> Patch 1 is a simple clean up patch.
>>
>> Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
>> by splitting the broken `notmuch-get-bodypart-content' API into
>> `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
>> caller can explicitly convey their requirements.
>>
>> The remaining patches improve our content-ID handling and add support
>> for cid: links for shr.
>>
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
  2014-04-25  6:42       ` Mark Walters
@ 2015-01-24 18:06         ` Austin Clements
  0 siblings, 0 replies; 32+ messages in thread
From: Austin Clements @ 2015-01-24 18:06 UTC (permalink / raw)
  To: Mark Walters; +Cc: notmuch

On Fri, 25 Apr 2014, Mark Walters <markwalters1009@gmail.com> wrote:
> On Thu, 24 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>> Quoth Mark Walters on Apr 24 at 11:46 am:
>>> 
>>> On Mon, 21 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>>> > (The actual code change here is small, but requires re-indenting
>>> > existing code.)
>>> > ---
>>> >  emacs/notmuch-lib.el | 52 ++++++++++++++++++++++++++++++----------------------
>>> >  1 file changed, 30 insertions(+), 22 deletions(-)
>>> >
>>> > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
>>> > index fc67b14..fee8512 100644
>>> > --- a/emacs/notmuch-lib.el
>>> > +++ b/emacs/notmuch-lib.el
>>> > @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
>>> >     (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type))
>>> >     parts))
>>> >  
>>> > -(defun notmuch-get-bodypart-binary (msg part process-crypto)
>>> > +(defun notmuch-get-bodypart-binary (msg part process-crypto &optional cache)
>>> >    "Return the unprocessed content of PART in MSG as a unibyte string.
>>> >  
>>> >  This returns the \"raw\" content of the given part after content
>>> >  transfer decoding, but with no further processing (see the
>>> >  discussion of --format=raw in man notmuch-show).  In particular,
>>> >  this does no charset conversion."
>>> > -  (let ((args `("show" "--format=raw"
>>> > -		,(format "--part=%d" (plist-get part :id))
>>> > -		,@(when process-crypto '("--decrypt"))
>>> > -		,(notmuch-id-to-query (plist-get msg :id)))))
>>> > -    (with-temp-buffer
>>> > -      ;; Emacs internally uses a UTF-8-like multibyte string
>>> > -      ;; representation by default (regardless of the coding system,
>>> > -      ;; which only affects how it goes from outside data to this
>>> > -      ;; internal representation).  This *almost* never matters.
>>> > -      ;; Annoyingly, it does matter if we use this data in an image
>>> > -      ;; descriptor, since Emacs will use its internal data buffer
>>> > -      ;; directly and this multibyte representation corrupts binary
>>> > -      ;; image formats.  Since the caller is asking for binary data, a
>>> > -      ;; unibyte string is a more appropriate representation anyway.
>>> > -      (set-buffer-multibyte nil)
>>> > -      (let ((coding-system-for-read 'no-conversion))
>>> > -	(apply #'call-process notmuch-command nil '(t nil) nil args)
>>> > -	(buffer-string)))))
>>> > -
>>> > -(defun notmuch-get-bodypart-text (msg part process-crypto)
>>> > +  (let ((data (plist-get part :binary-content)))
>>> > +    (when (not data)
>>> > +      (let ((args `("show" "--format=raw"
>>> > +		    ,(format "--part=%d" (plist-get part :id))
>>> > +		    ,@(when process-crypto '("--decrypt"))
>>> > +		    ,(notmuch-id-to-query (plist-get msg :id)))))
>>> > +	(with-temp-buffer
>>> > +	  ;; Emacs internally uses a UTF-8-like multibyte string
>>> > +	  ;; representation by default (regardless of the coding
>>> > +	  ;; system, which only affects how it goes from outside data
>>> > +	  ;; to this internal representation).  This *almost* never
>>> > +	  ;; matters.  Annoyingly, it does matter if we use this data
>>> > +	  ;; in an image descriptor, since Emacs will use its internal
>>> > +	  ;; data buffer directly and this multibyte representation
>>> > +	  ;; corrupts binary image formats.  Since the caller is
>>> > +	  ;; asking for binary data, a unibyte string is a more
>>> > +	  ;; appropriate representation anyway.
>>> > +	  (set-buffer-multibyte nil)
>>> > +	  (let ((coding-system-for-read 'no-conversion))
>>> > +	    (apply #'call-process notmuch-command nil '(t nil) nil args)
>>> > +	    (setq data (buffer-string)))))
>>> > +      (when cache
>>> > +	(plist-put part :binary-content data)))
>>> > +    data))
>>> 
>>> I am a little puzzled by this but that could be lack of familiarity with
>>> elisp. As far as I can see plist-put will sometimes modify the original
>>> plist and sometimes return a new plist. If the latter happens then I
>>> think it works out as if we hadn't cached anything as the part passed to
>>> the function is unmodified. That might not matter in this case (though I
>>> find the lack of determinism disturbing).
>>> 
>>> Or is something else going on?
>>
>> No, your familiarity with Elisp serves you well.  I'm completely
>> cheating here.  According to the specification of plist-put, it's
>> allowed to return a new list but in reality this only happens when the
>> original plist is nil.  We lean on this already all over the
>> notmuch-emacs code, but maybe that doesn't excuse me adding one more
>> cheat.
>>
>> I could add a comment here explaining what's going on, I could
>> manually do the list insertion in a way that's guaranteed to mutate it
>> in place, or I could add a nil :binary-content property when parts are
>> created (since plist-put is guaranteed to mutate existing keys in
>> place).
>
> I think a comment is fine. 

Done.

> (Incidentally what is the best way of telling if emacs has changed an
> object or returned a new one for other commands? Something like (setq
> oldobject object) (setq object (operation-on object)) (if (eq object
> oldobject) ... ))

If `eq' returns t, it definitely returned the original object, though it
may or may not have modified it.  If `eq' returns nil, there's no
guarantee that it *didn't* modify the object you passed in (maybe it
modified it and tacked a new cons on the beginning).  In short, there's
really no way of knowing.

> Also, I think the function should have a comment about the lifetime of
> the caching. I think in some cases the addition of :binary-content could
> occur on load and thus the plist with binary content added would get
> saved in the buffer when the msg plist was saved as a
> text-property. However, maybe in other cases this gets called after the
> initial insertion and thus the cached value is just used during this
> operation on msg? Sorry that is a little incoherent as I haven't checked
> all callers.

I believe the caching will always last as long as the buffer.  We only
have one instance of each message plist.  This plist goes in to the text
properties and caching works by mutating this same plist.

However, I updated the docstring to clarify that the cache is stored in
MSG.  These functions don't ultimately control the lifetime of MSG, but
this gives the caller enough information to know the lifetime of the
cache if it knows the lifetime of MSG.

> Best wishes
>
> Mark
>
>
>
>>> Best wishes
>>> 
>>> Mark
>>> 
>>> 
>>> 
>>> > +
>>> > +(defun notmuch-get-bodypart-text (msg part process-crypto &optional cache)
>>> >    "Return the text content of PART in MSG.
>>> >  
>>> >  This returns the content of the given part as a multibyte Lisp
>>> > @@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
>>> >  	     (npart (apply #'notmuch-call-notmuch-sexp args)))
>>> >  	(setq content (plist-get npart :content))
>>> >  	(when (not content)
>>> > -	  (error "Internal error: No :content from %S" args))))
>>> > +	  (error "Internal error: No :content from %S" args)))
>>> > +      (when cache
>>> > +	(plist-put part :content content)))
>>> >      content))
>>> >  
>>> >  ;; Workaround: The call to `mm-display-part' below triggers a bug in

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-01-24 18:07 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-21 18:37 [PATCH 00/11] Improve charset and cid: handling Austin Clements
2014-04-21 18:37 ` [PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content' Austin Clements
2014-07-10  9:46   ` David Bremner
2014-04-21 18:37 ` [PATCH 02/11] test: New tests for Emacs charset handling Austin Clements
2014-04-24 14:38   ` Mark Walters
2014-04-24 18:29     ` Austin Clements
2014-04-25  6:18       ` Mark Walters
2014-04-25 13:57       ` Tomi Ollila
2014-04-21 18:37 ` [PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message' Austin Clements
2014-07-10  9:48   ` David Bremner
2014-09-23 17:59   ` David Bremner
2014-04-21 18:37 ` [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store Austin Clements
2014-07-10  9:52   ` David Bremner
2015-01-24 16:45     ` Austin Clements
2014-04-21 18:37 ` [PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary Austin Clements
2014-04-21 18:37 ` [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API Austin Clements
2014-07-11 11:48   ` David Bremner
2015-01-24 17:10     ` Austin Clements
2014-04-21 18:37 ` [PATCH 07/11] emacs: Return unibyte strings for binary part data Austin Clements
2014-04-21 18:37 ` [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text} Austin Clements
2014-04-24 10:46   ` Mark Walters
2014-04-24 18:12     ` Austin Clements
2014-04-25  6:42       ` Mark Walters
2015-01-24 18:06         ` Austin Clements
2014-04-21 18:37 ` [PATCH 09/11] emacs: Use generalized content caching in w3m CID code Austin Clements
2014-04-21 18:37 ` [PATCH 10/11] emacs: Rewrite content ID handling Austin Clements
2014-04-21 18:37 ` [PATCH 11/11] emacs: Support cid: references with shr renderer Austin Clements
2014-05-01  9:20   ` David Edmondson
2015-01-24 17:12     ` Austin Clements
2014-04-21 20:26 ` [PATCH 00/11] Improve charset and cid: handling Tomi Ollila
2014-04-26 10:50 ` Mark Walters
2015-01-24 17:29   ` Austin Clements

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).