unofficial mirror of notmuch@notmuchmail.org
From: Austin Clements <amdragon@mit.edu>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Subject: Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
Date: Sat, 24 Jan 2015 12:10:29 -0500
Message-ID: <874mrgumt6.fsf@csail.mit.edu>
In-Reply-To: <8738e8p13v.fsf@maritornes.cs.unb.ca>

On Fri, 11 Jul 2014, David Bremner <david@tethera.net> wrote:
> Austin Clements <amdragon@MIT.EDU> writes:
>
>> +This returns the content of the given part as a multibyte Lisp
>
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of strings: "unibyte strings" and "multibyte
strings".

  https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.
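
To make the distinction concrete, here is a minimal sketch using only
standard Elisp functions.  The name `my/string-kinds-demo' is
hypothetical and exists purely for illustration; it is not part of
notmuch.

```elisp
(defun my/string-kinds-demo ()
  "Illustrate unibyte vs. multibyte strings.  Hypothetical helper."
  (let* (;; "\u00ef" forces a multibyte string: the five characters of "naive"
         ;; with U+00EF in place of the "i".
         (text "na\u00efve")
         ;; Encoding turns text into raw bytes (a unibyte string).
         (bytes (encode-coding-string text 'utf-8))
         ;; Decoding endows the bytes with an encoding again.
         (round-trip (decode-coding-string bytes 'utf-8)))
    (list (multibyte-string-p text)      ; t
          (multibyte-string-p bytes)     ; nil
          (length text)                  ; 5 characters
          (length bytes)                 ; 6 bytes (U+00EF takes two in UTF-8)
          (string= text round-trip))))   ; t
```

Evaluating (my/string-kinds-demo) should return (t nil 5 6 t).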

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +    (when (not content)
>> +      ;; Use show --format=sexp to fetch decoded content
>> +      (let* ((args `("show" "--format=sexp" "--include-html"
>> +		     ,(format "--part=%s" (plist-get part :id))
>> +		     ,@(when process-crypto '("--decrypt"))
>> +		     ,(notmuch-id-to-query (plist-get msg :id))))
>> +	     (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +	(setq content (plist-get npart :content))
>> +	(when (not content)
>> +	  (error "Internal error: No :content from %S" args))))
>> +    content))
>
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly as the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to convert UTF-8 to "\uXXXX" escapes ourselves for
strings that are known to be UTF-8 and leave the raw bytes for the few
that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.
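
As a rough sketch of why the escape idea could work (this is not
anything the CLI does today): a "\uXXXX" escape is plain ASCII on the
wire, so it passes through any coding system untouched, and the Lisp
reader itself turns it into the right character.  The name
`my/read-escaped-demo' and the sample string below are made up for
illustration.

```elisp
(defun my/read-escaped-demo ()
  "Read an ASCII-only sexp string containing a \\uXXXX escape.
Hypothetical helper; the input stands in for escaped CLI output."
  (let* (;; What the CLI would emit: the ten ASCII characters "caf\u00e9",
         ;; including the surrounding double quotes.
         (wire "\"caf\\u00e9\"")
         ;; The reader expands the escape while parsing the sexp.
         (s (car (read-from-string wire))))
    (list (multibyte-string-p s)         ; t
          (length s)                     ; 4 characters
          (string= s "caf\u00e9"))))     ; t
```

Evaluating (my/read-escaped-demo) should return (t 4 t), with no
dependence on coding-system-for-read.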

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

> I didn't see anything else to object to in this patch or the previous
> one.

