From: Mark Walters <markwalters1009@gmail.com>
To: David Bremner <david@tethera.net>, David Edmondson <dme@dme.org>,
notmuch@notmuchmail.org
Subject: Re: [PATCH v1 0/3] Improve the acquisition of text parts.
Date: Sat, 26 Mar 2016 09:18:20 +0000
Message-ID: <87zitlppgj.fsf@qmul.ac.uk>
In-Reply-To: <87bn6h5lf3.fsf@zancas.localnet>

Hi

Sorry this email ended up rather long:

Summary: I have run a test (see below) on all of the lkml part of the
performance-corpus, and all the changes look expected. So this series
looks good to me.

First note how we do the bodypart insertion: for a MIME type of
text/plain we first try the text/plain handler, then a text/* handler,
and finally a */* handler, until one succeeds. Before this series, when
the part is application/octet-stream but is detected as text/plain, the
text/plain handler fails with a "bodypart insertion error" because
notmuch-get-bodypart-text can't get the text (because it's not
officially text). Thus we fall back on the */* handler and that inserts
the part.

With this series notmuch-get-bodypart-text succeeds and we stop.

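The fallback order described above can be sketched roughly as follows.
This is only an illustration in shell, not the notmuch-emacs code:
handler_candidates, try_handler and insert_part are hypothetical names,
and try_handler is a stand-in that mimics the pre-series case where
only the catch-all handler succeeds.

```shell
# Print the handler patterns to try for a MIME type, most specific first.
handler_candidates() {
    mime_type=$1
    major=${mime_type%%/*}
    printf '%s\n' "$mime_type" "$major/*" '*/*'
}

# Hypothetical stand-in for running a handler: here only the catch-all
# succeeds, as happened before the series for parts detected as text.
try_handler() {
    [ "$1" = '*/*' ]
}

# Try each candidate in turn and stop at the first one that succeeds.
insert_part() {
    handler_candidates "$1" | while IFS= read -r pattern; do
        if try_handler "$pattern"; then
            echo "inserted by $pattern"
            break
        fi
        echo "handler $pattern did not succeed" >&2
    done
}

insert_part text/plain
```

With the series applied, the equivalent of try_handler would succeed for
text/plain straight away, so the loop would stop at the first candidate.
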
Thus in most cases the only change is that we don't get a "bodypart
insertion error", but all the text looks the same. In a couple of cases
the text/plain handler wraps lines or replaces ^M by unix newlines,
whereas the */* handler does not. This is an improvement.

There is one more "difference" but I think this is actually something
random. Sometimes when the part is application/tar or application/zip I
get "Bodypart insert error: Symbol's function definition is void:
gnus-recursive-directory-files". If I load gnus this goes away. In my
first batch of tests this only occurred when using this series, but
since then I have reproduced it on mainline. I think something else I
did when setting up the test on mainline caused gnus to be loaded, but I
have not worked out what is going on there.

Finally, the test was as follows. I downloaded the performance corpus,
configured a separate notmuch config file to use the
performance-test/corpus/mail/lkml as the mailstore, went into
notmuch-emacs and to the inbox (which contained all messages) and ran
the following lisp function

(defun my-save-all-show ()
  (interactive)
  (goto-char (point-min))
  (let ((count 0))
    (while (notmuch-search-find-thread-id)
      (let ((thread-id (notmuch-search-find-thread-id)))
        (setq count (1+ count))
        (message "Thread %s: %s" count thread-id)
        (notmuch-show thread-id)
        (let ((text (buffer-string))
              (coding-system-for-write 'no-conversion))
          (with-temp-file (concat "OUTPUT-" thread-id) (insert text)))
        (kill-buffer))
      (notmuch-search-next-thread))))

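For anyone wanting to repeat this, the separate config mentioned above
could be set up along these lines. This is a sketch under assumptions:
write_corpus_config and the config file name are made up for
illustration, and it assumes that a config containing only
database.path is enough for notmuch to fall back on its defaults for
everything else.

```shell
# Write a minimal notmuch config pointing at the given mail store.
# Only database.path is set; the path should be absolute.
write_corpus_config() {
    config_file=$1
    mail_dir=$2
    printf '[database]\npath=%s\n' "$mail_dir" > "$config_file"
}

# Possible usage (file name is hypothetical), run from the notmuch tree:
#   write_corpus_config "$HOME/.notmuch-config-lkml" \
#       "$PWD/performance-test/corpus/mail/lkml"
#   NOTMUCH_CONFIG="$HOME/.notmuch-config-lkml" notmuch new
#   NOTMUCH_CONFIG="$HOME/.notmuch-config-lkml" emacs
```

Setting NOTMUCH_CONFIG in the environment keeps the test database away
from the normal one, both for the command line and for an emacs started
from that environment.
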
I moved the OUTPUT files elsewhere, repeated the run with this series
applied, and then ran diff on the two sets of output. Out of the roughly
16000 threads / 100000 messages, this gave 7 threads with a change (each
in a single message), and I looked at those individually, as described
above.

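The comparison step could be scripted roughly as below. Again this is
only a sketch: compare_outputs and the directory names "before" and
"after" are hypothetical, and it assumes the two runs' OUTPUT-* files
have been moved into separate directories.

```shell
# List the OUTPUT-* files whose contents differ between two runs.
# A file missing from the second directory is reported as changed too.
compare_outputs() {
    before_dir=$1
    after_dir=$2
    for before in "$before_dir"/OUTPUT-*; do
        [ -e "$before" ] || continue
        name=${before##*/}
        if ! cmp -s "$before" "$after_dir/$name"; then
            echo "$name"
        fi
    done
}

# Possible usage:
#   compare_outputs before after
#   diff -u before/OUTPUT-<thread-id> after/OUTPUT-<thread-id>
```

Listing the changed threads first keeps the output small, and each one
can then be inspected with diff -u by hand.
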
Best wishes

Mark

On Mon, 14 Mar 2016, David Bremner <david@tethera.net> wrote:
> David Edmondson <dme@dme.org> writes:
>
>> On Sun, Mar 13 2016, Mark Walters wrote:
>>> However, it would be sensible to get testing in a greater variety of
>>> charsets/encodings
>>
>> Agreed. Does anyone have suggestions on how we might achieve this? A
>> corpus of mail that we could use?
>
> Maybe the notmuch performance corpus, particularly the lkml sample.
>
> grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
>
> gives
>
> euc-kr
> gb2312
> iso-2022-jp
> iso-2022-jp-2
> iso-8859-1
> iso-8859-14
> iso 8859-15
> iso-8859-15
> iso-8859-1
> iso-8859-2
> iso-8859-6
> iso-8859-7
> iso-8859-9
> koi8-r
> koi8-u
> ks_c_5601-1987
> shift_jis
> unknown
> unknown-8bit
> us-ascii
> utf8
> utf-8
> windows-1250
> windows-1251
> windows-1252
> windows-1255
>
>
> to unpack the corpus
>
> cd performance-test
> make download-corpus
> ./T00-new.sh --large
>
> probably interrupt the test once notmuch-new starts running.