unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: Ioan-Adrian Ratiu <adi@adirat.com>, notmuch@notmuchmail.org
Subject: Re: [PATCH v2 01/11] lib: message: index message file sizes
Date: Thu, 08 Jun 2017 08:39:16 -0300
Message-ID: <87o9tyemjf.fsf@tethera.net>
In-Reply-To: <20170518222708.30032-2-adi@adirat.com>


As a preliminary note, I think this series will most likely need to
adapt to the reindexing series
id:20170604123235.24466-2-david@tethera.net as I think they are touching
the same parts of the code.  You might want to wait for that to go in
(or for it to be cancelled) before reworking your series.

Ioan-Adrian Ratiu <adi@adirat.com> writes:

> Parse & store the file sizes inside notmuch_message_t objects
> while indexing.

That seems not actually to be true, since there is no member of
notmuch_message_t which stores the filesize. It's also a bit confusing,
since indexing is about updating the database, not the in-memory data
structures.

> +    filesize = _notmuch_message_file_get_size (message_file);
> +    filesize_str = talloc_asprintf(NULL, "%lu", filesize);
> +    if (! filesize_str)
> +	return NOTMUCH_STATUS_OUT_OF_MEMORY;
> +
> +    _notmuch_message_add_term (message, "filesize", filesize_str);
> +    talloc_free (filesize_str);
> +

As I mentioned in a previous message:
   1) This crashes, because no prefix is defined for filesize yet.
   2) There seems to be no point in adding this term, since you search
      on the value slot anyway.

Presumably you want to replace it with a call to
_notmuch_message_add_filesize.

I did manage to do a little benchmarking after applying the next patch,
and database size and initial indexing time both increase by about
0.5% with the notmuch performance test suite (large version). This seems
acceptable to me, and I would hope it only improves (or at least doesn't
get worse) when the redundant terms are dropped.

> +    /* filesize defaults to zero which is ignored */

Which filesize do you refer to here? I'm a bit on the fence about
pervasively assuming a zero filesize is an error.

> +    ret = g_stat(message->filename, &statResult);
> +    if (! ret)
> +	message->filesize = statResult.st_size;
> +

Why are you using g_stat instead of plain stat? g_stat seems mainly to
add Windows compatibility (and confusion, since it's less familiar).
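
For illustration, a minimal sketch of the same lookup with plain POSIX
stat. The helper name file_size_plain_stat and its bool/out-pointer
shape are hypothetical, not part of the patch; only stat(2) and st_size
are real.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper (not from the patch): look up a file's size with
// plain stat(2). Returns true and stores the size in *size_out on
// success; returns false and leaves *size_out untouched on failure
// (errno is set by stat in that case).
static bool
file_size_plain_stat (const char *filename, off_t *size_out)
{
    struct stat st;

    if (stat (filename, &st) != 0)
	return false;

    *size_out = st.st_size;
    return true;
}
```

Reporting failure separately also avoids treating a size of zero as
"stat failed".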

> +unsigned long
> +notmuch_message_get_filesize (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +	value = message->doc.get_value (NOTMUCH_VALUE_FILESIZE);

I wondered whether going straight to the database without caching was
wasteful, but apparently we already do that for from, subject, and
message-id.

> +    } catch (Xapian::Error &error) {
> +	_notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred when reading filesize: %s\n",
> +		 error.get_msg().c_str());
> +	message->notmuch->exception_reported = TRUE;
> +	return 0;
> +    }
> +    if (value.empty ())
> +	/* sortable_unserialise is undefined on empty string */
> +	return 0;
> +    return Xapian::sortable_unserialise (value);
> +}

I'm not sure about this error handling. Do we want an API where we can't
tell the difference between a missing value, an empty file, and a
transient Xapian exception? OTOH, I do see that it's a bit clunky to use
a status return and output pointer here.
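
To make the trade-off concrete, here is a rough sketch of the status
return plus output pointer shape. Everything in it is a stand-in so
that it compiles without Xapian: sketch_status_t, sketch_doc_t and
sketch_get_filesize are hypothetical names, and the value is stored as
a decimal string rather than going through Xapian::sortable_unserialise.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status codes for this sketch only; notmuch has its own
// notmuch_status_t with different members.
enum sketch_status_t {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_NO_VALUE,	/* nothing stored in the value slot */
    SKETCH_STATUS_EXCEPTION	/* the backend threw while reading */
};

// Stand-in for a database document with a single value slot.
struct sketch_doc_t {
    std::string value;		/* decimal size, or empty if unset */
    bool broken;		/* simulate a failing backend */

    std::string get_value () const
    {
	if (broken)
	    throw std::runtime_error ("simulated database error");
	return value;
    }
};

// Writes the size to *size_out only on SKETCH_STATUS_SUCCESS, so a
// caller can distinguish an empty file from a missing value or an error.
static sketch_status_t
sketch_get_filesize (const sketch_doc_t &doc, unsigned long *size_out)
{
    try {
	std::string value = doc.get_value ();

	if (value.empty ())
	    return SKETCH_STATUS_NO_VALUE;
	*size_out = std::stoul (value);
    } catch (const std::exception &) {
	return SKETCH_STATUS_EXCEPTION;
    }
    return SKETCH_STATUS_SUCCESS;
}
```

The cost is the clunkiness mentioned above: every caller needs a local
variable and a status check instead of using the return value directly.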
>  
> +void
> +_notmuch_message_add_filesize (notmuch_message_t *message,
> +			       notmuch_message_file_t *message_file)
> +{
> +    unsigned long filesize = _notmuch_message_file_get_size(message_file);
> +    message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
> +			    Xapian::sortable_serialise (filesize));
> +}

Shouldn't this have some exception handling (and probably an error
return)? Basically any Xapian operation can throw an exception.
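
A rough sketch of what that could look like, again with stand-ins so it
builds without Xapian. sketch_add_status_t, sketch_writable_doc_t and
sketch_add_filesize are hypothetical names, and std::to_string stands
in for Xapian::sortable_serialise; a real version would presumably also
log the exception message.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status codes for this sketch only.
enum sketch_add_status_t {
    SKETCH_ADD_SUCCESS = 0,
    SKETCH_ADD_EXCEPTION
};

// Stand-in for a writable document whose add_value may throw.
struct sketch_writable_doc_t {
    bool fail_on_write;		/* simulate a failing backend */
    std::string stored;		/* last value written */

    void add_value (const std::string &v)
    {
	if (fail_on_write)
	    throw std::runtime_error ("simulated write failure");
	stored = v;
    }
};

// Same job as the void function in the patch, but the exception is
// caught and reported to the caller as a status.
static sketch_add_status_t
sketch_add_filesize (sketch_writable_doc_t &doc, unsigned long filesize)
{
    try {
	doc.add_value (std::to_string (filesize));
    } catch (const std::exception &) {
	return SKETCH_ADD_EXCEPTION;
    }
    return SKETCH_ADD_SUCCESS;
}
```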

>  /**
> + * Get the filesize in bytes of 'message'.
> + */
> +unsigned long
> +notmuch_message_get_filesize  (notmuch_message_t *message);
> +
> +/**

Please document the error conditions and returns of any public API call added.
