From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <david@tethera.net>
Received: from localhost (localhost [127.0.0.1])
 by arlo.cworth.org (Postfix) with ESMTP id E18FE6DE0B7C
 for <notmuch@notmuchmail.org>; Thu,  8 Jun 2017 04:39:22 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -0.001
X-Spam-Level: 
X-Spam-Status: No, score=-0.001 tagged_above=-999 required=5 tests=[AWL=0.010, 
 SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
 by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id EH1yCYiz_71P for <notmuch@notmuchmail.org>;
 Thu,  8 Jun 2017 04:39:22 -0700 (PDT)
Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])
 by arlo.cworth.org (Postfix) with ESMTPS id DF75D6DE0B6D
 for <notmuch@notmuchmail.org>; Thu,  8 Jun 2017 04:39:21 -0700 (PDT)
Received: from remotemail by fethera.tethera.net with local (Exim 4.84_2)
 (envelope-from <david@tethera.net>)
 id 1dIvlb-00050k-K5; Thu, 08 Jun 2017 07:38:19 -0400
Received: (nullmailer pid 5448 invoked by uid 1000);
 Thu, 08 Jun 2017 11:39:16 -0000
From: David Bremner <david@tethera.net>
To: Ioan-Adrian Ratiu <adi@adirat.com>, notmuch@notmuchmail.org
Subject: Re: [PATCH v2 01/11] lib: message: index message file sizes
In-Reply-To: <20170518222708.30032-2-adi@adirat.com>
References: <20170518222708.30032-1-adi@adirat.com>
 <20170518222708.30032-2-adi@adirat.com>
Date: Thu, 08 Jun 2017 08:39:16 -0300
Message-ID: <87o9tyemjf.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Use and development of the notmuch mail system."
 <notmuch.notmuchmail.org>
List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,
 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch/>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,
 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Thu, 08 Jun 2017 11:39:23 -0000


As a preliminary note, I think this series will most likely need to
adapt to the reindexing series
id:20170604123235.24466-2-david@tethera.net as I think they are touching
the same parts of the code.  You might want to wait for that to go in
(or for it to be cancelled) before reworking your series.

Ioan-Adrian Ratiu <adi@adirat.com> writes:

> Parse & store the file sizes inside notmuch_message_t objects
> while indexing.

That doesn't actually seem to be true, since no member of
notmuch_message_t stores the filesize. It's also a bit confusing,
since indexing is about updating the database, not the in-memory data
structures.

> +    filesize = _notmuch_message_file_get_size (message_file);
> +    filesize_str = talloc_asprintf(NULL, "%lu", filesize);
> +    if (! filesize_str)
> +	return NOTMUCH_STATUS_OUT_OF_MEMORY;
> +
> +    _notmuch_message_add_term (message, "filesize", filesize_str);
> +    talloc_free (filesize_str);
> +

As I mentioned in a previous message,
   1) this crashes, because you have no prefix for filesize yet.
   2) there seems to be no point in adding this term, since you search
   on the value slot anyway.
   Presumably you want to replace it with a call to _notmuch_message_add_filesize.

I did manage to do a little benchmarking after applying the next patch,
and database size and initial indexing time both increase by about
0.5% with the notmuch performance test suite (large version). This seems
acceptable to me, and I would hope it only improves (or at least doesn't
get worse) when the redundant terms are dropped.

> +    /* filesize defaults to zero which is ignored */

Which filesize do you refer to here? I'm a bit on the fence about
pervasively assuming a zero filesize is an error.

> +    ret = g_stat(message->filename, &statResult);
> +    if (! ret)
> +	message->filesize = statResult.st_size;
> +

Why are you using g_stat instead of plain stat? g_stat seems mainly to
add Windows compatibility (and confusion, since it's less familiar).
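
For illustration, here is a minimal sketch of what the plain stat
version could look like. The helper name file_size_plain_stat is
hypothetical (it is not an existing notmuch function), and reporting
failure through a bool rather than silently leaving the size at zero is
an assumption of this sketch, not something the patch does.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper, not part of notmuch. Returns true and stores the
// file size in *size_out on success. Returns false if stat() fails, in
// which case errno is set by stat() and *size_out is left untouched.
static bool
file_size_plain_stat (const char *filename, off_t *size_out)
{
    struct stat st;

    if (stat (filename, &st) != 0)
        return false;

    *size_out = st.st_size;
    return true;
}
```

With this shape a caller can tell a failed stat apart from a file that
really is zero bytes long.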

> +unsigned long
> +notmuch_message_get_filesize (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +	value = message->doc.get_value (NOTMUCH_VALUE_FILESIZE);

I wondered if this was wasteful going straight to the database without
caching, but apparently we do it already for from, subject, and
message-id.

> +    } catch (Xapian::Error &error) {
> +	_notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred when reading filesize: %s\n",
> +		 error.get_msg().c_str());
> +	message->notmuch->exception_reported = TRUE;
> +	return 0;
> +    }
> +    if (value.empty ())
> +	/* sortable_unserialise is undefined on empty string */
> +	return 0;
> +    return Xapian::sortable_unserialise (value);
> +}

I'm not sure about this error handling. Do we want an API where we can't
tell the difference between a missing value, an empty file, and a
transient Xapian exception? OTOH, I do see that it's a bit clunky to use
a status return and output pointer here.
>  
> +void
> +_notmuch_message_add_filesize (notmuch_message_t *message,
> +			       notmuch_message_file_t *message_file)
> +{
> +    unsigned long filesize = _notmuch_message_file_get_size(message_file);
> +    message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
> +			    Xapian::sortable_serialise (filesize));
> +}

Shouldn't this have some exception handling (and probably an error
return)? Basically any Xapian operation can throw an exception.

>  /**
> + * Get the filesize in bytes of 'message'.
> + */
> +unsigned long
> +notmuch_message_get_filesize  (notmuch_message_t *message);
> +
> +/**

Please document the error conditions and returns of any public API call added.