From: David Bremner
To: Ioan-Adrian Ratiu, notmuch@notmuchmail.org
Subject: Re: [PATCH v2 01/11] lib: message: index message file sizes
In-Reply-To: <20170518222708.30032-2-adi@adirat.com>
References: <20170518222708.30032-1-adi@adirat.com> <20170518222708.30032-2-adi@adirat.com>
Date: Thu, 08 Jun 2017 08:39:16 -0300
Message-ID: <87o9tyemjf.fsf@tethera.net>

As a preliminary note, I think this series will most likely need to
adapt to the reindexing series
id:20170604123235.24466-2-david@tethera.net as I think they are touching
the same parts of the code. You might want to wait for that to go in (or
for it to be cancelled) before reworking your series.

Ioan-Adrian Ratiu writes:

> Parse & store the file sizes inside notmuch_message_t objects
> while indexing.

That seems not actually to be true, since there is no member of
notmuch_message_t which stores the filesize. It's also a bit confusing,
since indexing is about updating the database, not the in-memory data
structures.

> +    filesize = _notmuch_message_file_get_size (message_file);
> +    filesize_str = talloc_asprintf(NULL, "%lu", filesize);
> +    if (! filesize_str)
> +        return NOTMUCH_STATUS_OUT_OF_MEMORY;
> +
> +    _notmuch_message_add_term (message, "filesize", filesize_str);
> +    talloc_free (filesize_str);
> +

As I mentioned in a previous message,

1) this crashes, because you have no prefix for filesize yet.
2) there seems to be no point in adding this term, since you search on
   the value slot anyway. Presumably you want to replace it with a call
   to _notmuch_message_add_filesize.

I did manage to do a little benchmarking after applying the next patch,
and database size and initial indexing time both increase by about 0.5%
with the notmuch performance test suite (large version). This seems
acceptable to me, and I would hope it only improves (or at least doesn't
get worse) when the redundant terms are dropped.

> +    /* filesize defaults to zero which is ignored */

Which filesize do you refer to here? I'm a bit on the fence about
pervasively assuming a zero filesize is an error.

> +    ret = g_stat(message->filename, &statResult);
> +    if (! ret)
> +        message->filesize = statResult.st_size;
> +

Why are you using g_stat instead of plain stat? g_stat seems to mainly
add Windows compatibility (and confusion, since it's less familiar).

> +unsigned long
> +notmuch_message_get_filesize (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +        value = message->doc.get_value (NOTMUCH_VALUE_FILESIZE);

I wondered if this was wasteful going straight to the database without
caching, but apparently we do it already for from, subject, and
message-id.

> +    } catch (Xapian::Error &error) {
> +        _notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred when reading filesize: %s\n",
> +                 error.get_msg().c_str());
> +        message->notmuch->exception_reported = TRUE;
> +        return 0;
> +    }
> +    if (value.empty ())
> +        /* sortable_unserialise is undefined on empty string */
> +        return 0;
> +    return Xapian::sortable_unserialise (value);
> +}

I'm not sure about this error handling. Do we want an API where we can't
tell the difference between a missing value, an empty file, and a
transient Xapian exception? OTOH, I do see that it's a bit clunky to use
a status return and output pointer here.

>
> +void
> +_notmuch_message_add_filesize (notmuch_message_t *message,
> +                               notmuch_message_file_t *message_file)
> +{
> +    unsigned long filesize = _notmuch_message_file_get_size(message_file);
> +    message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
> +                            Xapian::sortable_serialise (filesize));
> +}

Shouldn't this have some exception handling (and probably an error
return)? Basically any Xapian operation can throw an exception.

>  /**
> + * Get the filesize in bytes of 'message'.
> + */
> +unsigned long
> +notmuch_message_get_filesize (notmuch_message_t *message);
> +
> +/**

Please document the error conditions and returns of any public API call
added.
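
To make the status-return alternative mentioned above a bit more
concrete, here is a rough, standalone sketch of how a getter could keep
"no value stored", "empty file" and "backend exception" apart. Everything
in it is hypothetical: the sketch_* names are not notmuch or Xapian API,
the struct merely stands in for the Xapian document, and
std::to_string/std::stoul stand in for the real value serialisation. It
only illustrates the shape of the interface, not a proposed
implementation.

```cpp
// Illustrative sketch only: none of these names exist in notmuch or Xapian.
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status codes, loosely modelled on a C-style status enum.
enum sketch_status_t {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_NULL_POINTER,
    SKETCH_STATUS_NO_VALUE,         // the value slot was never populated
    SKETCH_STATUS_BACKEND_EXCEPTION // stand-in for a database exception
};

// Stand-in for the database document behind a message.
struct sketch_doc_t {
    bool has_size = false;         // false: nothing stored in the slot
    unsigned long stored_size = 0; // only meaningful when has_size is true
    bool backend_broken = false;   // simulate a transient backend failure

    // Returns "" for a missing slot and may throw, like a value lookup.
    std::string get_value () const
    {
        if (backend_broken)
            throw std::runtime_error ("simulated backend failure");
        return has_size ? std::to_string (stored_size) : std::string ();
    }
};

// Status return plus output pointer: *size_out is written only on success.
sketch_status_t
sketch_get_filesize (const sketch_doc_t *doc, unsigned long *size_out)
{
    if (! doc || ! size_out)
        return SKETCH_STATUS_NULL_POINTER;

    try {
        std::string value = doc->get_value ();
        if (value.empty ())
            return SKETCH_STATUS_NO_VALUE;
        *size_out = std::stoul (value);
    } catch (const std::exception &) {
        return SKETCH_STATUS_BACKEND_EXCEPTION;
    }
    return SKETCH_STATUS_SUCCESS;
}

// Usage: a stored size of zero is now distinguishable from a missing value.
void
sketch_usage_example ()
{
    sketch_doc_t doc;
    unsigned long size = 0;

    assert (sketch_get_filesize (&doc, &size) == SKETCH_STATUS_NO_VALUE);

    doc.has_size = true; // an empty file: stored, and equal to zero
    assert (sketch_get_filesize (&doc, &size) == SKETCH_STATUS_SUCCESS);
    assert (size == 0);
}
```

The cost is the clunkiness already noted: every caller needs a local
variable and a status check, where the version in the patch is a single
expression.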