* [PATCH] new: Don't scan unchanged directories with no sub-directories
From: Austin Clements @ 2013-10-24 20:33 UTC
To: notmuch
This can substantially reduce the cost of notmuch new in some
situations, such as when the file system cache is cold or when the
Maildir is on NFS.
---
notmuch-new.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/notmuch-new.c b/notmuch-new.c
index faa33f1..364c73a 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -323,6 +323,26 @@ add_files (notmuch_database_t *notmuch,
     }
     db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0;
 
+    /* If the directory is unchanged from our last scan and has no
+     * sub-directories, then return without scanning it at all. In
+     * some situations, skipping the scan can substantially reduce the
+     * cost of notmuch new, especially since the huge numbers of files
+     * in Maildirs make scans expensive, but all files live in leaf
+     * directories.
+     *
+     * To check for sub-directories, we borrow a trick from find,
+     * kpathsea, and many other UNIX tools: since a directory's link
+     * count is the number of sub-directories (specifically, their
+     * '..' entries) plus 2 (the link from the parent and the link for
+     * '.'). This check is safe even on weird file systems, since
+     * file systems that can't compute this will return 0 or 1. This
+     * is safe even on *really* weird file systems like HFS+ that
+     * mistakenly return the total number of directory entries, since
+     * that only inflates the count beyond 2.
+     */
+    if (directory && fs_mtime == db_mtime && st.st_nlink == 2)
+        goto DONE;
+
     /* If the database knows about this directory, then we sort based
      * on strcmp to match the database sorting. Otherwise, we can do
      * inode-based sorting for faster filesystem operation. */
--
1.8.4.rc3
* Re: [PATCH] new: Don't scan unchanged directories with no sub-directories
From: Austin Clements @ 2013-10-24 21:08 UTC
To: notmuch
There might be a problem with this patch. Directory entries that are
*symlinks* to other directories do not increase the containing
directory's link count, but we do count them as directories in
add_files pass 1 and traverse into them. Hence, if you had a
directory that contained no sub-directories, but did contain symlinks
to other directories, we would fail to notice changes in the symlinked
directories.

We could check whether the database thinks there are sub-directories and
only bail early if the directory is unchanged and *both* the file
system and the database think there are no sub-directories.
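The symlink blind spot can be observed directly. The sketch below is an illustration under stated assumptions (a POSIX system with a writable /tmp); `link_count` and `symlink_experiment` are hypothetical helpers, not notmuch code. It creates a leaf directory, records its link count, adds a symlink inside it pointing at a sibling directory, and records the count again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count of PATH, or -1 if stat fails. */
static long
link_count (const char *path)
{
    struct stat st;

    assert (path != NULL);
    return stat (path, &st) == 0 ? (long) st.st_nlink : -1;
}

/* Hypothetical experiment: make ROOT/leaf and ROOT/other, record the
 * link count of leaf, create ROOT/leaf/link -> ROOT/other, and record
 * the count again.  Returns 0 on success; cleans up after itself. */
static int
symlink_experiment (long *before, long *after)
{
    char root[] = "/tmp/nm-symlink-XXXXXX";
    char leaf[64], other[64], sym[64];
    int ret = -1;

    if (mkdtemp (root) == NULL)
        return -1;
    snprintf (leaf, sizeof leaf, "%s/leaf", root);
    snprintf (other, sizeof other, "%s/other", root);
    snprintf (sym, sizeof sym, "%s/leaf/link", root);

    if (mkdir (leaf, 0700) == 0 && mkdir (other, 0700) == 0) {
        *before = link_count (leaf);
        /* The symlink adds a directory entry to leaf, but its target
         * has no '..' entry pointing back at leaf, so leaf's link
         * count is not expected to change. */
        if (symlink (other, sym) == 0) {
            *after = link_count (leaf);
            ret = 0;
            unlink (sym);
        }
    }
    rmdir (other);
    rmdir (leaf);
    rmdir (root);
    return ret;
}
```

On ordinary Linux file systems one would expect the two counts to match (the exact value depends on the file system). Creating the symlink does update the leaf directory's mtime, so it is later changes made inside the symlinked target that the v1 check would miss.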
* [PATCH v2] new: Don't scan unchanged directories with no sub-directories
From: Austin Clements @ 2013-10-24 21:38 UTC
To: notmuch
This can substantially reduce the cost of notmuch new in some
situations, such as when the file system cache is cold or when the
Maildir is on NFS.
---
This should fix the problem with directories containing symlinks to
other directories, but no actual sub-directories.
notmuch-new.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/notmuch-new.c b/notmuch-new.c
index faa33f1..ba05cb4 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -323,6 +323,35 @@ add_files (notmuch_database_t *notmuch,
     }
     db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0;
 
+    /* If the directory is unchanged from our last scan and has no
+     * sub-directories, then return without scanning it at all. In
+     * some situations, skipping the scan can substantially reduce the
+     * cost of notmuch new, especially since the huge numbers of files
+     * in Maildirs make scans expensive, but all files live in leaf
+     * directories.
+     *
+     * To check for sub-directories, we borrow a trick from find,
+     * kpathsea, and many other UNIX tools: since a directory's link
+     * count is the number of sub-directories (specifically, their
+     * '..' entries) plus 2 (the link from the parent and the link for
+     * '.'). This check is safe even on weird file systems, since
+     * file systems that can't compute this will return 0 or 1. This
+     * is safe even on *really* weird file systems like HFS+ that
+     * mistakenly return the total number of directory entries, since
+     * that only inflates the count beyond 2.
+     */
+    if (directory && fs_mtime == db_mtime && st.st_nlink == 2) {
+        /* There's one catch: pass 1 below considers symlinks to
+         * directories to be directories, but these don't increase the
+         * file system link count. So, only bail early if the
+         * database agrees that there are no sub-directories. */
+        db_subdirs = notmuch_directory_get_child_directories (directory);
+        if (!notmuch_filenames_valid (db_subdirs))
+            goto DONE;
+        notmuch_filenames_destroy (db_subdirs);
+        db_subdirs = NULL;
+    }
+
     /* If the database knows about this directory, then we sort based
      * on strcmp to match the database sorting. Otherwise, we can do
      * inode-based sorting for faster filesystem operation. */
--
1.8.4.rc3
* Re: [PATCH v2] new: Don't scan unchanged directories with no sub-directories
From: Tomi Ollila @ 2013-10-25 11:46 UTC
To: Austin Clements, notmuch
On Fri, Oct 25 2013, Austin Clements <amdragon@MIT.EDU> wrote:
> This can substantially reduce the cost of notmuch new in some
> situations, such as when the file system cache is cold or when the
> Maildir is on NFS.
> ---
LGTM. The creation and destruction of child directories happens
only if there are symlinks to directories in otherwise leaf directories.
Tomi
* Re: [PATCH v2] new: Don't scan unchanged directories with no sub-directories
From: David Bremner @ 2013-10-26 0:13 UTC
To: Austin Clements, notmuch
Austin Clements <amdragon@MIT.EDU> writes:
> This can substantially reduce the cost of notmuch new in some
> situations, such as when the file system cache is cold or when the
> Maildir is on NFS.
On my desktop at home (a Core i7 950) with spinning-rust disks (and LVM
on LUKS), this patch yields about a 7% slowdown in the initial new perf
test, from

                     Wall(s)   Usr(s)   Sys(s)   Res(K)  In/Out(512B)
Initial notmuch new   579.60   348.86    14.26   217188  5330266/3501272

to

                     Wall(s)   Usr(s)   Sys(s)   Res(K)  In/Out(512B)
Initial notmuch new   620.51   368.62    15.48   217156  5330354/3416456

On an SSD I don't detect a significant difference (<0.5% speedup).
d
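As a quick check of the "about 7%" figure, using the wall-clock times from the two spinning-disk runs above:

$$\frac{620.51 - 579.60}{579.60} = \frac{40.91}{579.60} \approx 0.0706 \approx 7\%$$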
* Re: [PATCH v2] new: Don't scan unchanged directories with no sub-directories
From: David Bremner @ 2013-10-28 20:00 UTC
To: Austin Clements, notmuch
Austin Clements <amdragon@MIT.EDU> writes:
> This can substantially reduce the cost of notmuch new in some
> situations, such as when the file system cache is cold or when the
> Maildir is on NFS.
> ---
pushed as commit 516efb7807
d
Thread overview: 9+ messages
2013-10-24 20:33 [PATCH] new: Don't scan unchanged directories with no sub-directories Austin Clements
2013-10-24 21:08 ` Austin Clements
2013-10-24 21:38 ` [PATCH v2] " Austin Clements
2013-10-25 11:46 ` Tomi Ollila
2013-10-25 11:59 ` Vladimir Marek
2013-10-26 0:13 ` David Bremner
2013-10-26 11:52 ` David Bremner
2013-10-28 20:00 ` David Bremner
2013-10-28 20:46 ` Vladimir Marek
Code repositories for project(s) associated with this public inbox
https://yhetil.org/notmuch.git/