From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: Gregor Zattler, xapian-discuss, notmuch
Subject: Re: out of memory on idle machine
In-Reply-To: <20210207214654.GA2906@no.workgroup>
References: <20201213131909.GD21521@no.workgroup> <87zh2hhk15.fsf@tethera.net>
 <20201213141543.GE21521@no.workgroup> <20201213151336.GF21521@no.workgroup>
 <20201213212252.GH21521@no.workgroup> <20201214192251.GA7858@no.workgroup>
 <20210130085432.GA14025@no.workgroup> <87bld6shrk.fsf@tethera.net>
 <20210131081638.GA5640@no.workgroup> <87sg6dweds.fsf@tethera.net>
 <20210207214654.GA2906@no.workgroup>
Date: Thu, 11 Feb 2021 06:53:27 -0400
Message-ID: <87o8gqx4d4.fsf@tethera.net>
MIME-Version: 1.0
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Gregor Zattler writes:

> Hi David, notmuch and xapian developers,
> * David Bremner [03. Feb. 2021]:
>
>
>> Olly Betts mentioned in a different thread that he will build a version
>> of xapian 1.4.18 for buster backports, so trying with that is probably a
>> good step when it is available.
>
> I'll do so.
>

At this point I don't really have any good ideas, so I'm waiting for
results from the 1.4.18 trial.

>>
>> Comparing those two lists may give you some hints. Any directory that
>> shows up in the second list but not the first should have no files in it
>> (but potentially other directories) or be ignored either implicitly
>> (.notmuch, .notmuch/xapian) or explicitly by your configuration.
>
>
> I tried this after the second OOM (see above):
>
> grfz@mic:~/Mail/.notmuch$ rm -rf xapian
> grfz@mic:~/Mail/.notmuch$ cp -a xapian-3 xapian
>
> grfz@mic:~/Mail/.notmuch$ xapian-delve -1 -A XDIRECTORY ~/Mail/.notmuch/xapian | sort -u > /tmp/delve.txt
> grfz@mic:~/Mail/.notmuch$ find ~/Mail/ -type d -not -empty | sed s,/home/grfz/Mail/,XDIRECTORY, |sort -u >/tmp/find.txt
>
> As expected the find.txt contains a huge amount of ../cur
> directories.
>
> I searched for the directories in find.txt, which are not in
> delve.txt:
>
> grfz@mic:~/Mail/.notmuch$ while read ; do grep -qF "$REPLY" /tmp/delve.txt 2>/dev/null || echo $REPLY ; done < /tmp/find.txt > /tmp/out
>
> It contains a huge amount of ../cur directories.
>
> I searched if some of these ../cur directories do not
> contain files:
>
> grfz@mic:~/Mail/.notmuch$ cat /tmp/out | grep "/cur$" | sed -e "s,XDIRECTORY,/home/grfz/Mail/," | while read ; do cd "$REPLY" ; test $(find -type f | wc -l) = 0 && echo "$REPLY" ; done
> /home/grfz/Mail/findex/cur

I don't have any /cur directories in my version. I do have a few (3 or
4) /tmp directories that are apparently not indexed. That's a bit
mysterious, but nothing on the scale of what you are seeing.

d
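
P.S. One thing that might be worth double-checking in the comparison
quoted above: grep -F matches substrings, so a directory counts as
"found" whenever any longer path containing it is in delve.txt. A rough
sketch of a whole-line comparison using comm follows. The name
missing_dirs is made up for illustration, and it assumes both files hold
exactly one XDIRECTORY-prefixed path per line, as find.txt and delve.txt
are meant to.

```shell
#!/bin/sh
# Hypothetical helper, not part of notmuch or xapian.
# Prints lines that occur in the filesystem list ($1) but not in the
# index list ($2), comparing whole lines rather than substrings.
missing_dirs() {
    fs_sorted=$(mktemp) || return 1
    idx_sorted=$(mktemp) || return 1
    # comm expects both inputs sorted in the same collation order,
    # so force the C locale for sort and comm alike.
    LC_ALL=C sort -u "$1" > "$fs_sorted"
    LC_ALL=C sort -u "$2" > "$idx_sorted"
    # -2 drops lines unique to the index list, -3 drops common lines,
    # leaving only lines unique to the filesystem list.
    LC_ALL=C comm -23 "$fs_sorted" "$idx_sorted"
    rm -f "$fs_sorted" "$idx_sorted"
}
```

With the files from the quoted session this would be run as
"missing_dirs /tmp/find.txt /tmp/delve.txt > /tmp/out". I have not tried
it on a tree the size of yours, so treat it as a starting point only.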