Date: Tue, 10 Nov 2020 18:53:51 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: detached external index: performance note
Message-ID: <20201110185351.GA11848@dcvr>
In-Reply-To: <20201027075453.19163-1-e@80x24.org>

Eric Wong wrote:
> Not sure about the usability aspects, but I think this can
> replace the need for per-inbox Xapian DBs and save a truckload
> of disk space (and more importantly: cache space).  Per-inbox
> over.sqlite3 remains required for compatibility with NNTP/IMAP
> and existing WWW code.

Keeping v2 indexlevel=basic (*.sqlite3) and git repos on HDD and
putting -extindex on SSD seems to work reasonably well.  Xapian
on HDD is really painful.

> Performance isn't great, it took 30+ hours to index my mirror of
> lore on a SATA SSD, but the entire index is <200GB due to
> deduplication between cross posts.

Indexing performance is still a problem on RAM-starved systems
with Xapian :<

Systems with more RAM can pass a larger --batch-size, and maybe
Sys::Meminfo could be used (if installed) to pick a larger batch
size by default.  Fortunately, Sys::Meminfo is packaged for
FreeBSD, CentOS and Debian, so it can be an optional dependency.
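The "bigger default batch size when RAM is known" idea can be sketched as follows.  This is a rough illustration in Python, not public-inbox code (which is Perl).  os.sysconf() stands in for the optional Sys::Meminfo lookup, and the baseline, ceiling and RAM fraction are hypothetical placeholders.  They are not taken from public-inbox's source or defaults.  The only point is the shape of the logic: if RAM can't be determined, fall back to the baseline; otherwise scale with RAM, clamped to sane bounds.

```python
import os
from typing import Optional

# Hypothetical values for illustration only; NOT public-inbox's real defaults.
BASE_BATCH_BYTES = 1_000_000   # baseline used when RAM size is unknown
MAX_BATCH_BYTES = 1 << 30      # cap the batch at 1 GiB in this sketch
RAM_FRACTION = 64              # hypothetical: spend ~1/64 of RAM per batch


def total_ram_bytes() -> Optional[int]:
    """Best-effort physical RAM size in bytes, or None if unavailable.

    Plays the role Sys::Meminfo would play in Perl: it is optional, and
    callers must cope with it returning nothing.
    """
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (e.g. Windows);
        # ValueError: the sysconf name is unknown on this platform.
        return None
    if pages <= 0 or page_size <= 0:
        return None  # sysconf may report -1 for "indeterminate"
    return pages * page_size


def default_batch_size(ram: Optional[int]) -> int:
    """Grow the batch size with RAM, never below the baseline or above the cap."""
    if ram is None:
        return BASE_BATCH_BYTES
    scaled = ram // RAM_FRACTION
    return max(BASE_BATCH_BYTES, min(scaled, MAX_BATCH_BYTES))


if __name__ == "__main__":
    print(default_batch_size(total_ram_bytes()))
```

An explicit --batch-size from the user would still take precedence.  A value computed this way would only replace the built-in default.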