Date: Mon, 21 Oct 2019 11:34:11 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [PATCH 3/3] v2writable: reindex handles 3-headered monsters
Message-ID: <20191021113411.wm7rgoccrvhtygrn@dcvr>
References: <20191021110221.23753-1-e@80x24.org> <20191021110221.23753-4-e@80x24.org>
In-Reply-To: <20191021110221.23753-4-e@80x24.org>

Eric Wong wrote:
> +++ b/lib/PublicInbox/V2Writable.pm
> +sub reindex_oid ($$$$) {

> +	} else { # multiple MIDs are a weird case:
> +		my $del = 0;
> +		for (@$mids) {
> +			$del += delete($sync->{D}->{"$_\0$cid"}) // 0;
> +		}
> +		if ($del) {
> +			unindex_oid_remote($self, $oid, $_) for @$mids;
> +			# do not delete from {mm_tmp}, since another
> +			# single-MID message may use it.
> +		} else { # handle them at the end:
> +			push @{$sync->{multi_mid} //= []}, $oid;

Part of me worries @$multi_mid can be abused here by people trying
to OOM the indexing process, since it grows in memory by one OID
for every multi-MID message seen during the sync...

> @@ -1184,10 +1265,22 @@ sub index_sync {
>  
>  	# unindex is required for leftovers if "deletes" affect messages
>  	# in a previous fetch+index window:
> +	my $git;
>  	if (my @leftovers = values %{delete $sync->{D}}) {
> -		my $git = $self->{-inbox}->git;
> -		unindex_oid($self, $git, $_) for @leftovers;
> -		$git->cleanup;
> +		$git = $self->{-inbox}->git;
> +		for my $oid (@leftovers) {
> +			$self->{current_info} = "leftover $oid";
> +			unindex_oid($self, $git, $oid);
> +		}
> +	}
> +	if (my $multi_mid = delete $sync->{multi_mid}) {
> +		$git //= $self->{-inbox}->git;
> +
> +		while (defined(my $oid = pop(@$multi_mid))) {
> +			$self->{current_info} = "multi_mid $oid";
> +			reindex_oid_m($self, $sync, $git, $oid);
> +		}
> +		$git->cleanup if $git;
>  	}
>  	$self->done;

I suppose we could easily write the OIDs to a temporary file and
use fixed offsets, too; but fixed offsets would make supporting
git + SHA-256 more difficult, since OIDs would no longer all be
the same length.  So maybe Tie::File (stdlib) or a SQLite-based
stack could work, too...
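
A rough, untested-against-the-real-tree sketch of the Tie::File idea,
only to show its shape.  multi_mid_stack() and the temp file template
are hypothetical names, not anything in V2Writable.pm, and the OIDs
are placeholders.  Records are newline-delimited, so 40- and 64-hex
OIDs can share one file without fixed offsets:

```perl
use strict;
use warnings;
use File::Temp ();
use Tie::File;

# Hypothetical helper (assumption, not in the patch): returns the
# File::Temp object and a reference to an array backed by that file.
# The caller must hold on to $tmp, the file is unlinked when it goes
# out of scope.
sub multi_mid_stack {
	my $tmp = File::Temp->new(TEMPLATE => 'multi_mid-XXXXXX',
				TMPDIR => 1);
	tie(my @stack, 'Tie::File', $tmp->filename)
		or die "tie failed: $!";
	($tmp, \@stack);
}

my ($tmp, $stack) = multi_mid_stack();
push @$stack, 'a' x 40; # placeholder, SHA-1-sized hex OID
push @$stack, 'b' x 64; # placeholder, SHA-256-sized hex OID

# same loop shape as in index_sync, but popping from disk:
while (defined(my $oid = pop @$stack)) {
	print "multi_mid $oid\n";
}
untie @$stack;
```

Caveat: as far as I understand Tie::File, it still keeps a table of
record offsets in memory, so this lowers memory use per OID rather
than making it constant; a SQLite table would avoid that at the cost
of more code.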