From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 526891FBEF
	for ; Wed, 10 Jun 2020 07:05:23 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches
Date: Wed, 10 Jun 2020 07:04:19 +0000
Message-Id: <20200610070519.18252-23-e@yhbt.net>
In-Reply-To: <20200610070519.18252-1-e@yhbt.net>
References: <20200610070519.18252-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

While we can't memoize the regexp forever like we do with other
Eml users, we can still benefit from caching regexp compilation
on a per-request basis.

A FETCH request from mutt on a 4K message inbox is around 8%
faster after this.  Since regexp compilation via qr// isn't
unbearably slow, a shared cache probably isn't worth the trouble
of implementing.  A per-request cache seems enough.
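The following is a minimal Perl sketch of the per-request idea. It is not the public-inbox implementation.

- `hdrs_regexp_sketch` is a hypothetical stand-in for `hdrs_regexp`, whose body is not part of this patch.
- `fetch_header_fields` is a hypothetical handler that receives one header string per message in the FETCH range.
- The field list is compiled with qr// once. That compiled regexp is then reused for every message, so the compile cost is paid once per request rather than once per message.

```perl
use strict;
use warnings;

my $compiled = 0; # counts qr// compilations, for illustration only

# Hypothetical stand-in for hdrs_regexp: match each named header field
# (case-insensitively, at line start) plus any folded continuation lines.
sub hdrs_regexp_sketch {
	my ($hdrs) = @_; # space-separated field names, e.g. 'DATE FROM'
	my $names = join('|', map { quotemeta($_) } split(/ /, $hdrs));
	++$compiled;
	qr/^(?:$names):[^\n]*\n(?:[ \t][^\n]*\n)*/im;
}

# Hypothetical per-request handler: compile once, reuse for each message.
sub fetch_header_fields {
	my ($hdrs, @headers) = @_; # @headers: one header block per message
	my $re = hdrs_regexp_sketch($hdrs); # compiled once per request
	my @out;
	for my $str (@headers) {
		# keep only the requested fields, then the terminating CRLF
		push @out, join('', ($str =~ m/($re)/g), "\r\n");
	}
	@out;
}

my @res = fetch_header_fields('DATE FROM',
	"From: a\@example.com\r\nTo: b\@example.com\r\nDate: Wed, 10 Jun 2020\r\n",
	"From: c\@example.com\r\nSubject: hi\r\n");
```

`$compiled` stays at 1 no matter how many header blocks are passed in. The saving therefore grows with the size of the fetched range.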
---
 lib/PublicInbox/IMAP.pm | 17 ++++++++---------
 t/imap.t                | 10 +++++++---
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0852ffab868..39667199080 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -544,25 +544,23 @@ sub hdrs_regexp ($) {
 
 # BODY[($SECTION_IDX.)?HEADER.FIELDS.NOT ($HDRS)]<$offset.$bytes>
 sub partial_hdr_not {
-	my ($eml, $section_idx, $hdrs) = @_;
+	my ($eml, $section_idx, $hdrs_re) = @_;
 	if (defined $section_idx) {
 		$eml = eml_body_idx($eml, $section_idx) or return;
 	}
 	my $str = $eml->header_obj->as_string;
-	my $re = hdrs_regexp($hdrs);
-	$str =~ s/$re//g;
+	$str =~ s/$hdrs_re//g;
 	$str .= "\r\n";
 }
 
 # BODY[($SECTION_IDX.)?HEADER.FIELDS ($HDRS)]<$offset.$bytes>
 sub partial_hdr_get {
-	my ($eml, $section_idx, $hdrs) = @_;
+	my ($eml, $section_idx, $hdrs_re) = @_;
 	if (defined $section_idx) {
 		$eml = eml_body_idx($eml, $section_idx) or return;
 	}
 	my $str = $eml->header_obj->as_string;
-	my $re = hdrs_regexp($hdrs);
-	join('', ($str =~ m/($re)/g), "\r\n");
+	join('', ($str =~ m/($hdrs_re)/g), "\r\n");
 }
 
 sub partial_prepare ($$$) {
@@ -583,9 +581,10 @@ sub partial_prepare ($$$) {
 		(?:HEADER\.FIELDS(\.NOT)?)\x20 # 2
 		\(([A-Z0-9\-\x20]+)\) # 3 - hdrs
 		\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5
-		$partial->{$att} = [ $2 ? \&partial_hdr_not
-					: \&partial_hdr_get,
-					$1, $3, $4, $5 ];
+		my $tmp = $partial->{$att} = [ $2 ? \&partial_hdr_not
+						: \&partial_hdr_get,
+						$1, undef, $4, $5 ];
+		$tmp->[2] = hdrs_regexp($3);
 	} else {
 		undef;
 	}
diff --git a/t/imap.t b/t/imap.t
index fe6352b678c..451b6596bf9 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -46,17 +46,21 @@ use PublicInbox::IMAPD;
 	my $partial_body = \&PublicInbox::IMAP::partial_body;
 	my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get;
 	my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not;
+	my $hdrs_regexp = \&PublicInbox::IMAP::hdrs_regexp;
 	is_deeply($x, {
 		'BODY[9]' => [ $partial_body, 9, undef, undef, undef ],
 		'BODY[9]<5>' => [ $partial_body, 9, undef, 5, undef ],
 		'BODY[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ],
 		'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ],
 		'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get,
-					undef, 'DATE FROM', undef, undef ],
+					undef, $hdrs_regexp->('DATE FROM'),
+					undef, undef ],
 		'BODY[HEADER.FIELDS.NOT (TO)]' => [ $partial_hdr_not,
-					undef, 'TO', undef, undef ],
+					undef, $hdrs_regexp->('TO'),
+					undef, undef ],
 		'BODY[1.1.HEADER.FIELDS (TO)]' => [ $partial_hdr_get,
-					'1.1', 'TO', undef, undef ],
+					'1.1', $hdrs_regexp->('TO'),
+					undef, undef ],
 	}, 'structure matches expected');
 }
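After this change, both header helpers receive an already-compiled regexp and differ only in how they apply it:

- `partial_hdr_get` keeps the matching fields, using `m//g` in list context.
- `partial_hdr_not` deletes them, using `s///g`.

The Perl sketch below mirrors that split. It also includes a simplified take on the `partial_prepare` change, which compiles the regexp at parse time and stores it in the prepared array.

`sketch_hdrs_re`, `sketch_get`, `sketch_not` and `sketch_prepare` are hypothetical names, not public-inbox functions. Section indices and `<offset.bytes>` ranges are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical regexp builder; the real hdrs_regexp body is not in this patch.
sub sketch_hdrs_re {
	my ($hdrs) = @_; # space-separated field names, e.g. 'TO'
	my $names = join('|', map { quotemeta($_) } split(/ /, $hdrs));
	qr/^(?:$names):[^\n]*\n(?:[ \t][^\n]*\n)*/im;
}

# HEADER.FIELDS: keep only the matching fields.
sub sketch_get {
	my ($str, $re) = @_;
	join('', ($str =~ m/($re)/g), "\r\n");
}

# HEADER.FIELDS.NOT: delete the matching fields, keep the rest.
sub sketch_not {
	my ($str, $re) = @_;
	$str =~ s/$re//g;
	$str . "\r\n";
}

# Simplified partial_prepare: parse the attribute once and store the
# compiled regexp, so per-message calls never recompile it.
sub sketch_prepare {
	my ($att) = @_;
	$att =~ /\ABODY\[HEADER\.FIELDS(\.NOT)?\x20\(([A-Z0-9\-\x20]+)\)\]\z/
		or return;
	my ($not, $hdrs) = ($1, $2); # copy captures before calling other subs
	[ $not ? \&sketch_not : \&sketch_get, sketch_hdrs_re($hdrs) ];
}

my $hdr = "From: a\@example.com\r\nTo: b\@example.com\r\n" .
	"Subject: hi\r\n";
my %out;
for my $att ('BODY[HEADER.FIELDS (TO)]', 'BODY[HEADER.FIELDS.NOT (TO)]') {
	my ($code, $re) = @{ sketch_prepare($att) };
	$out{$att} = $code->($hdr, $re); # reuses the stored regexp
}
```

With this input, the `HEADER.FIELDS (TO)` request yields only the `To:` field. The `.NOT` variant yields `From:` and `Subject:`. Each result ends with the trailing CRLF that the helpers append.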