From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
CC: jwilk@jwilk.net
Subject: [PATCH 2/3] test: add known broken test for indexing text/* attachments
Date: Sat, 20 Aug 2022 11:50:06 -0700
Message-Id: <20220820185007.289543-3-david@tethera.net>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220820185007.289543-1-david@tethera.net>
References: <20220820185007.289543-1-david@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
List-Id: "Use and development of the notmuch mail system."

The general problem of indexing attachments requires some help to turn
them into text, but most (perhaps not all) text/* parts should be
indexable internally, possibly with optimizations like those used for
the text/html case.
---
 test/T050-new.sh                              |   8 +
 ...TCH-1-2-system_data_types.7-srcfix.txt:2,S | 282 ++++++++++++++++++
 2 files changed, 290 insertions(+)
 create mode 100644 test/corpora/indexing/PATCH-1-2-system_data_types.7-srcfix.txt:2,S

diff --git a/test/T050-new.sh b/test/T050-new.sh
index 6791f87c..cb67889c 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -455,4 +455,12 @@ Date: Fri, 17 Jun 2016 22:14:41 -0400
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
+add_email_corpus indexing
+
+test_begin_subtest "index text/* attachments"
+test_subtest_known_broken
+notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain > EXPECTED
+notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz > OUTPUT
+test_expect_equal_file_nonempty EXPECTED OUTPUT
+
 test_done
diff --git a/test/corpora/indexing/PATCH-1-2-system_data_types.7-srcfix.txt:2,S b/test/corpora/indexing/PATCH-1-2-system_data_types.7-srcfix.txt:2,S
new file mode 100644
index 00000000..1361c6f2
--- /dev/null
+++ b/test/corpora/indexing/PATCH-1-2-system_data_types.7-srcfix.txt:2,S
@@ -0,0 +1,282 @@
+From mboxrd@z Thu Jan 1 00:00:00 1970 +Return-Path: +X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on + aws-us-west-2-korg-lkml-1.web.codeaurora.org +X-Spam-Level: +X-Spam-Status: No, score=-8.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, +
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, + HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, + SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no + version=3.4.0 +Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) + by smtp.lore.kernel.org (Postfix) with ESMTP id AFE3FC4727E + for ; Wed, 30 Sep 2020 10:12:21 +0000 (UTC) +Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) + by mail.kernel.org (Postfix) with ESMTP id 4E0D62074A + for ; Wed, 30 Sep 2020 10:12:21 +0000 (UTC) +Authentication-Results: mail.kernel.org; + dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Osm9Pn67" +Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand + id S1725823AbgI3KMU (ORCPT ); + Wed, 30 Sep 2020 06:12:20 -0400 +Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50038 "EHLO + lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org + with ESMTP id S1725779AbgI3KMU (ORCPT + ); Wed, 30 Sep 2020 06:12:20 -0400 +Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) + by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5026DC061755 + for ; Wed, 30 Sep 2020 03:12:20 -0700 (PDT) +Received: by mail-pf1-x443.google.com with SMTP id b124so832681pfg.13 + for ; Wed, 30 Sep 2020 03:12:20 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=gmail.com; s=20161025; + h=date:from:to:cc:subject:message-id:references:mime-version + :content-disposition:in-reply-to:user-agent; + bh=qR1FJVXOhU6/g+m4SoSco3vMtV+CNvRvNyXS1xuG+T4=; + b=Osm9Pn67G380QiA1ORltntJShSHlKg/KZZfKV8ebvfEXJw9893EO0N6J6GDR+zkmHi + TOQuIe7x9y95Pipm54rWWEW33U3gwoXRHsPc2Kivm6L8Ixb+f0T0rMPKw/FOkL8OGo9t + WmmSvnlErAXHqBq9aRAJJsf2bSlDgdAyYY1Qe6PSq2hKi2rg+sOy1Vaj4RqZ6jTK/DWY + tX28Ql0XS3kKWp0Lc8MNsSP+SXlcdwHQYll5LeReAg1oi++hICgWphuMmo3OH+2B1WtO + hMH7VuUONqbuE1aLoZ6PyyUlCeN1soJd8bKY0cmY0TKCsw0Jvkuh/XzYDVNi6wOSM6Ez + 
okpA== +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20161025; + h=x-gm-message-state:date:from:to:cc:subject:message-id:references + :mime-version:content-disposition:in-reply-to:user-agent; + bh=qR1FJVXOhU6/g+m4SoSco3vMtV+CNvRvNyXS1xuG+T4=; + b=TJU+duGLhruSES/5sJy4y1wfcltfokDpA58edkSUJyasvsooUo67VNtOB3ZK49iHm5 + C/cjy0ExxTECB0aM6p+B1jcePdWoPUaVBY9bVd/Q5DNhm4KhTO8ON96gB43d2rLWLOiK + /Y1vCu+MwOpY0JQTojbC140s/JYccR/KPapTmbUkRzrpmeoYqw8CbBPV60rQxYCn9GUu + FeCXJY5q9OfaYW1viQZoBL5n1IMMpJDVa61Q8gZ33b3wRCvQv/x1eZCsVlYpjcqf7Umc + /Amx3i27cxvo8pSvvwiTzrlJHJv0Gkytz13i7s+zW+XKzZRyzy3yirtU2DFTGat6FeMn + H8Ig== +X-Gm-Message-State: AOAM530Yon7xNOW6kiuy6bVpbpwbzR/9pldRB49OtZaSAHAZg7Gyf7qE + JXgAH20rZzYlwqOZyeZCeAwtWh09PeI= +X-Google-Smtp-Source: ABdhPJxzyZAVDBtMwQ5+dUqVg37y/LgZByrSaTxvhS6wnx6sJuG8ROItw0CwDAg939XUVADeje/nZQ== +X-Received: by 2002:a63:c547:: with SMTP id g7mr1563654pgd.234.1601460739764; + Wed, 30 Sep 2020 03:12:19 -0700 (PDT) +Received: from localhost.localdomain ([1.129.172.177]) + by smtp.gmail.com with ESMTPSA id k14sm1804437pjd.45.2020.09.30.03.12.17 + (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); + Wed, 30 Sep 2020 03:12:19 -0700 (PDT) +Date: Wed, 30 Sep 2020 20:12:15 +1000 +From: "G. 
Branden Robinson" +To: "Michael Kerrisk (man-pages)" +Cc: Jakub Wilk , linux-man@vger.kernel.org +Subject: Re: [PATCH 1/2] system_data_types.7: srcfix +Message-ID: <20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain> +References: <20200925080330.184303-1-colomar.6.4.3@gmail.com> + <20200927061015.4obt73pdhyh7wecu@localhost.localdomain> + <20200928132959.x4koforqnzohxh5u@jwilk.net> + <9b8303fe-969e-c9f0-e3cd-0590b342d5bf@gmail.com> +MIME-Version: 1.0 +Content-Type: multipart/signed; micalg=pgp-sha256; + protocol="application/pgp-signature"; boundary="jg2hlfugxpumieke" +Content-Disposition: inline +In-Reply-To: <9b8303fe-969e-c9f0-e3cd-0590b342d5bf@gmail.com> +User-Agent: NeoMutt/20180716 +Precedence: bulk +List-ID: +X-Mailing-List: linux-man@vger.kernel.org + + +--jg2hlfugxpumieke +Content-Type: multipart/mixed; boundary="wl6i3r6gpq7ibouc" +Content-Disposition: inline + + +--wl6i3r6gpq7ibouc +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +Content-Transfer-Encoding: quoted-printable + +Hi Jakub and Michael, + +At 2020-09-29T14:13:26+0200, Michael Kerrisk (man-pages) wrote: +> On 9/28/20 3:29 PM, Jakub Wilk wrote: +> > Hi Branden! +> >=20 +> > In groff_man_style(7) you wrote: +> >> Unused macro arguments are more often simply omitted, or good style +> >> suggests that a more appropriate macro be chosen, that earlier +> >> arguments are more important than later ones, or that arguments +> >> have identical significance such that skipping any is superfluous. +> >=20 +> > After 15 minutes of gawking at this sentence, I still don't +> > understand what are you trying to say here. The sentence should be +> > either thoroughly rephrased or removed. +>=20 +> I must say that I too found it hard to parse. 
I presume, Branden, +> that you mean: +>=20 +> [[ +> Unused macro arguments are more often simply omitted, or good style=20 +> suggests +> EITHER (1)=20 +> that a more appropriate macro be chosen,=20 +> (2) +> that earlier arguments are more important than later ones, or +> (3) +> that arguments have=20 +> identical significance such that skipping any is superfluous. +> ]] + +You got it. But it was too much work. + +> But it takes a few scans to work that out. Perhaps break this into +> smaller pieces, or add some explicit structuring elements to the +> sentence? + +I was trying to be comprehensive with respect to several anti-patterns I +had in mind. However, using the anti-patterns concretely is premature +at that point in the page. So I both expanded and relocated the +material. + +I'm attaching what I've just committed to groff git. + +Further feedback is welcome, of course; revision of documentation is a +process that is never completed, only abandoned. And I haven't given up +yet. :) + +Thank you both for your reviews. + +Regards, +Branden + +--wl6i3r6gpq7ibouc +Content-Type: text/x-diff; charset=us-ascii +Content-Disposition: attachment; filename="excise_standardese.diff" +Content-Transfer-Encoding: quoted-printable + +commit dd2c4cf05a659ae7127e342924668ff0fa0deaa1 +Author: G. Branden Robinson +Date: Wed Sep 30 19:56:38 2020 +1000 + + groff_man_style(7): Clarify empty macro arguments. + =20 + Rewrite some ersatz standardese I had managed to concoct regarding why + empty macro arguments are usually not needed. Put an expanded + discussion, with anti-patterns and remedies, in section "Notes", with + forward reference from subsection "Macro reference preliminaries". + =20 + Thanks to Jakub Wilk and Michael Kerrisk for the critique. 
+ +diff --git a/tmac/groff_man.7.man.in b/tmac/groff_man.7.man.in +index c62d97ba..b96cbaf4 100644 +--- a/tmac/groff_man.7.man.in ++++ b/tmac/groff_man.7.man.in +@@ -281,23 +281,8 @@ but the + package is designed such that this should seldom be necessary. + _ifstyle()dnl + . +-Unused macro arguments are more often simply omitted, +-.\" antipattern: '.TP ""' (just '.TP' will do) +-or good style suggests that a more appropriate macro be chosen, +-.\" antipattern: '.BI "" italic bold' (use '.IB' instead) +-that earlier arguments are more important than later ones, +-.\" antipattern: '.TH foo 1 "" "foo "1.2.3"' (don't skip the date!) +-.\" antipattern: '.IP "" 4n' (use .TP or .RS/.RE, depending on needs) +-or that arguments have identical significance such that skipping any is +-superfluous. +-.\" antipattern: '.B one two "" three' (pointless) +-.\" Technically, the above has a side-effect of additional space +-.\" between "two" and "three", but there are much more obvious ways of +-.\" getting it if desired. +-.\" .B "one two three" +-.\" .B one "two " three +-.\" .B one two " three" +-.\" .B one two\~ three ++See section \(lqNotes\(rq below for examples of cases where better ++alternatives to empty arguments in macro calls are available. + _endif()dnl + . + Most macro arguments are strings that will be output as text; +@@ -3235,6 +3220,63 @@ Some tips on troubleshooting your man pages follow. + . + . + .TP ++\(bu Do I ever need to use an empty macro argument ("")? ++Probably not. ++. ++When this seems necessary, ++often a shorter or clearer alternative is available. ++. ++.\" antipattern: '.TP ""' (just '.TP' will do) ++.\" antipattern: '.BI "" italic bold' (use '.IB' instead) ++.\" antipattern: '.TH foo 1 "" "foo 1.2.3"' (don't skip the date!) 
++.\" antipattern: '.IP "" 4n' (use .TP or .RS/.RE, depending on needs) ++.\" antipattern: '.B one two "" three' (pointless) ++.\" Technically, the above has a side-effect of additional space ++.\" between "two" and "three", but there are much more obvious ways of ++.\" getting it if desired. ++.\" .B "one two three" ++.\" .B one "two " three ++.\" .B one two " three" ++.\" .B one two\~ three ++.TS ++c c ++lfCB lfCB. ++Instead of.\|.\. .\|.\|.do this. ++_ ++\&.TP \(dq\(dq .TP ++\&.BI \(dq\(dq italic-text bold-text .IB italic-text bold-text ++\&.TH foo 1 \(dq\(dq \(dqfoo 1.2.3\(dq .TH foo 1 \ ++\f(CIyyyy\fP-\f(CImm\fP-\f(CIdd\fP \(dqfoo 1.2.3\(dq ++\&.IP \(dq\(dq 4n .TP 4n ++\&.B one two \(dq\(dq three .B one two three ++.TE ++. ++. ++.IP ++In the title heading ++.RB ( .TH ), ++the date of the page's last revision is more important than packaging ++information; ++it should not be omitted. ++. ++Ideally, ++a page maintainer will keep both up to date. ++. ++. ++.IP ++In the last example, ++the empty argument does have a subtly different effect than its ++suggested replacement; ++the empty argument becomes an additional space character\(embut it is a ++regular breaking space, ++so it can be discarded at the end of an output line. ++. ++It is better not to be subtle, ++particularly with space, ++which can be overlooked in source and rendered forms. ++. ++. 
++.TP + .RB \(bu " .RS" " doesn't indent relative to my indented paragraph" + The + .B .RS + +--wl6i3r6gpq7ibouc-- + +--jg2hlfugxpumieke +Content-Type: application/pgp-signature; name="signature.asc" + +-----BEGIN PGP SIGNATURE----- + +iQIzBAEBCAAdFiEEh3PWHWjjDgcrENwa0Z6cfXEmbc4FAl90WfUACgkQ0Z6cfXEm +bc5raQ/9GhXG/5U7McaEEu+aW1IgaTYTMbsMpew5u3tBlj3/IenGzsy8wDO912BD +aHPSedYoc485k1Vh/Kowyx569RhyIXiMtH7uINCEtheMSUNgITNFqXo8mhaqVMlU +3JoV12btQddOIqHnGX6c5V9Z38KXFmVctD6CxjLaWGLp/Bu9tSKwSaHOOmtUYyOv +fYpMzr0amd4z9f+O8PPnToqBhwUitEvis1ZHYU6gIj8VwOjD0gNsWjA9HR3uC3c9 +GK/R5przMANrNejzSgofm0/yAL6a61WhqhYEtzLUYu2NFnsyNJWzITNsNnoxzgQ5 +liKL0Onmw0YWjOo4Z9Zht9Iyd6JhJxW0uRwlpFhE6UlCkFHK8nbv3NbHT2xlx/po +rxY5jDC3Ap3+mdYHY8k5o8vFd4QOXc2bSTuDRZoWtFZQsjnl4Fpkqks1W54Txq4y +o3Vu9aOPx//Jfi8sDc/qD/mFnyUu+AMFWjIj8UxQN4HmbrbXg/DEczRfP68DjOiX +ssy/0Rmm/H1cu7oBMoSss63mpk/NvPTSzzCR+VhU4PHQ7rxSZYS105tzkBVfe37e +hSS00rQVWe2YnI1KkfJHFjzveHiPXf+IxC0Z4PpJuLhl+pIZ/FgxJ5yEkX0XVUIy +aYRzKI3JaJktYl6WvulKSBPzQxIyOgrqVkZW4lv/uTh64pE6E5w= +=oeam +-----END PGP SIGNATURE----- + +--jg2hlfugxpumieke-- + -- 2.35.1