From: Panayotis Manganaris
To: frederik@ofb.net
Subject: Re: searching for a message by path
In-Reply-To: <20240921032340.opozeclfbyqzw2yt@localhost>
References: <20240920175232.zryeqyl76nbydiab@localhost> <87zfo1dfa1.fsf@pengjiz.com> <20240921032340.opozeclfbyqzw2yt@localhost>
Date: Sat, 21 Sep 2024 12:24:14 -0400
Message-ID: <87wmj52cwh.fsf@ASCALON.mail-host-address-is-not-set>
CC: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"

Frederick Eaton writes:

> Suppose the filter script reads a message from a particular file and decides that it is
> spam. How does the filter tell Notmuch that the message corresponding to that file is spam?
> You seem to be saying below that the filter script should extract the Message-ID and use it
> to identify the message to Notmuch, since file paths of the messages are not
> indexed. Probably what my script should be doing for each message is appending a line to a
> batch file like this:
>
>     +spam -new -- id:some_message_id@foo
>     +inbox -new -- id:some_other@baz
>
> and then passing the batch file to "notmuch tag"?

Hello Frederick, you are exactly correct. This is what I've written to handle spam filtering
in my notmuch post-new hook. Like you, I have notmuch configured to tag newly fetched mail
with "new".

    notmuch search --output=messages 'tag:new' > /tmp/msgs
    notmuch search --output=files 'tag:new' |\
        bogofilter -o0.7,0.7 -bt |\
        paste - /tmp/msgs |\
        awk '$1 ~ /S/ { print "-new +spam", "--", $3 }' |\
        notmuch tag --batch

This should run under any shell. My chosen filter is bogofilter. The -bt flags tell it to
operate on a stdin "batch" of file paths and return a "terse" summary of results, e.g.

    H 0.248913
    S 0.999999

This script relies on the assumption that notmuch returns results in the same order for
both queries, which is fortunately true.

>>>I've tentatively concluded that the best way to locate each message in the Notmuch database
>>>is to extract the Message-ID and search for it with "id:"? But the FAQ says that multiple
>>>messages can have the same Message-ID (and some spam messages don't have one at all).

Your instinct to use batch tagging and id: queries is correct. I collect my new message ids
in /tmp/msgs. These ids are unique enough to be used to tag individual messages on a daily
basis.

If you prefer to tag entire threads as spam the moment a single message is spam, you can
simply use

    notmuch search --output=threads 'tag:new' > /tmp/msgs

(The paste step then only lines up if each thread contains exactly one new message.) I
prefer to manually mute threads with a mute tag, but thread ids are definitely unique.

If you want to auto-tag spam in an existing archive, then you will first need to manually tag
a good quantity of messages (100-1000) you consider to be spam and a good quantity of
messages (100-1000) you consider to be ham, and use them to train the filter, e.g.

    notmuch search --output=files 'tag:spam' | bogofilter -bs
    notmuch search --output=files 'tag:inbox' | bogofilter -bn

>>>If I could access the message using the filename that the script is processing, it would
>>>seem slightly more reliable. It seems like there should be some way to allow a Notmuch
>>>database entry to be accessed directly by filename, without even creating a Notmuch-style
>>>search query containing that filename, but rather by passing the filename as a command-line
>>>argument to "notmuch". It would be nice not to have to worry about quoting and unquoting.
>>
>>I am not sure if this is useful, given that (presumably) Notmuch uses message IDs as
>>keys. Besides, those filenames are usually generated automatically and quite cryptic.
>
> It might be useful for the reasons I stated, namely in case the Message-ID does not exist or
> is not unique.

I think mail that is successfully transmitted through a mail host necessarily obtains a
message id, but I might be wrong. I believe notmuch indexes on both its own unique thread ids
and the message ids, thereby further decreasing the already minuscule chance of message id
collisions.

-- 
Best,
Panos
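
The pairing step discussed in this thread can be sketched as a small shell function. This is only an illustration, not part of notmuch or bogofilter: the name to_batch is hypothetical, and it assumes each input line has the shape produced by the paste step in the message (a classification letter, a score, then an id: term). Tagging ham as +inbox follows Frederick's quoted example. Lines with any other classification letter, or without an id: term, are left alone so they keep the "new" tag.

```shell
#!/bin/sh
# Hypothetical helper: convert "LETTER SCORE id:..." lines (one per message)
# into lines suitable for "notmuch tag --batch".
to_batch() {
    awk '
        $3 !~ /^id:/ { next }                               # no id: term, skip the line
        $1 == "S"    { print "+spam -new --", $3; next }    # classified as spam
        $1 == "H"    { print "+inbox -new --", $3; next }   # classified as ham
        # any other letter (e.g. unsure) is skipped and keeps its "new" tag
    '
}
```

Under those assumptions it would replace the awk stage of the pipeline, i.e. "... | paste - /tmp/msgs | to_batch | notmuch tag --batch". Using "--" before the query keeps the end of the tag list explicit, which matters if a query ever starts with "+" or "-".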