From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by arlo.cworth.org (Postfix) with ESMTP id E9E0E6DE1AFC for ; Wed, 1 Mar 2017 14:45:21 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at cworth.org X-Spam-Flag: NO X-Spam-Score: -0.012 X-Spam-Level: X-Spam-Status: No, score=-0.012 tagged_above=-999 required=5 tests=[AWL=0.298, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.211, SPF_PASS=-0.001] autolearn=disabled Received: from arlo.cworth.org ([127.0.0.1]) by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ThHlLVkbo5jX for ; Wed, 1 Mar 2017 14:45:20 -0800 (PST) Received: from mail-io0-f178.google.com (mail-io0-f178.google.com [209.85.223.178]) by arlo.cworth.org (Postfix) with ESMTPS id C7ACB6DE1A63 for ; Wed, 1 Mar 2017 14:45:19 -0800 (PST) Received: by mail-io0-f178.google.com with SMTP id f84so40691857ioj.0 for ; Wed, 01 Mar 2017 14:45:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=Y9gUfeStjEyPDqCCFgE47jrdjQhFl/4xhzHtQSTan9M=; b=fL3CS51bMOcJZC39n8f+kx72bjlKCJgrmcXFKkqbKo+OYyy/8XJYVe7TFOwp3tmJc4 Iou9cnCDpxe/YOATLiD5U8pamWjAiUKAXZ0fwDMXBkkwBs32HlUFY5VJqbtKeBTh2wVV Eq5MPkvV+b/+DM9inhdMuPuCrCv2spnlf6J9a6I/PGeCqblZ57GsXuY36F8aod5y55R9 lH+Ew0N/cWMuwwsOI3k4yb7+fzDGNpVI4CmjTavCns/BQPwqd5wALenPmTeMVu4zpdZ7 TcQnWwx4ZybXfkbIZCziBqzDgB5mKJwLVJb9MTrVd3236vXV/kmH4UNQg6W22jmVcVMH PLGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=Y9gUfeStjEyPDqCCFgE47jrdjQhFl/4xhzHtQSTan9M=; b=RAifmqCZ2bXVIpekLox4w4M2T/oMUo6eH3tFqxEFero9NDf5/k9zaCW8lXX/r9rvcu y/gMcQkooqSqw8Na63wYbY6tnaFCI7THLTjJDUDFU3fv75qwMIW6X2FvjgKEaZXBDjFp 
w9kronBF+xRUizdHp/4EULQLdB3fLbmSnb10Z5BS2PiWTxj+uz/UMJqX2EgRvjos/YEl syqaD8BEbkOF89qSOkM6H5ytV9j8twdV4hR2MmTFeBUOHNJhLdPT6KXzyVAVMaCXT1tN p9atUQ1Pr1xK1/AqugEk07Knlxh13E3Zaq04RzyMJH65H77yawdnqBvIA2915kVNJ/M1 C9Zg== X-Gm-Message-State: AMke39mfZbvXulB2FuiVL533vrPgHBeXyR5ptaLKjujnsL0PP+9APSDhZwpk5OedvzISVY0m+glx96/XjHwEQQ== X-Received: by 10.107.140.82 with SMTP id o79mr10966460iod.17.1488408318776; Wed, 01 Mar 2017 14:45:18 -0800 (PST) MIME-Version: 1.0 Received: by 10.50.224.137 with HTTP; Wed, 1 Mar 2017 14:45:18 -0800 (PST) In-Reply-To: <877f48lw4s.fsf@bistromath> References: <87inntut68.fsf@tethera.net> <877f48lw4s.fsf@bistromath> From: Olaf TNSB Date: Thu, 2 Mar 2017 09:45:18 +1100 Message-ID: Subject: Re: Add (extracted) attachment text to the search index? To: Steven Allen Cc: notmuch@notmuchmail.org, David Bremner Content-Type: multipart/alternative; boundary=94eb2c0624da1f81c40549b311cd X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Mar 2017 22:45:22 -0000 --94eb2c0624da1f81c40549b311cd Content-Type: text/plain; charset=UTF-8 On Thu, Mar 2, 2017 at 4:55 AM, Steven Allen wrote: > > > David Bremner writes: > > This would require some modifications of notmuch. Either modifying > > lib/index.cc to add the terms at indexing (notmuch new/insert) time, or > > providing some way of adding the terms later. The former actually sounds > > simpler to me. > > To do this correctly, you'd want to be able to run an external text > extraction tool (for PDFs, word documents, etc.) so I think the latter > would be better in the long run (it would allow the user to index > attachments in the hooks). (As a non-dev...) I agree. The ability to add (and delete!) content post-insert sounds more desirable. 
I don't want to have to re-index all my email as the next version of <horrible-binary-object>-to-text gets released. I'd like to be able to (search-for-attachment)-(delete)-(re-add). I was thinking a really hacky solution would be to fake up a new email with the same headers but with the attachment text as its body, do a notmuch new/insert, and then replace the new email's file on disk with a link to the original message (not sure if that will trigger notmuch new, I don't think so). Doesn't feel robust, but... What do ya reckon? --94eb2c0624da1f81c40549b311cd Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

On Thu, Mar 2, 2017 at 4:55 AM, Steven Allen <steven@stebalien.com> wrote:
>
>
> David Bremner <david@tethera.net> writes:
> > This would require some modifications of notmuch. Either modifying
> > lib/index.cc to add the terms at indexing (notmuch new/insert) time, or
> > providing some way of adding the terms later. The former actually sounds
> > simpler to me.
>
> To do this correctly, you'd want to be able to run an external text
> extraction tool (for PDFs, word documents, etc.) so I think the latter
> would be better in the long run (it would allow the user to index
> attachments in the hooks).
(As a non-dev...) I agree. The ability to add (and delete!) content post-insert sounds more desirable. I don't want to have to re-index all my email as the next version of <horrible-binary-object>-to-text gets released. I'd like to be able to (search-for-attachment)-(delete)-(re-add).


I was thinking a really hacky solution would be to fake up a new email with the same headers but with the attachment text as its body, do a notmuch new/insert, and then replace the new email's file on disk with a link to the original message (not sure if that will trigger notmuch new, I don't think so). Doesn't feel robust, but...
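
To make the first step of that concrete, here is a rough sketch of building the faked-up message with Python's standard email library. The function name make_shadow_message and the choice of which headers to copy are assumptions made for illustration, not anything notmuch provides, and the sketch does not address how notmuch treats two files that share a Message-ID.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Headers copied from the original message. This list is an assumption:
# "the same headers" in the idea above is not spelled out any further.
COPIED_HEADERS = (
    "From", "To", "Cc", "Date", "Subject",
    "Message-ID", "In-Reply-To", "References",
)


def make_shadow_message(original_text: str, attachment_text: str) -> EmailMessage:
    """Build a message that reuses the original's headers but whose body
    is the text extracted from an attachment (hypothetical helper)."""
    original = message_from_string(original_text, policy=policy.default)
    shadow = EmailMessage(policy=policy.default)
    for name in COPIED_HEADERS:
        value = original[name]
        if value is not None:
            # Copy the header value only when the original has it.
            shadow[name] = str(value)
    # The extracted attachment text becomes a plain-text body.
    shadow.set_content(attachment_text)
    return shadow
```

The bytes of the result (shadow.as_bytes()) could then be fed to notmuch insert, which reads a message from standard input. Swapping the delivered file for a link to the original message is not shown here.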


What do ya reckon?

--94eb2c0624da1f81c40549b311cd--