From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bart Bunting
To: Brian Sniffen, notmuch@notmuchmail.org
Subject: Re: converting attachments to text
In-Reply-To: <874m1gukfd.fsf@istari.evenmere.org>
References: <874m1gukfd.fsf@istari.evenmere.org>
Date: Tue, 10 Jan 2017 11:57:54 +1100
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."

Hi Brian,

Thanks so much for this, it has done exactly what I was after!

Kind regards

Bart

Brian Sniffen writes:

> Sure!  Here's what I use for docx, and I think it could be adapted to
> pdf with pdftotext or whatever you're already using there.
> You need a small shell script that reads from STDIN, writes to a file,
> and calls pandoc or pdftotext or whatever, like ~/bin/antiwordx:
>
>     #!/bin/sh
>
>     tmpfile=$(mktemp /tmp/antiwordx.XXXXXX.docx)
>     trap 'rm -f -- "$tmpfile"' INT TERM HUP EXIT
>     cat > "$tmpfile"
>     pandoc --normalize -r docx -w markdown "$tmpfile"
>
> You need a small handler function to call it from Elisp---see attached
> file `inline-docx.el`, which assumes you have both the old `antiword`
> for old-style .doc files and pandoc for new-style `docx`.
>
> I apologize for the roughness of the code; it should probably use
> customizable paths for pandoc and such.
>
> -Brian
>
>
> (defun mm-inline-msword (handle)
>   "Insert old-style .doc part HANDLE inline as text, converted by antiword."
>   (let (text)
>     (with-temp-buffer
>       (mm-insert-part handle)
>       (call-process-region (point-min) (point-max) "antiword" t t nil "-")
>       (setq text (buffer-string)))
>     (mm-insert-inline handle text)))
>
> (defun mm-inline-docx (handle)
>   "Insert docx part HANDLE inline as markdown, via the antiwordx script.
> The script runs: pandoc --normalize -r docx -w markdown %s"
>   (let (text)
>     (with-temp-buffer
>       (mm-insert-part handle)
>       (let ((coding-system-for-read 'utf-8))
>         (call-process-region (point-min) (point-max) "/Users/bts/bin/antiwordx" t t nil))
>       (setq text (buffer-string)))
>     (mm-insert-inline handle text)))
>
> (setq my-inline-mime-tests
>       '(("text/rtf" mm-inline-rtf
>          (lambda (handle)
>            (let ((name (mail-content-type-get
>                         (mm-handle-disposition handle) 'filename)))
>              (and name
>                   (equal ".rtf" (substring name -4 nil))))))
>         ("application/x-msword" mm-inline-docx
>          (lambda (handle)
>            (let ((name (mail-content-type-get
>                         (mm-handle-disposition handle) 'filename)))
>              (and name
>                   (equal ".docx" (substring name -5 nil))))))
>         ("application/x-msword" mm-inline-msword
>          (lambda (handle)
>            (let ((name (mail-content-type-get
>                         (mm-handle-disposition handle) 'filename)))
>              (and name
>                   (equal ".doc" (substring name -4 nil))))))
>         ("application/vnd.openxmlformats-officedocument.wordprocessingml.document" mm-inline-docx identity)
>         ("application/octet-stream" mm-inline-docx
>          (lambda (handle)
>            (let ((name (mail-content-type-get
>                         (mm-handle-disposition handle) 'filename)))
>              (and name
>                   (equal ".docx" (substring name -5 nil))))))
>         ("application/octet-stream" mm-inline-msword
>          (lambda (handle)
>            (let ((name (mail-content-type-get
>                         (mm-handle-disposition handle) 'filename)))
>              (and name
>                   (equal ".doc" (substring name -4 nil))))))
>         ("application/msword" mm-inline-msword identity)))
>
> (mapcar (lambda (x) (add-to-list 'mm-inlined-types (car x)))
>         my-inline-mime-tests)
>
> (mapcar (lambda (x) (add-to-list 'mm-inline-media-tests x))
>         my-inline-mime-tests)
>
>
> Bart Bunting writes:
>
>> Hi,
>>
>> Just looking for some pointers.
>>
>> I have to deal with quite a few emails with attachments in either pdf or
>> word format.
>>
>> I'm on a Mac, so I can use AppleScript or something like pdftotext to
>> convert them to text.
>>
>> I'm blind so use emacspeak as my primary interface.  Having an easy way
>> to convert the notmuch attachments to text other than saving to a file
>> and processing them would greatly speed up my workflow.
>>
>> Is there something in existence already to do this sort of thing?
>>
>> I have a little rudimentary lisp skill so can hack something up if
>> someone can give me some pointers on a direction to head in.
>>
>> Any advice appreciated.
>>
>> Kind regards
>>
>> Bart
>> --
>>
>> Bart Bunting - URSYS
>> PH: 02 87452811
>> Mbl: 0409560005

Bart
-- 

Bart Bunting - URSYS
PH: 02 87452811
Mbl: 0409560005
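
Brian's remark that the wrapper "could be adapted to pdf with pdftotext"
might look roughly like the sketch below. It is not from the thread and
is untested against real mail. The function name pdf_to_text and the
temp file prefix are hypothetical, and it assumes a pdftotext binary on
PATH that treats "-" as the output file to mean stdout. Like antiwordx,
it spools stdin to a temporary file, converts it, and cleans up.

```shell
# Hypothetical PDF counterpart to antiwordx (an assumption, not from the thread).
# Reads a PDF on stdin and writes plain text to stdout.
# The body is a subshell "( ... )" so the EXIT trap fires when the function
# finishes and does not replace any traps set by the caller.
pdf_to_text() (
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/pdf2txt.XXXXXX") || exit 1
    trap 'rm -f -- "$tmpfile"' EXIT
    trap 'exit 1' INT TERM HUP
    # Spool the attachment from stdin into the temporary file.
    cat > "$tmpfile"
    # -layout keeps the page's physical layout; "-" sends the text to stdout.
    pdftotext -layout "$tmpfile" -
)
```

Saved as a script (say ~/bin/pdf2txt, also a hypothetical name) with a
final line that calls pdf_to_text, it could presumably be driven by a
handler modelled on mm-inline-docx and registered for application/pdf.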