From: Brian Sniffen
To: Bart Bunting, notmuch@notmuchmail.org
Subject: Re: converting attachments to text
Date: Tue, 03 Jan 2017 12:23:34 -0500
Message-ID: <874m1gukfd.fsf@istari.evenmere.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Sure!
Here's what I use for docx, and I think it could be adapted to PDF with
pdftotext or whatever you're already using there.

You need a small shell script that reads from STDIN, writes to a
temporary file, and calls pandoc, pdftotext, or a similar converter on
it, like ~/bin/antiwordx:

    #!/bin/sh
    # Save the attachment from stdin to a temp file, removed on exit.
    tmpfile=$(mktemp /tmp/antiwordx.XXXXXX.docx)
    trap 'rm -f -- "$tmpfile"' INT TERM HUP EXIT
    cat > "$tmpfile"
    # Convert the docx to markdown on stdout.
    pandoc --normalize -r docx -w markdown "$tmpfile"

You also need a small handler function to call it from Elisp; see the
attached file `inline-docx.el`, which assumes you have both the old
`antiword` for old-style .doc files and pandoc for new-style .docx
files. I apologize for the roughness of the code; it should probably
use customizable paths for pandoc and such.

-Brian

--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=inline-docx.el

;; mm-inlined-types and mm-inline-media-tests live in mm-decode;
;; mm-insert-inline lives in mm-view.
(require 'mm-decode)
(require 'mm-view)

(defun mm-inline-msword (handle)
  "Insert the old-style .doc part HANDLE as text, converted by antiword."
  (let (text)
    (with-temp-buffer
      (mm-insert-part handle)
      ;; Replace the buffer contents with antiword's output ("-" = stdin).
      (call-process-region (point-min) (point-max) "antiword" t t nil "-")
      (setq text (buffer-string)))
    (mm-insert-inline handle text)))

(defun mm-inline-docx (handle)
  "pandoc --normalize -r docx -w markdown %s"
  (let (text)
    (with-temp-buffer
      (mm-insert-part handle)
      ;; Pipe the part through the antiwordx script; read its output as UTF-8.
      (let ((coding-system-for-read 'utf-8))
        (call-process-region (point-min) (point-max)
                             "/Users/bts/bin/antiwordx" t t nil))
      (setq text (buffer-string)))
    (mm-insert-inline handle text)))

;; Each entry is (MIME-TYPE HANDLER TEST).  The filename tests use
;; `string-suffix-p' so that names shorter than the extension do not
;; signal an error.  `mm-inline-rtf' is not defined in this file.
(setq my-inline-mime-tests
      '(("text/rtf" mm-inline-rtf
         (lambda (handle)
           (let ((name (mail-content-type-get
                        (mm-handle-disposition handle) 'filename)))
             (and name (string-suffix-p ".rtf" name)))))
        ("application/x-msword" mm-inline-docx
         (lambda (handle)
           (let ((name (mail-content-type-get
                        (mm-handle-disposition handle) 'filename)))
             (and name (string-suffix-p ".docx" name)))))
        ("application/x-msword" mm-inline-msword
         (lambda (handle)
           (let ((name (mail-content-type-get
                        (mm-handle-disposition handle) 'filename)))
             (and name (string-suffix-p ".doc" name)))))
        ("application/vnd.openxmlformats-officedocument.wordprocessingml.document"
         mm-inline-docx identity)
        ("application/octet-stream" mm-inline-docx
         (lambda (handle)
           (let ((name (mail-content-type-get
                        (mm-handle-disposition handle) 'filename)))
             (and name (string-suffix-p ".docx" name)))))
        ("application/octet-stream" mm-inline-msword
         (lambda (handle)
           (let ((name (mail-content-type-get
                        (mm-handle-disposition handle) 'filename)))
             (and name (string-suffix-p ".doc" name)))))
        ("application/msword" mm-inline-msword identity)))

;; Register every type as inlinable, along with its handler and test.
(dolist (entry my-inline-mime-tests)
  (add-to-list 'mm-inlined-types (car entry))
  (add-to-list 'mm-inline-media-tests entry))

--=-=-=
Content-Type: text/plain

Bart Bunting writes:

> Hi,
>
> Just looking for some pointers.
>
> I have to deal with quite a few emails with attachments in either pdf or
> word format.
>
> I'm on a mac so can use applescript or something like pdftotext or
> similar to convert them to text.
>
> I'm blind so use emacspeak as my primary interface. Having an easy way
> to convert the notmuch attachments to text other than saving to a file
> and processing them would greatly speed up my workflow.
>
> Is there something in existence already to do this sort of thing?
>
> I have a little rudimentary lisp skill so can hack something up if
> someone can give me some pointers on a direction to head in.
>
> Any advice appreciated.
>
> Kind regards
>
> Bart
> --
>
> Bart Bunting - URSYS

--=-=-=--
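
The message says the docx approach "could be adapted to pdf with pdftotext" but does not show how. Below is a minimal sketch of what such a script might look like, following the same read-stdin, write-temp-file, convert pattern as antiwordx. The script name `pdf2text` and the function names `with_stdin_as_file` and `pdf_to_stdout` are hypothetical and do not appear in the original message. It also assumes that `pdftotext` (from poppler or xpdf) is installed and on PATH.

```shell
#!/bin/sh
# Sketch of a hypothetical ~/bin/pdf2text, modelled on antiwordx.
# Assumption: pdftotext (poppler or xpdf) is installed and on PATH.

# Save stdin to a temporary file, run "$@" with that file's path appended
# as the last argument, then delete the file and return the command's status.
with_stdin_as_file() {
    # The X's are kept at the end of the template so that both GNU and
    # BSD mktemp substitute them.
    wsaf_tmp=$(mktemp "${TMPDIR:-/tmp}/pdf2text.XXXXXX") || return 1
    cat > "$wsaf_tmp"
    "$@" "$wsaf_tmp"
    wsaf_status=$?
    rm -f -- "$wsaf_tmp"
    return "$wsaf_status"
}

# pdftotext takes the input file first; "-" as the output name sends the
# text to stdout, and -layout tries to keep the page's physical layout.
pdf_to_stdout() {
    pdftotext -layout "$1" -
}
```

Saved as a script, its final line would be `with_stdin_as_file pdf_to_stdout`. A handler shaped like `mm-inline-docx` could then pipe an `application/pdf` part through it. The converter is passed in as an argument so that the temp-file handling can be tried with a harmless command such as `cat` before pdftotext is involved.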