unofficial mirror of notmuch@notmuchmail.org
* converting attachments to text
@ 2017-01-03  7:27 Bart Bunting
  2017-01-03  7:49 ` Daniel Kahn Gillmor
  2017-01-03 17:23 ` Brian Sniffen
  0 siblings, 2 replies; 4+ messages in thread
From: Bart Bunting @ 2017-01-03  7:27 UTC (permalink / raw)
  To: notmuch


Hi,

Just looking for some pointers.

I have to deal with quite a few emails with attachments in either PDF or
Word format.

I'm on a Mac, so I can use AppleScript, or something like pdftotext, to
convert them to text.

I'm blind so use emacspeak as my primary interface.  Having an easy way
to convert the notmuch attachments to text other than saving to a file
and processing them would greatly speed up my workflow.

Is there something in existence already that does this sort of thing?

I have some rudimentary Lisp skill, so I can hack something up if
someone can give me some pointers on a direction to head in.

Any advice appreciated.

Kind regards

Bart

-- 

Bart Bunting - URSYS
PH: 02 87452811
Mbl: 0409560005


* Re: converting attachments to text
From: Daniel Kahn Gillmor @ 2017-01-03  7:49 UTC (permalink / raw)
  To: Bart Bunting, notmuch

On Tue 2017-01-03 02:27:23 -0500, Bart Bunting wrote:
> I'm blind so use emacspeak as my primary interface.  Having an easy way
> to convert the notmuch attachments to text other than saving to a file
> and processing them would greatly speed up my workflow.

I use notmuch-emacs, and frequently pipe message parts (attachments)
into pipelines with ". |" (that is, the dot key on my keyboard, followed
by the pipe key on my keyboard) while the cursor is over the button
representing the attachment.

If any of your tools can work in a pipeline (e.g. "pdftotext - -"), you
could try that, but I don't know how to feed the output of the pipeline
into Emacspeak.
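
A minimal sketch of such a pipeline stage as a POSIX shell function. It
assumes a pdftotext that treats "-" as stdin for the input and stdout for
the output, as in the "pdftotext - -" example above. The function name
pdf_to_text and the PDFTOTEXT override variable are made up for
illustration; they are not part of notmuch or pdftotext.

```shell
#!/bin/sh
# pdf_to_text: read a PDF on stdin and write plain text on stdout.
# PDFTOTEXT is a hypothetical override so another binary can be swapped in;
# by default the pdftotext found on PATH is used.
pdf_to_text() {
    "${PDFTOTEXT:-pdftotext}" - -
}

# Example use outside Emacs (the file name is a placeholder):
#   pdf_to_text < attachment.pdf | less
```

Saved as a script on PATH with the function body as its only command, this
could be given as the command after ". |" on the attachment button.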

hope this helps,

     --dkg




* Re: converting attachments to text
From: Brian Sniffen @ 2017-01-03 17:23 UTC (permalink / raw)
  To: Bart Bunting, notmuch

Sure!  Here's what I use for docx, and I think it could be adapted to
pdf with pdftotext or whatever you're already using there.  You need a
small shell script that reads from STDIN, writes to a file, and calls
pandoc or pdftotext or whatever, like ~/bin/antiwordx:

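    #!/bin/sh
    # Read a .docx file on stdin and print it as Markdown on stdout.

    # Save stdin to a temporary file so pandoc can be given a filename.
    tmpfile=$(mktemp /tmp/antiwordx.XXXXXX.docx) || exit 1
    # Remove the temporary file on exit and on the usual signals.
    trap 'rm -f -- "$tmpfile"' INT TERM HUP EXIT
    cat > "$tmpfile"
    pandoc --normalize -r docx -w markdown "$tmpfile"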
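
The same save-stdin-then-convert pattern can be sketched as a reusable
function, so that one wrapper serves several converters. This is only an
illustration under stated assumptions: the name via_tmpfile is invented
here, and the temporary directory is created with "mktemp -d" and a
template ending in X's because some mktemp implementations only
substitute X's at the very end of the template.

```shell
#!/bin/sh
# via_tmpfile SUFFIX COMMAND [ARGS...]
# Save stdin to a temporary file whose name ends in SUFFIX, then run
# COMMAND ARGS... FILE and return its exit status.
# The body is a subshell, so the traps below do not leak into the caller.
via_tmpfile() (
    suffix=$1
    shift
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/via_tmpfile.XXXXXX") || exit 1
    # Remove the directory whenever the subshell exits, including on signals.
    trap 'rm -rf -- "$tmpdir"' EXIT
    trap 'exit 1' INT TERM HUP
    tmpfile=$tmpdir/input$suffix
    cat > "$tmpfile" || exit 1
    "$@" "$tmpfile"
    status=$?
    exit "$status"
)

# Example (flags trimmed for brevity; the file name is a placeholder):
#   via_tmpfile .docx pandoc -r docx -w markdown < report.docx
```

Keeping the suffix on the temporary file matters for tools that guess the
input format from the file name.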

You also need a small handler function to call it from Elisp; see the
attached file `inline-docx.el`, which assumes you have both the old
`antiword` for old-style .doc files and pandoc for new-style .docx files.
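
The choice of tool by file extension described above can also be sketched
on the shell side. The function name converter_for is hypothetical; the
.doc and .docx mappings follow the description here, and the .pdf mapping
follows the pdftotext suggestion earlier in the thread.

```shell
#!/bin/sh
# converter_for FILENAME
# Print the name of the tool this sketch would use for FILENAME, chosen by
# extension (case-sensitive), or return non-zero for unknown extensions.
converter_for() {
    case $1 in
        *.docx) echo pandoc ;;
        *.doc)  echo antiword ;;
        *.pdf)  echo pdftotext ;;
        *)      return 1 ;;
    esac
}

# Example:
#   converter_for report.docx    # prints "pandoc"
```

This only names the tool; each tool still needs its own arguments, as in
the antiwordx script.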

I apologize for the roughness of the code; it should probably use
customizable paths for pandoc and such.

-Brian


[-- Attachment #2: inline-docx.el --]
[-- Type: application/emacs-lisp, Size: 3445 bytes --]


* Re: converting attachments to text
From: Bart Bunting @ 2017-01-10  0:57 UTC (permalink / raw)
  To: Brian Sniffen, notmuch

Hi Brian,

Thanks so much for this; it has done exactly what I was after!


Kind regards

Bart

Brian Sniffen <bts@evenmere.org> writes:

> (require 'mm-decode)   ; mm-insert-part, mm-handle-disposition, mm-inline-media-tests
> (require 'mm-view)     ; mm-insert-inline
> (require 'mail-parse)  ; mail-content-type-get
>
> (defvar my-antiwordx-program (expand-file-name "~/bin/antiwordx")
>   "Wrapper script that reads a docx file on stdin and prints Markdown.")
>
> (defun mm-inline-msword (handle)
>   "Insert the old-style .doc part HANDLE inline, converted by antiword."
>   (let (text)
>     (with-temp-buffer
>       (mm-insert-part handle)
>       ;; Pipe the part through antiword; its output replaces the buffer text.
>       (call-process-region (point-min) (point-max) "antiword" t t nil "-")
>       (setq text (buffer-string)))
>     (mm-insert-inline handle text)))
>
> (defun mm-inline-docx (handle)
>   "Insert the docx part HANDLE inline, converted by `my-antiwordx-program'."
>   (let (text)
>     (with-temp-buffer
>       (mm-insert-part handle)
>       (let ((coding-system-for-read 'utf-8))
>         (call-process-region (point-min) (point-max)
>                              my-antiwordx-program t t nil))
>       (setq text (buffer-string)))
>     (mm-insert-inline handle text)))
>
> (defun my-mm-filename-suffix-p (handle suffix)
>   "Return non-nil if the attachment filename of HANDLE ends with SUFFIX."
>   (let ((name (mail-content-type-get (mm-handle-disposition handle)
>                                      'filename)))
>     ;; Unlike `substring' with a negative index, this is safe on short names.
>     (and name (string-suffix-p suffix name))))
>
> ;; Entries have the form (TYPE INLINE-FUNCTION TEST), as in
> ;; `mm-inline-media-tests'.  `mm-inline-rtf' is not defined in this file.
> (defvar my-inline-mime-tests
>   '(("text/rtf" mm-inline-rtf
>      (lambda (handle) (my-mm-filename-suffix-p handle ".rtf")))
>     ("application/x-msword" mm-inline-docx
>      (lambda (handle) (my-mm-filename-suffix-p handle ".docx")))
>     ("application/x-msword" mm-inline-msword
>      (lambda (handle) (my-mm-filename-suffix-p handle ".doc")))
>     ("application/vnd.openxmlformats-officedocument.wordprocessingml.document"
>      mm-inline-docx identity)
>     ("application/octet-stream" mm-inline-docx
>      (lambda (handle) (my-mm-filename-suffix-p handle ".docx")))
>     ("application/octet-stream" mm-inline-msword
>      (lambda (handle) (my-mm-filename-suffix-p handle ".doc")))
>     ("application/msword" mm-inline-msword identity)))
>
> ;; Register every type as inlinable and add its handler and test.
> (dolist (entry my-inline-mime-tests)
>   (add-to-list 'mm-inlined-types (car entry))
>   (add-to-list 'mm-inline-media-tests entry))
