* [bug#31949] [PATCH] gnu: Add docx2txt.
From: Pierre Neidhardt @ 2018-06-23 13:32 UTC
  To: 31949

* gnu/packages/textutils.scm (docx2txt): New variable.
---
 gnu/packages/textutils.scm | 65 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index 5734bf62d..8eec045a6 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -14,6 +14,7 @@
 ;;; Copyright © 2017 Kei Kebreau <kkebreau@posteo.net>
 ;;; Copyright © 2017 Alex Vong <alexvong1995@gmail.com>
 ;;; Copyright © 2018 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2018 Pierre Neidhardt <ambrevar@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -675,3 +676,67 @@ and Cython.")
 measuring and checking the width of strings, with support east asian text.")
     (home-page "https://github.com/jessevdk/go-flags")
     (license license:expat)))
+
+(define-public docx2txt
+  (package
+    (name "docx2txt")
+    (version "1.4")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "http://downloads.sourceforge.net/docx2txt/docx2txt-"
+                    version ".tgz"))
+              (sha256
+               (base32
+                "06vdikjvpj6qdb41d8wzfnyj44jpnknmlgbhbr1w215420lpb5xj"))))
+    (build-system gnu-build-system)
+    (inputs
+     `(("unzip" ,unzip)
+       ("perl" ,perl)))
+    (arguments
+     `(#:tests? #f                      ; No tests.
+       #:make-flags (list (string-append "BINDIR=" (assoc-ref %outputs "out") "/bin")
+                          (string-append "CONFIGDIR=" (assoc-ref %outputs "out") "/etc")
+                          ;; Makefile seems to be a bit dumb at guessing.
+                          (string-append "INSTALL=install")
+                          (string-append "PERL=perl"))
+       #:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-after 'install 'fix-install
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let* ((out (assoc-ref outputs "out"))
+                    (bin (string-append out "/bin"))
+                    (config (string-append out "/etc/docx2txt.config"))
+                    (unzip (assoc-ref inputs "unzip")))
+               ;; According to INSTALL, the .sh wrapper can be skipped.
+               (delete-file (string-append bin "/docx2txt.sh"))
+               (rename-file (string-append bin "/docx2txt.pl")
+                            (string-append bin "/docx2txt"))
+               (substitute* config
+                 (("config_unzip         => '/usr/bin/unzip',")
+                  (string-append "config_unzip         => '"
+                                 unzip
+                                 "/bin/unzip',")))
+               ;; Makefile is wrong.
+               (chmod config #o644)))))))
+    (synopsis "Recover text from .docx files, with good formatting")
+    (description
+     "@command{docx2txt} is a perl based command line utility to convert
+Microsoft Office™ .docx documents to equivalent text documents. Latest version
+supports following features during text extraction.
+
+@itemize
+@item Character conversions (\" ' < & > -, fractions and some mathematical
+symbols, etc.); currency characters are converted to respective names like
+Euro.
+@item Capitalisation of text blocks.
+@item Center and right justification of text fitting in a line of
+(configurable) 80 columns.
+@item Horizontal ruler, line breaks, paragraphs separation, tabs.
+@item Indicating hyperlinked text along with the hyperlink (configurable).
+@item Handling (bullet, decimal, letter, roman) lists along with (attempt at)
+indentation.
+@end itemize\n")
+    (home-page "http://docx2txt.sourceforge.net")
+    (license license:gpl3+)))
-- 
2.17.1


* [bug#31949] [PATCH] gnu: Add docx2txt.
From: Ludovic Courtès @ 2018-06-25 20:58 UTC
  To: Pierre Neidhardt; +Cc: 31949

Hi,

Pierre Neidhardt <ambrevar@gmail.com> skribis:

> * gnu/packages/textutils.scm (docx2txt): New variable.

[...]

> +    (source (origin
> +              (method url-fetch)
> +              (uri (string-append
> +                    "http://downloads.sourceforge.net/docx2txt/docx2txt-"
> +                    version ".tgz"))

Could you use mirror://sourceforge?

> +       #:make-flags (list (string-append "BINDIR=" (assoc-ref %outputs "out") "/bin")
> +                          (string-append "CONFIGDIR=" (assoc-ref %outputs "out") "/etc")

Lines are a bit long.  :-)

> +    (synopsis "Recover text from .docx files, with good formatting")

@file{.docx} please.

> +    (description
> +     "@command{docx2txt} is a perl based command line utility to convert

s/perl/Perl/

> +Microsoft Office™ .docx documents to equivalent text documents. Latest version

No need for the trademark sign; two spaces after period.

> +@itemize
> +@item Character conversions (\" ' < & > -, fractions and some mathematical
> +symbols, etc.); currency characters are converted to respective names like
> +Euro.

Maybe you could remove what’s in parentheses?  Or use @code.

Could you send an updated patch?  Make sure ‘guix lint’ is happy.  :-)

Thanks,
Ludo’.
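
For the long #:make-flags lines flagged in this review, one alternative
to wrapping is to bind the output directory once.  The following is a
minimal sketch in plain Guile, not part of the submitted patch:
%outputs is mocked with a placeholder store path (an assumption for
illustration; in a real build Guix supplies it), and the
single-argument string-append calls are written as plain strings, which
is equivalent.

```scheme
;; Hypothetical stand-in for the build-side %outputs alist; the store
;; path is a placeholder, not a real hash.
(define %outputs '(("out" . "/gnu/store/HASH-docx2txt-1.4")))

;; Bind the output directory once so each flag fits on a short line.
(define make-flags
  (let ((out (assoc-ref %outputs "out")))
    (list (string-append "BINDIR=" out "/bin")
          (string-append "CONFIGDIR=" out "/etc")
          ;; Plain strings: wrapping a single string in string-append
          ;; returns the same string.
          "INSTALL=install"
          "PERL=perl")))

;; Print one flag per line to inspect the result.
(for-each (lambda (flag) (display flag) (newline)) make-flags)
```

In a package definition, only the (let ...) expression would go after
#:make-flags; the mock definition and the printing loop are there just
to make the sketch runnable on its own.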


* [bug#31949] [PATCH] gnu: Add docx2txt.
From: Pierre Neidhardt @ 2018-06-25 21:22 UTC
  To: 31949

* gnu/packages/textutils.scm (docx2txt): New variable.
---
 gnu/packages/textutils.scm | 66 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index 5734bf62d..5dec41428 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -14,6 +14,7 @@
 ;;; Copyright © 2017 Kei Kebreau <kkebreau@posteo.net>
 ;;; Copyright © 2017 Alex Vong <alexvong1995@gmail.com>
 ;;; Copyright © 2018 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2018 Pierre Neidhardt <ambrevar@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -675,3 +676,68 @@ and Cython.")
 measuring and checking the width of strings, with support east asian text.")
     (home-page "https://github.com/jessevdk/go-flags")
     (license license:expat)))
+
+(define-public docx2txt
+  (package
+    (name "docx2txt")
+    (version "1.4")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "mirror://sourceforge/docx2txt/docx2txt/v"
+                    version "/docx2txt-" version ".tgz"))
+              (sha256
+               (base32
+                "06vdikjvpj6qdb41d8wzfnyj44jpnknmlgbhbr1w215420lpb5xj"))))
+    (build-system gnu-build-system)
+    (inputs
+     `(("unzip" ,unzip)
+       ("perl" ,perl)))
+    (arguments
+     `(#:tests? #f                      ; No tests.
+       #:make-flags (list (string-append "BINDIR="
+                                         (assoc-ref %outputs "out") "/bin")
+                          (string-append "CONFIGDIR="
+                                         (assoc-ref %outputs "out") "/etc")
+                          ;; Makefile seems to be a bit dumb at guessing.
+                          (string-append "INSTALL=install")
+                          (string-append "PERL=perl"))
+       #:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-after 'install 'fix-install
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let* ((out (assoc-ref outputs "out"))
+                    (bin (string-append out "/bin"))
+                    (config (string-append out "/etc/docx2txt.config"))
+                    (unzip (assoc-ref inputs "unzip")))
+               ;; According to INSTALL, the .sh wrapper can be skipped.
+               (delete-file (string-append bin "/docx2txt.sh"))
+               (rename-file (string-append bin "/docx2txt.pl")
+                            (string-append bin "/docx2txt"))
+               (substitute* config
+                 (("config_unzip         => '/usr/bin/unzip',")
+                  (string-append "config_unzip         => '"
+                                 unzip
+                                 "/bin/unzip',")))
+               ;; Makefile is wrong.
+               (chmod config #o644)))))))
+    (synopsis "Recover text from @file{.docx} files, with good formatting")
+    (description
+     "@command{docx2txt} is a Perl based command line utility to convert
+Microsoft Office @file{.docx} documents to equivalent text documents.  Latest
+version supports following features during text extraction.
+
+@itemize
+@item Character conversions; currency characters are converted to respective
+names like Euro.
+@item Capitalisation of text blocks.
+@item Center and right justification of text fitting in a line of
+(configurable) 80 columns.
+@item Horizontal ruler, line breaks, paragraphs separation, tabs.
+@item Indicating hyperlinked text along with the hyperlink (configurable).
+@item Handling (bullet, decimal, letter, roman) lists along with (attempt at)
+indentation.
+@end itemize\n")
+    (home-page "http://docx2txt.sourceforge.net")
+    (license license:gpl3+)))
-- 
2.17.1
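
On the fix-install phase in the patch above: substitute* treats its
pattern as a regular expression, so the match depends on the exact run
of spaces after config_unzip.  A narrower pattern that targets only the
hard-coded path would not depend on the alignment.  The following is a
minimal sketch of that idea using plain Guile's (ice-9 regex) rather
than Guix's substitute*; the store path is a placeholder and the sample
line is reconstructed from the pattern in the patch, so both are
assumptions.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical store path standing in for (assoc-ref inputs "unzip").
(define unzip "/gnu/store/HASH-unzip-6.0")

;; A line shaped like the one the patch expects in docx2txt.config.
(define line "config_unzip         => '/usr/bin/unzip',")

;; Replace only the hard-coded path; the spacing before "=>" no longer
;; needs to match.
(define patched
  (regexp-substitute/global #f "/usr/bin/unzip" line
                            'pre (string-append unzip "/bin/unzip") 'post))

(display patched)
(newline)
```

The equivalent substitute* clause would use "/usr/bin/unzip" as its
pattern and the same string-append expression as its body.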


* bug#31949: [PATCH] gnu: Add docx2txt.
From: Ludovic Courtès @ 2018-07-07 15:53 UTC
  To: Pierre Neidhardt; +Cc: 31949-done

Hello Pierre,

Pierre Neidhardt <ambrevar@gmail.com> skribis:

> * gnu/packages/textutils.scm (docx2txt): New variable.

Perfect.  Applied, thanks!

Ludo’.
