From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44782)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1fXYx0-00033t-IR for guix-patches@gnu.org;
 Mon, 25 Jun 2018 17:23:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1fXYww-0000Is-FH for guix-patches@gnu.org;
 Mon, 25 Jun 2018 17:23:06 -0400
Received: from debbugs.gnu.org ([208.118.235.43]:55708)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 (envelope-from ) id 1fXYww-0000IR-9j for guix-patches@gnu.org;
 Mon, 25 Jun 2018 17:23:02 -0400
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from ) id 1fXYww-000824-3L for guix-patches@gnu.org;
 Mon, 25 Jun 2018 17:23:02 -0400
Subject: [bug#31949] [PATCH] gnu: Add docx2txt.
Resent-Message-ID:
From: Pierre Neidhardt
Date: Mon, 25 Jun 2018 23:22:32 +0200
Message-Id: <20180625212232.8292-1-ambrevar@gmail.com>
In-Reply-To: <871scur3be.fsf@gnu.org>
References: <871scur3be.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-patches-bounces+kyle=kyleam.com@gnu.org
Sender: "Guix-patches"
To: 31949@debbugs.gnu.org

* gnu/packages/textutils.scm (docx2txt): New variable.
---
 gnu/packages/textutils.scm | 66 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index 5734bf62d..5dec41428 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -14,6 +14,7 @@
 ;;; Copyright © 2017 Kei Kebreau
 ;;; Copyright © 2017 Alex Vong
 ;;; Copyright © 2018 Tobias Geerinckx-Rice
+;;; Copyright © 2018 Pierre Neidhardt
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -675,3 +676,68 @@ and Cython.")
 measuring and checking the width of strings, with support east asian text.")
     (home-page "https://github.com/jessevdk/go-flags")
     (license license:expat)))
+
+(define-public docx2txt
+  (package
+    (name "docx2txt")
+    (version "1.4")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "mirror://sourceforge/docx2txt/docx2txt/v"
+                    version "/docx2txt-" version ".tgz"))
+              (sha256
+               (base32
+                "06vdikjvpj6qdb41d8wzfnyj44jpnknmlgbhbr1w215420lpb5xj"))))
+    (build-system gnu-build-system)
+    (inputs
+     `(("unzip" ,unzip)
+       ("perl" ,perl)))
+    (arguments
+     `(#:tests? #f                      ; No tests.
+       #:make-flags (list (string-append "BINDIR="
+                                         (assoc-ref %outputs "out") "/bin")
+                          (string-append "CONFIGDIR="
+                                         (assoc-ref %outputs "out") "/etc")
+                          ;; Makefile seems to be a bit dumb at guessing.
+                          "INSTALL=install"
+                          "PERL=perl")
+       #:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-after 'install 'fix-install
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let* ((out (assoc-ref outputs "out"))
+                    (bin (string-append out "/bin"))
+                    (config (string-append out "/etc/docx2txt.config"))
+                    (unzip (assoc-ref inputs "unzip")))
+               ;; According to INSTALL, the .sh wrapper can be skipped.
+               (delete-file (string-append bin "/docx2txt.sh"))
+               (rename-file (string-append bin "/docx2txt.pl")
+                            (string-append bin "/docx2txt"))
+               (substitute* config
+                 (("config_unzip => '/usr/bin/unzip',")
+                  (string-append "config_unzip => '"
+                                 unzip
+                                 "/bin/unzip',")))
+               ;; Makefile is wrong.
+               (chmod config #o644)))))))
+    (synopsis "Recover text from @file{.docx} files, with good formatting")
+    (description
+     "@command{docx2txt} is a Perl-based command-line utility to convert
+Microsoft Office @file{.docx} documents to equivalent text documents.  The
+latest version supports the following features during text extraction:
+
+@itemize
+@item Character conversions; currency characters are converted to respective
+names like Euro.
+@item Capitalisation of text blocks.
+@item Center and right justification of text fitting in a line of
+(configurable) 80 columns.
+@item Horizontal ruler, line breaks, paragraph separation, tabs.
+@item Indicating hyperlinked text along with the hyperlink (configurable).
+@item Handling (bullet, decimal, letter, roman) lists along with (attempt at)
+indentation.
+@end itemize\n")
+    (home-page "http://docx2txt.sourceforge.net")
+    (license license:gpl3+)))
-- 
2.17.1