From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:470:142:3::10]:51620)
 by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from )
 id 1hrUCZ-0005H3-HO for guix-patches@gnu.org; Sat, 27 Jul 2019 17:26:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1hrUCY-0002wi-9s
 for guix-patches@gnu.org; Sat, 27 Jul 2019 17:26:03 -0400
Received: from debbugs.gnu.org ([209.51.188.43]:36665)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 (envelope-from ) id 1hrUCY-0002wE-68
 for guix-patches@gnu.org; Sat, 27 Jul 2019 17:26:02 -0400
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from ) id 1hrUCY-0007eP-2K
 for guix-patches@gnu.org; Sat, 27 Jul 2019 17:26:02 -0400
Subject: [bug#36825] [PATCH 2/2] gnu: Add uniutils.
Resent-Message-ID:
From: Hartmut Goebel
Date: Sat, 27 Jul 2019 23:25:34 +0200
Message-Id: <20190727212534.631-2-h.goebel@crazy-compilers.com>
In-Reply-To: <20190727212534.631-1-h.goebel@crazy-compilers.com>
References: <20190727212534.631-1-h.goebel@crazy-compilers.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-patches-bounces+kyle=kyleam.com@gnu.org
Sender: "Guix-patches"
To: 36825@debbugs.gnu.org

* gnu/packages/textutils.scm (uniutils): New variable.
---
 gnu/packages/textutils.scm | 63 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index aeb1953b99..ea6b4232de 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -45,6 +45,7 @@
   #:use-module (guix build-system python)
   #:use-module (gnu packages)
   #:use-module (gnu packages autotools)
+  #:use-module (gnu packages base)
   #:use-module (gnu packages compression)
   #:use-module (gnu packages gettext)
   #:use-module (gnu packages java)
@@ -362,6 +363,68 @@ useful when it is desired to reformat numbers.
 of floating point numbers, just treat the input as a sequence of unsigned
 characters.)
 
+@end itemize")
+    (license license:gpl3)))
+
+(define-public uniutils
+  (package
+    (name "uniutils")
+    (version "2.27")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (string-append "http://billposer.org/Software/Downloads/"
+                           "uniutils-" version ".tar.bz2"))
+       (sha256
+        (base32 "19w1510w87gx7n4qy3zsb0m467a4rn5scvh4ajajg7jh6x5xri08"))))
+    (build-system gnu-build-system)
+    (arguments
+     '(#:configure-flags '("--disable-dependency-tracking")
+       #:phases
+       (modify-phases %standard-phases
+         (add-after 'build 'fix-paths
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let ((out (assoc-ref outputs "out"))
+                   (a2b (assoc-ref inputs "ascii2binary"))
+                   (iconv (assoc-ref inputs "libiconv")))
+               (substitute* "utf8lookup"
+                 (("^ascii2binary ") (string-append a2b "/bin/ascii2binary "))
+                 (("^uniname ") (string-append out "/bin/uniname "))
+                 (("^iconv ") (string-append iconv "/bin/iconv ")))
+               #t))))))
+    (inputs
+     `(("ascii2binary" ,ascii2binary)
+       ("libiconv" ,libiconv)))
+    (home-page "https://billposer.org/Software/unidesc.html")
+    (synopsis "Find out what is in a Unicode file")
+    (description "Useful tools when working with Unicode files when one
+doesn't know the writing system, doesn't have the necessary font, needs to
+inspect invisible characters, needs to find out whether characters have been
+combined or in what order they occur, or needs statistics on which characters
+occur.
+
+@itemize
+
+@item @command{uniname} defaults to printing the character offset of each
+character, its byte offset, its hex code value, its encoding, the glyph
+itself, and its name. It may also be used to validate UTF-8 input.
+
+@item @command{unidesc} reports the character ranges to which different
+portions of the text belong. It can also be used to identify Unicode encodings
+(e.g. UTF-16be) flagged by magic numbers.
+
+@item @command{unihist} generates a histogram of the characters in its input.
+
+@item @command{ExplicateUTF8} is intended for debugging or for learning about
+Unicode. It determines and explains the validity of a sequence of bytes as a
+UTF-8 encoding.
+
+@item @command{utf8lookup} provides a handy way to look up Unicode characters
+from the command line.
+
+@item @command{unireverse} reverses each line of UTF-8 input
+character-by-character.
+
 @end itemize")
     (license license:gpl3)))
 
-- 
2.21.0