From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maxim Cournoyer
Date: Fri, 12 Aug 2022 01:07:52 -0400
Subject: [bug#57151] [PATCH 2/2] gnu: tesseract-ocr: Make the default install minimally useful.
To: 57151@debbugs.gnu.org
Cc: Maxim Cournoyer
Message-Id: <20220812050752.3980-2-maxim.cournoyer@gmail.com>
In-Reply-To: <20220812050752.3980-1-maxim.cournoyer@gmail.com>
References: <20220812050752.3980-1-maxim.cournoyer@gmail.com>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

* gnu/packages/ocr.scm
(tesseract-ocr) [phases]{adjust-TESSDATA_PREFIX-macro}: New phase.
{install-minimal-tessdata}: New phase.
[native-inputs]: Add tesseract-ocr-tessdata-fast.
[search-paths]: New field.
[description]: Mention how to add support for more languages.
---
 gnu/packages/ocr.scm | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/gnu/packages/ocr.scm b/gnu/packages/ocr.scm
index e2c9f561cc..21d257ef24 100644
--- a/gnu/packages/ocr.scm
+++ b/gnu/packages/ocr.scm
@@ -132,6 +132,15 @@ (define-public tesseract-ocr
               (substitute* "configure.ac"
                 (("AC_SUBST\\(\\[XML_CATALOG_FILES])")
                  ""))))
+          (add-after 'unpack 'adjust-TESSDATA_PREFIX-macro
+            (lambda _
+              ;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
+              ;; specific search-path than '/share' can be specified. The
+              ;; build system uses CPPFLAGS for itself, so we can't simply set
+              ;; a make flag.
+              (substitute* "Makefile.am"
+                (("-DTESSDATA_PREFIX='\"@datadir@\"'")
+                 "-DTESSDATA_PREFIX='\"@datadir@/tesseract-ocr\"'"))))
           (add-after 'build 'build-training
             (lambda* (#:key parallel-build? #:allow-other-keys)
               (define n (if parallel-build? (number->string
@@ -140,7 +149,18 @@ (define n (if parallel-build? (number->string
               (invoke "make" "-j" n "training")))
           (add-after 'install 'install-training
             (lambda _
-              (invoke "make" "training-install"))))))
+              (invoke "make" "training-install")))
+          (add-after 'install 'install-minimal-tessdata
+            ;; tesseract-ocr cannot be used without its trained models data;
+            ;; install the English language as a minimal base which can be
+            ;; extended via TESSDATA_PREFIX.
+            (lambda* (#:key native-inputs inputs #:allow-other-keys)
+              (define eng.traineddata
+                "/share/tesseract-ocr/tessdata/eng.traineddata")
+              (install-file (search-input-file (or native-inputs inputs)
+                                               eng.traineddata)
+                            (dirname (string-append #$output
+                                                    eng.traineddata))))))))
     (native-inputs
      (list asciidoc
            autoconf
@@ -152,13 +172,18 @@ (define n (if parallel-build? (number->string
            libtool
            libxml2                      ;for XML_CATALOG_FILES
            libxslt
-           pkg-config))
+           pkg-config
+           tesseract-ocr-tessdata-fast))
     (inputs
      (list cairo
            icu4c
            leptonica
            pango
            python-wrapper))
+    (native-search-paths (list (search-path-specification
+                                (variable "TESSDATA_PREFIX")
+                                (files (list "share/tesseract-ocr/tessdata"))
+                                (separator #f)))) ;single value
     (home-page "https://github.com/tesseract-ocr/tesseract")
     (synopsis "Optical character recognition engine")
     (description
@@ -166,7 +191,9 @@ (define n (if parallel-build? (number->string
 high accuracy. It supports many languages, output text formatting, hOCR
 positional information and page layout analysis. Several image formats are
 supported through the Leptonica library. It can also detect whether text is
-monospaced or proportional.")
+monospaced or proportional. Support for the English language is included by
+default. To add support for more languages, the
+@code{tesseract-ocr-tessdata-fast} package should be installed.")
     (license license:asl2.0)))
 
 (define-public gimagereader
-- 
2.36.1