From: Jelle Licht
To: Simon South
Cc: 61851@debbugs.gnu.org, Maxim Cournoyer
Subject: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 01:31:40 +0100
Message-ID: <87bkle4olv.fsf@fsfe.org>
In-Reply-To: <878rgik9uo.fsf@simonsouth.net>
References: <878rgik9uo.fsf@simonsouth.net>

Hi Simon,

Simon South writes:

> Jelle,
>
> Respectfully, and speaking only as an interested observer, I think this
> may not be the right fix.

Cunningham's law strikes again :) [1].

> Guix's Tesseract is indeed missing its config files, causing (among
> other things) the examples in the online documentation[0] to not work,
> e.g.:
>
> ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
> read_params_file: Can't open hocr
> The (quick) [brown] {fox} jumps!
> Over the $43,456.78 #90 dog
> (...)
>
> But the root issue appears to be a misconfiguration of the
> TESSDATA_PREFIX search path in the tessdata-ocr package, which causes
> Tesseract's own config files to be installed in a folder other than the
> one it's configured to search.
>
> Fixing this places Tesseract's config files and the trained-data files
> together beneath /usr/share/tessdata, allowing Tesseract to work as
> expected:
>
> ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
>
> (...)

I will believe you without any doubt, but there's this spooky comment
left in the tesseract-ocr 'adjust-TESSDATA_PREFIX-macro phase:

--8<---------------cut here---------------start------------->8---
;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
;; specific search-path than '/share' can be specified.  The
;; build system uses CPPFLAGS for itself, so we can't simply set
;; a make flag.
--8<---------------cut here---------------end--------------->8---

This makes me believe the current situation was a deliberate choice, but
I personally don't understand what the original problem was/is.

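To make the mismatch under discussion concrete, here is a rough Python
sketch. It is not Tesseract's or Guix's actual code. It assumes that a
config name such as "hocr" is resolved relative to the data directory,
under a "configs" or "tessconfigs" subdirectory; the directory layout and
the names find_config and demo are hypothetical and chosen only for
illustration.

```python
import os
import tempfile

# Assumption: config names are looked up in these subdirectories of the
# data directory. This models the lookup only; it is not Tesseract's code.
CONFIG_SUBDIRS = ("configs", "tessconfigs")


def find_config(datadir, name):
    """Return the path of config `name` under `datadir`, or None if absent."""
    for sub in CONFIG_SUBDIRS:
        candidate = os.path.join(datadir, sub, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Show a lookup failing, then succeeding once the files are co-located."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: the program searches one directory while the
        # config files were installed into a different one.
        searched = os.path.join(root, "searched-datadir")
        installed = os.path.join(root, "installed-datadir")
        os.makedirs(searched)
        os.makedirs(os.path.join(installed, "configs"))
        open(os.path.join(installed, "configs", "hocr"), "w").close()

        # The file exists, but not where the lookup goes.
        missing_before = find_config(searched, "hocr") is None

        # Installing the config under the searched directory resolves it.
        os.makedirs(os.path.join(searched, "configs"))
        open(os.path.join(searched, "configs", "hocr"), "w").close()
        found_after = find_config(searched, "hocr") is not None

        return missing_before, found_after
```

Under these assumptions, the first lookup corresponds to the
"read_params_file: Can't open hocr" failure quoted above, and the second
to the behaviour once the config files sit beneath the directory that is
actually searched. Both proposed fixes aim at that second state; they
differ in which package puts the files there.
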
> This approach has the advantage of keeping the
> tesseract-ocr-tessdata-fast package "pure" and focused only on
> trained-data files, which will be important for the patch I'm working on
> that will split it into multiple packages, one for each language and
> script, to allow greater flexibility.
>
> I'll respond to this email with a draft (!) patch to tesseract-ocr that
> should achieve the same result as yours, making the config files
> available for use.  Does this also fix the problem for you?  If so,
> would you consider submitting this change instead?

It seems to work for my stuff! I'm bringing Maxim in to weigh in on
this, as they are the (un?)lucky expert according to my git-fu.

Thanks for paying attention!
- Jelle

[1] https://meta.wikimedia.org/wiki/Cunningham%27s_Law