From: Maxim Cournoyer
To: Simon South
Cc: Jelle Licht, 61851@debbugs.gnu.org
Subject: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 10:35:31 -0500
Message-ID: <87mt4xiz0c.fsf@gmail.com>
In-Reply-To: <87h6v53kdn.fsf@simonsouth.net> (Simon South's message of "Tue, 28 Feb 2023 10:00:36 -0500")
References: <878rgik9uo.fsf@simonsouth.net> <87bkle4olv.fsf@fsfe.org> <87h6v53kdn.fsf@simonsouth.net>

Hi Simon,

Simon South writes:

> Jelle Licht writes:
>> Cunningham's law strikes again :)
>
> Ha, interesting. That one's new to me.
>
>> This makes me believe the current situation was a deliberate choice...
>
> Yes, it was, and I realize now I didn't provide much in the way of
> rationale in my previous email. So here's the background information
> for anyone interested:
>
> Tesseract normally expects to find its data files in /usr/share/tessdata
> and subfolders thereof. We'd like to use Guix's native-search-paths
> functionality to pull together data from (for instance) multiple
> language-specific data packages, and Tesseract conveniently honours a
> TESSDATA_PREFIX environment variable that specifies its data folder's
> location, so it seems we are all set.
>
> What should TESSDATA_PREFIX be set to? Tesseract's documentation[0]
> says
>
>   TESSDATA_PREFIX environment variable should be set to the parent
>   directory of “tessdata” directory.
>
> So "share" then, presumably, to have the data files located at
> "share/tessdata". The man page[1] seems to confirm this:
>
>   To use a non-standard language pack named foo.traineddata, set the
>   TESSDATA_PREFIX environment variable so the file can be found at
>   TESSDATA_PREFIX/tessdata/foo.traineddata...
>
> This creates a problem, though, since defining a native-search-path of
> just "share" will pull in files from virtually every single Guix
> package. The solution then is to introduce an intermediate folder,
> "tesseract-ocr", that sidesteps this problem, and to configure Tesseract
> appropriately at build time so it installs its data files to
> "share/tesseract-ocr/tessdata" instead. This is why the existing code
> was written the way it was and what the comment you pointed out is
> referring to.
>
> However there's a problem with this, too: Patching Makefile.am the way
> the code does results in only some of Tesseract's data files being
> placed in "share/tesseract-ocr/tessdata"; you can see in the package
> output there is still a "share/tessdata" folder that contains
> Tesseract's config files. Since these aren't also placed beneath
> "share/tesseract-ocr/tessdata" Tesseract can't find them at runtime.
>
> The solution to this seems to be to remove this phase and instead use
> the "--datadir" configure flag to specify the desired data-folder path.
> Doing this results in all of Tesseract's data files being installed
> beneath "share/tesseract-ocr/tessdata" and the resulting package works
> as you'd expect.
>
> However the problem with this is... none of it is necessary in the first
> place! It turns out Tesseract's documentation is simply WRONG and the
> program actually expects TESSDATA_PREFIX to contain the complete path to
> the "tessdata" data folder, not the path of the folder directly above
> it. So Tesseract can be built as-is, the native-search-path can be
> safely defined as "share/tessdata", and everything just works.
>
> This is what the patch I passed on yesterday does.

Thanks for explaining, that makes sense! Would you be so kind as to
open an issue with upstream about the misleading doc? That'd complete
it and avoid any confusion in the future.

-- 
Thanks,
Maxim
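
For anyone skimming the thread later, the two readings of TESSDATA_PREFIX
discussed above can be compared with a small Python sketch. It is only an
illustration under stated assumptions: it does not call Tesseract or Guix,
the function names (documented_path, observed_path, search_path_value) are
made up for this example, and the "profile" is just a temporary directory
standing in for a Guix profile.

```python
import os
import tempfile


def documented_path(prefix, lang):
    # Reading of the quoted documentation: PREFIX is the *parent* of "tessdata".
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def observed_path(prefix, lang):
    # Behaviour reported in the quoted message: PREFIX *is* the "tessdata" folder.
    return os.path.join(prefix, lang + ".traineddata")


def search_path_value(profile, subdir):
    # Hypothetical stand-in for a search-path lookup: return the directory
    # inside the profile if it exists, otherwise None.
    candidate = os.path.join(profile, *subdir.split("/"))
    return candidate if os.path.isdir(candidate) else None


def demo():
    # Build a fake profile containing share/tessdata/eng.traineddata.
    with tempfile.TemporaryDirectory() as profile:
        data_dir = os.path.join(profile, "share", "tessdata")
        os.makedirs(data_dir)
        open(os.path.join(data_dir, "eng.traineddata"), "w").close()

        # Search path defined as "share/tessdata", as in the quoted message.
        prefix = search_path_value(profile, "share/tessdata")
        return (
            os.path.isfile(observed_path(prefix, "eng")),
            os.path.isfile(documented_path(prefix, "eng")),
        )


if __name__ == "__main__":
    found_observed, found_documented = demo()
    print("observed reading finds the file:", found_observed)
    print("documented reading finds the file:", found_documented)
```

With the search path set to "share/tessdata", only the reading Simon
describes locates the file in this toy layout; the documented reading
would look one "tessdata" level too deep.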