From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#57151] [PATCH 1/2] gnu: Add tesseract-ocr-tessdata-fast.
From: Maxim Cournoyer
To: Simon South
Cc: 57151@debbugs.gnu.org
Date: Fri, 12 Aug 2022 08:52:25 -0400
References: <20220812050543.3923-1-maxim.cournoyer@gmail.com> <20220812050752.3980-1-maxim.cournoyer@gmail.com> <87czd57lco.fsf@simonsouth.net>
In-Reply-To: <87czd57lco.fsf@simonsouth.net> (Simon South's message of "Fri, 12 Aug 2022 07:27:35 -0400")
Message-ID: <87k07dlj3q.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-GNU-PR-Message: followup 57151
X-GNU-PR-Package: guix-patches
X-GNU-PR-Keywords: patch
X-TUID:
JiC428YtZhk5

Hi Simon,

Simon South writes:

> Maxim Cournoyer writes:
>> * gnu/packages/ocr.scm (tesseract-ocr-tessdata-fast): New variable.
>
> Maxim,
>
> Would it not be better to generate a separate package for each of the
> languages and scripts this data covers, as is done by Debian for
> instance?  The entire dataset is about a gigabyte in size and supports
> more than a hundred languages yet I imagine most people would be using
> only one or two.
>
> This would mean tesseract-ocr could simply propagate the
> "tesseract-ocr-tessdata-fast-eng" package rather than cherry-picking a
> specific file, and would establish a convention that would be necessary
> for packaging the "best" dataset as well, if that's desired.

That's a good idea!  I think we could have both, like Debian also has a
'tesseract-ocr-all' package for all the languages/scripts.  Which means
the individual variants could be added in at a later time by those
interested, eh :-).  A procedure returning a language-specific package
variant would make sense for that.

Thanks,

Maxim