From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#73266] [PATCH 8/9] gnu: Add python-curated-tokenizers.
Resent-From: Nicolas Graves
To: 73266@debbugs.gnu.org
Cc: ngraves@ngraves.fr
Date: Sun, 15 Sep 2024 10:57:13 +0200
Message-ID: <20240915085720.13323-8-ngraves@ngraves.fr>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20240915085720.13323-1-ngraves@ngraves.fr>
References: <20240915085720.13323-1-ngraves@ngraves.fr>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Reply-to: Nicolas Graves
From: Nicolas Graves via Guix-patches via

* gnu/packages/machine-learning.scm (python-curated-tokenizers): New variable.

Change-Id: I719d2ffd499c86e6bb2f9215ed979e47c0e32484
---
 gnu/packages/machine-learning.scm | 41 +++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index d1b282fea8..e80412ed41 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -2480,6 +2480,47 @@ (define-public python-cutlery
 @end itemize")
     (license license:expat)))
 
+(define-public python-curated-tokenizers
+  (package
+    (name "python-curated-tokenizers")
+    (version "0.0.9")
+    ;; This source includes third_party protobuf, but a version that
+    ;; is not currently packaged in guix (3.6 < version <= 3.19.5).
+    ;; Try using guix's protobuf when updating.
+    (source
+     (origin
+       (method url-fetch)
+       (uri (pypi-uri "curated-tokenizers" version))
+       (sha256
+        (base32 "09ffs2qjlli35wnf8wf64s14xm75vi5ynvkrn9nqllmk9bjlfgf9"))))
+    (build-system pyproject-build-system)
+    (arguments
+     (list
+      #:phases
+      #~(modify-phases %standard-phases
+          ;; For some reason when both local and installed exist,
+          ;; local is chosen and is missing shared libraries.
+          ;; Use installed version to run tests instead.
+          (add-before 'check 'pre-check
+            (lambda* (#:key tests? inputs outputs #:allow-other-keys)
+              (when tests?
+                (copy-recursively "curated_tokenizers/tests" "tests")
+                (delete-file-recursively "curated_tokenizers")
+                (add-installed-pythonpath inputs outputs)))))))
+    (propagated-inputs (list python-regex))
+    (native-inputs (list python-cython python-pytest))
+    (home-page "https://github.com/explosion/curated-tokenizers")
+    (synopsis "Lightweight piece tokenization library")
+    (description "This package provides a lightweight wordpiece and
+sentencepiece tokenization library. It supports multiple tokenizers:
+@itemize
+@item BPE
+@item Byte BPE
+@item Unigram
+@item Wordpiece
+@end itemize")
+    (license license:expat)))
+
 (define-public python-curated-transformers
   (package
     (name "python-curated-transformers")
-- 
2.46.0
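
Note on the pre-check phase: the comment in the patch says the local copy
is chosen "for some reason". One plausible explanation, offered here as an
assumption and not something the patch confirms, is ordinary Python import
shadowing: when the source checkout comes earlier on sys.path than the
installed location, its curated_tokenizers directory is imported first,
and that directory does not contain the compiled extension modules.
The sketch below reproduces the effect with a throwaway package. The name
demo_shadowed_pkg and the "checkout" and "installed" directories are
hypothetical stand-ins, not anything from curated-tokenizers or Guix.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def make_package(root: Path, name: str, origin: str) -> None:
    """Create a minimal package whose ORIGIN attribute records where it lives."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"ORIGIN = {origin!r}\n")


def fresh_import(name: str):
    """Import NAME from scratch, ignoring any previously cached module."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


def demo() -> tuple:
    """Return the package origin before and after deleting the local tree."""
    name = "demo_shadowed_pkg"  # hypothetical package name
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkout"    # stands in for the source tree
        installed = Path(tmp) / "installed"  # stands in for site-packages
        make_package(checkout, name, "local")
        make_package(installed, name, "installed")

        saved_path = list(sys.path)
        try:
            # Assumption: the source tree precedes the installed location,
            # as the working directory often does when tests run from it.
            sys.path[:0] = [str(checkout), str(installed)]
            before = fresh_import(name).ORIGIN

            # Same idea as (delete-file-recursively "curated_tokenizers").
            shutil.rmtree(checkout / name)
            after = fresh_import(name).ORIGIN
        finally:
            # Leave the interpreter state as we found it.
            sys.path[:] = saved_path
            sys.modules.pop(name, None)
    return before, after


if __name__ == "__main__":
    print(demo())
```

Under that assumption, the first import resolves to the local tree and the
second, after the directory is removed, resolves to the installed copy.
That matches what the phase does: it copies the tests out of the package
directory first, deletes the directory, and then puts the installed
package on the Python path.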