From: Nicolas Graves via Guix-patches via
Reply-to: Nicolas Graves
To: 62443@debbugs.gnu.org
Cc: ngraves@ngraves.fr
Subject: [bug#62443] [PATCH 1/3] gnu: Add sentencepiece.
Date: Sat, 25 Mar 2023 16:32:18 +0100
Message-Id: <20230325153220.26027-1-ngraves@ngraves.fr>
In-Reply-To: <875yaoc1nj.fsf@ngraves.fr>
References: <875yaoc1nj.fsf@ngraves.fr>
X-Mailer: git-send-email 2.39.2

* gnu/packages/machine-learning.scm (sentencepiece): New variable.
---
 gnu/packages/machine-learning.scm | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 37d4ef78ad..f6996af77b 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -583,6 +583,33 @@ (define openfst-for-vosk
              '("--enable-shared" "--enable-far" "--enable-ngram-fsts"
                "--enable-lookahead-fsts" "--with-pic" "--disable-bin")))))
 
+(define-public sentencepiece
+  (package
+    (name "sentencepiece")
+    (version "0.1.97")
+    (source
+     (origin
+       (method git-fetch)
+       (uri (git-reference
+             (url "https://github.com/google/sentencepiece")
+             (commit (string-append "v" version))))
+       (file-name (git-file-name name version))
+       (sha256
+        (base32 "1kzfkp2pk0vabyw3wmkh16h11chzq63mzc20ddhsag5fp6s91ajg"))))
+    (build-system cmake-build-system)
+    (arguments '(#:tests? #f))
+    (native-inputs (list gperftools))
+    (home-page "https://github.com/google/sentencepiece")
+    (synopsis "Unsupervised tokenizer for Neural Network-based text generation")
+    (description "SentencePiece is an unsupervised text tokenizer and
+detokenizer mainly for Neural Network-based text generation systems where the
+vocabulary size is predetermined prior to the neural model training.
+SentencePiece implements subword units (e.g., byte-pair-encoding
+(BPE) and unigram language model) with the extension of direct training from
+raw sentences. SentencePiece allows us to make a purely end-to-end system
+that does not depend on language-specific pre/postprocessing.")
+    (license license:asl2.0)))
+
 (define-public shogun
   (package
     (name "shogun")
-- 
2.39.2
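
For anyone who wants to try the new package locally, here is a minimal sketch. It assumes the patch has been applied to a Guix checkout that is already built. The file name manifest.scm is only an illustration and is not part of the patch.

```scheme
;; manifest.scm (hypothetical file name, for illustration only).
;; Evaluates to a manifest containing the package added by this patch.
;; The name "sentencepiece" matches the (name ...) field in the patch.
(specifications->manifest
 (list "sentencepiece"))
```

Running something like `./pre-inst-env guix shell -m manifest.scm` from the checkout should then build the package and make it available in a temporary environment. Note that the patch sets `#:tests? #f`, so this build does not run the test suite.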