From: Nicolas Graves via Guix-patches via <guix-patches@gnu.org>
To: 73266@debbugs.gnu.org
Cc: ngraves@ngraves.fr
Subject: [bug#73266] [PATCH 8/9] gnu: Add python-curated-tokenizers.
Date: Sun, 15 Sep 2024 10:57:13 +0200
Message-ID: <20240915085720.13323-8-ngraves@ngraves.fr>
In-Reply-To: <20240915085720.13323-1-ngraves@ngraves.fr>

* gnu/packages/machine-learning.scm (python-curated-tokenizers): New variable.

Change-Id: I719d2ffd499c86e6bb2f9215ed979e47c0e32484
---
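Note on the 'pre-check phase in the diff: the comment there says the local
package is picked over the installed one "for some reason".  A likely
explanation (an assumption, not something the patch states) is ordinary
Python import shadowing: the first sys.path entry containing the package
wins, and the unpacked source tree has no compiled extension modules.  The
sketch below only illustrates that lookup order.  The names demo_tokenizers,
checkout and site are hypothetical stand-ins and not part of the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root: Path, name: str, marker: str) -> Path:
    """Create a trivial package under root whose ORIGIN records where it lives."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"ORIGIN = {marker!r}\n")
    return root


def resolve_origin(name: str, search_path: list[str]) -> str:
    """Import name with search_path prepended to sys.path; return its ORIGIN."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)        # forget any earlier import
    importlib.invalidate_caches()      # directories were created at run time
    sys.path[:0] = search_path
    try:
        return importlib.import_module(name).ORIGIN
    finally:
        sys.path[:] = saved_path
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Hypothetical stand-ins: "checkout" plays the unpacked source tree,
    # "site" plays the installed output.
    checkout = make_package(tmp / "checkout", "demo_tokenizers", "source tree")
    site = make_package(tmp / "site", "demo_tokenizers", "installed")

    # Source tree listed first: it shadows the installed copy.
    shadowed = resolve_origin("demo_tokenizers", [str(checkout), str(site)])
    # With the source-tree package out of the way, the installed copy is found.
    installed_only = resolve_origin("demo_tokenizers", [str(site)])
    print(shadowed, "/", installed_only)
```

If that assumption holds, deleting the in-tree curated_tokenizers directory
after copying its tests out has the same effect as the second lookup: only
the installed copy, with its shared libraries, remains importable.
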
gnu/packages/machine-learning.scm | 41 +++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index d1b282fea8..e80412ed41 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -2480,6 +2480,47 @@ (define-public python-cutlery
 @end itemize")
     (license license:expat)))
 
+(define-public python-curated-tokenizers
+  (package
+    (name "python-curated-tokenizers")
+    (version "0.0.9")
+    ;; This source includes third_party protobuf, but a version that
+    ;; is not currently packaged in guix (3.6 < version <= 3.19.5).
+    ;; Try using guix's protobuf when updating.
+    (source
+     (origin
+       (method url-fetch)
+       (uri (pypi-uri "curated-tokenizers" version))
+       (sha256
+        (base32 "09ffs2qjlli35wnf8wf64s14xm75vi5ynvkrn9nqllmk9bjlfgf9"))))
+    (build-system pyproject-build-system)
+    (arguments
+     (list
+      #:phases
+      #~(modify-phases %standard-phases
+          ;; For some reason when both local and installed exist,
+          ;; local is chosen and is missing shared libraries.
+          ;; Use installed version to run tests instead.
+          (add-before 'check 'pre-check
+            (lambda* (#:key tests? inputs outputs #:allow-other-keys)
+              (when tests?
+                (copy-recursively "curated_tokenizers/tests" "tests")
+                (delete-file-recursively "curated_tokenizers")
+                (add-installed-pythonpath inputs outputs)))))))
+    (propagated-inputs (list python-regex))
+    (native-inputs (list python-cython python-pytest))
+    (home-page "https://github.com/explosion/curated-tokenizers")
+    (synopsis "Lightweight piece tokenization library")
+    (description "This package provides a lightweight wordpiece and
+sentencepiece tokenization library.  It supports multiple tokenizers:
+@itemize
+@item BPE
+@item Byte BPE
+@item Unigram
+@item Wordpiece
+@end itemize")
+    (license license:expat)))
+
 (define-public python-curated-transformers
   (package
     (name "python-curated-transformers")
--
2.46.0