unofficial mirror of guix-patches@gnu.org 
* [bug#69794] [PATCH 0/2] Package some dependencies for Argos Translate
@ 2024-03-14  8:29 guix-patches--- via
  2024-03-14  8:32 ` [bug#69794] [PATCH 1/2] gnu: Add python-sacremoses guix-patches--- via
  2024-03-14  8:32 ` [bug#69794] [PATCH 2/2] gnu: Add python-stanza guix-patches--- via
  0 siblings, 2 replies; 3+ messages in thread
From: guix-patches--- via @ 2024-03-14  8:29 UTC
  To: 69794; +Cc: Nguyễn Gia Phong

Argos Translate <https://www.argosopentech.com>
is an offline translation library based on OpenNMT.

The two patches below package those of its dependencies that are trivial
to package.  The only dependency still missing afterwards is
CTranslate2 <https://opennmt.net/CTranslate2>.

Nguyễn Gia Phong (2):
  gnu: Add python-sacremoses.
  gnu: Add python-stanza.

 gnu/packages/machine-learning.scm | 30 +++++++++++++++++++++++++++
 gnu/packages/python-xyz.scm       | 34 +++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)


base-commit: 76a3414a1bc500626a9feca013673f994eb51a34
-- 
2.41.0

* [bug#69794] [PATCH 1/2] gnu: Add python-sacremoses.
From: guix-patches--- via @ 2024-03-14  8:32 UTC
  To: 69794
  Cc: Nguyễn Gia Phong, Lars-Dominik Braun, Marius Bakke,
	Munyoki Kilyungi, Sharlatan Hellseher, jgart

* gnu/packages/python-xyz.scm (python-sacremoses): New variable.

Change-Id: I2c2cd94c054d7e952ffb4b3afdedd2ee8ce905bf
---
 gnu/packages/python-xyz.scm | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/gnu/packages/python-xyz.scm b/gnu/packages/python-xyz.scm
index 232b5d69993c..ad33d98db142 100644
--- a/gnu/packages/python-xyz.scm
+++ b/gnu/packages/python-xyz.scm
@@ -149,6 +149,7 @@
 ;;; Copyright © 2024 Timothee Mathieu <timothee.mathieu@inria.fr>
 ;;; Copyright © 2024 Ian Eure <ian@retrospec.tv>
 ;;; Copyright © 2024 Adriel Dumas--Jondeau <leirda@disroot.org>
+;;; Copyright © 2024 Nguyễn Gia Phong <mcsinyx@disroot.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -21897,6 +21898,39 @@ (define-public python-nltk
      reasoning, wrappers for natural language processing libraries.")
     (license license:asl2.0)))
 
+(define-public python-sacremoses
+  (package
+    (name "python-sacremoses")
+    (version "0.1.0")
+    (source (origin
+              (method git-fetch)
+              (uri (git-reference
+                     (url "https://github.com/hplt-project/sacremoses")
+                     (commit version)))
+              (sha256
+                (base32
+                  "0g70vchfniknp65n4wnx7chg6g49d4xrz1wagv7f7ir2swdzyn9b"))))
+    (build-system python-build-system)
+    (arguments
+      '(#:phases
+         (modify-phases %standard-phases
+           (replace 'check
+             (lambda* (#:key tests? #:allow-other-keys)
+               (when tests?
+                 ;; Skip truecaser tests which fetch https://norvig.com/big.txt
+                 (invoke "python" "-m" "unittest"
+                         "sacremoses/test/test_corpus.py"
+                         "sacremoses/test/test_no_redos_has_numeric_only.py"
+                         "sacremoses/test/test_normalizer.py"
+                         "sacremoses/test/test_tokenizer.py")))))))
+    (propagated-inputs
+      (list python-click-7 python-joblib python-regex python-tqdm))
+    (home-page "https://github.com/hplt-project/sacremoses")
+    (synopsis "Natural language tokenizer, truecaser and normalizer")
+    (description "SacreMoses is a Python port of Moses'
+tokenizer, detokenizer, truecaser and punctuation normalizer.")
+    (license license:expat)))
+
 (define-public python-pymongo
   (package
     (name "python-pymongo")
-- 
2.41.0

* [bug#69794] [PATCH 2/2] gnu: Add python-stanza.
From: guix-patches--- via @ 2024-03-14  8:32 UTC
  To: 69794; +Cc: Nguyễn Gia Phong

* gnu/packages/machine-learning.scm (python-stanza): New variable.

Change-Id: Ibde67dcb8a015b91554f6a1e36dbf5eef0b73f36
---
 gnu/packages/machine-learning.scm | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 5c18a2e9d57d..5e403d905c49 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -27,6 +27,7 @@
 ;;; Copyright © 2024 David Pflug <david@pflug.io>
 ;;; Copyright © 2024 Timothee Mathieu <timothee.mathieu@inria.fr>
 ;;; Copyright © 2024 Spencer King <spencer.king@geneoscopy.com>
+;;; Copyright © 2024 Nguyễn Gia Phong <mcsinyx@disroot.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -1127,6 +1128,35 @@ (define-public python-spacy
 model packaging, deployment and workflow management.")
     (license license:expat)))
 
+(define-public python-stanza
+  (package
+    (name "python-stanza")
+    (version "1.8.1")
+    (source
+      (origin
+        (method url-fetch)
+        (uri (pypi-uri "stanza" version))
+        (sha256
+          (base32 "1drq9wyafisnf44jgby1sh45svp0pj2svb01v397i9h0bczc5i08"))))
+    (build-system python-build-system)
+    (propagated-inputs (list python-emoji
+                             python-numpy
+                             python-protobuf
+                             python-requests
+                             python-networkx
+                             python-toml
+                             python-pytorch
+                             python-tqdm))
+    ;; Tests require downloading datasets.
+    (arguments (list #:tests? #false))
+    (home-page "https://stanfordnlp.github.io/stanza")
+    (synopsis "Stanford NLP Python library for many human languages")
+    (description "Stanza is a collection of accurate and efficient tools
+for the linguistic analysis of many human languages.  Starting from raw text,
+Stanza divides it into sentences and words, and can then recognize
+parts of speech and entities, do syntactic analysis, and more.")
+    (license license:asl2.0)))
+
 (define-public shogun
   (package
     (name "shogun")
-- 
2.41.0
