unofficial mirror of guix-patches@gnu.org 
From: Peter Lo <peterloleungyau@gmail.com>
To: 42117@debbugs.gnu.org
Cc: Peter Lo <peterloleungyau@gmail.com>
Subject: [bug#42117] [PATCH 12/17] gnu: Add r-tokenizers.
Date: Mon, 29 Jun 2020 13:50:37 +0800
Message-ID: <20200629055042.8565-12-peterloleungyau@gmail.com>
In-Reply-To: <20200629055042.8565-1-peterloleungyau@gmail.com>

* gnu/packages/cran.scm (r-tokenizers): New variable.
---
 gnu/packages/cran.scm | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/gnu/packages/cran.scm b/gnu/packages/cran.scm
index 0dcf8d20f3..26c3c1e562 100644
--- a/gnu/packages/cran.scm
+++ b/gnu/packages/cran.scm
@@ -22670,3 +22670,37 @@ analysis.  These novels are \"Sense and Sensibility\", \"Pride and
 Prejudice\", \"Mansfield Park\", \"Emma\", \"Northanger Abbey\", and
 \"Persuasion\".")
     (license license:expat)))
+
+(define-public r-tokenizers
+  (package
+    (name "r-tokenizers")
+    (version "0.2.1")
+    (source
+      (origin
+        (method url-fetch)
+        (uri (cran-uri "tokenizers" version))
+        (sha256
+          (base32
+            "006xf1vdrmp9skhpss9ldhmk4cwqk512cjp1pxm2gxfybpf7qq98"))))
+    (properties `((upstream-name . "tokenizers")))
+    (build-system r-build-system)
+    (propagated-inputs
+      `(("r-rcpp" ,r-rcpp)
+        ("r-snowballc" ,r-snowballc)
+        ("r-stringi" ,r-stringi)))
+    (native-inputs `(("r-knitr" ,r-knitr)))
+    (home-page
+      "https://lincolnmullen.com/software/tokenizers/")
+    (synopsis
+      "Fast, consistent tokenization of natural language text")
+    (description
+      "This package converts natural language text into tokens.  It
+includes tokenizers for shingled n-grams, skip n-grams, words, word
+stems, sentences, paragraphs, characters, shingled characters, lines,
+tweets, Penn Treebank, and regular expressions, as well as functions for
+counting characters, words, and sentences, and a function for splitting
+longer texts into separate documents, each with the same number of words.
+The tokenizers have a consistent interface, and the package is built
+on the @code{stringi} and @code{Rcpp} packages for fast yet correct
+tokenization in UTF-8.")
+    (license license:expat)))
-- 
2.17.1




