From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#42117] [PATCH 12/17] gnu: Add r-tokenizers.
To: 42117@debbugs.gnu.org
Cc: Peter Lo
From: Peter Lo
Date: Mon, 29 Jun 2020 13:50:37 +0800
Message-Id: <20200629055042.8565-12-peterloleungyau@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200629055042.8565-1-peterloleungyau@gmail.com>
References: <20200629055042.8565-1-peterloleungyau@gmail.com>

* gnu/packages/cran.scm (r-tokenizers): New variable.
---
 gnu/packages/cran.scm | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/gnu/packages/cran.scm b/gnu/packages/cran.scm
index 0dcf8d20f3..26c3c1e562 100644
--- a/gnu/packages/cran.scm
+++ b/gnu/packages/cran.scm
@@ -22670,3 +22670,37 @@ analysis. These novels are \"Sense and Sensibility\", \"Pride and Prejudice\",
 \"Mansfield Park\", \"Emma\", \"Northanger Abbey\", and
 \"Persuasion\".")
     (license license:expat)))
+
+(define-public r-tokenizers
+  (package
+    (name "r-tokenizers")
+    (version "0.2.1")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (cran-uri "tokenizers" version))
+       (sha256
+        (base32
+         "006xf1vdrmp9skhpss9ldhmk4cwqk512cjp1pxm2gxfybpf7qq98"))))
+    (properties `((upstream-name . "tokenizers")))
+    (build-system r-build-system)
+    (propagated-inputs
+     `(("r-rcpp" ,r-rcpp)
+       ("r-snowballc" ,r-snowballc)
+       ("r-stringi" ,r-stringi)))
+    (native-inputs `(("r-knitr" ,r-knitr)))
+    (home-page
+     "https://lincolnmullen.com/software/tokenizers/")
+    (synopsis
+     "Fast, Consistent Tokenization of Natural Language Text")
+    (description
+     "Convert natural language text into tokens. Includes tokenizers
+for shingled n-grams, skip n-grams, words, word stems, sentences,
+paragraphs, characters, shingled characters, lines, tweets, Penn
+Treebank, regular expressions, as well as functions for counting
+characters, words, and sentences, and a function for splitting longer
+texts into separate documents, each with the same number of words.
+The tokenizers have a consistent interface, and the package is built
+on the @code{stringi} and @code{Rcpp} packages for fast yet correct
+tokenization in 'UTF-8'.")
+    (license license:expat)))
-- 
2.17.1