[bug#42117] [PATCH 12/17] gnu: Add r-tokenizers.
From: Peter Lo
Subject: [bug#42117] [PATCH 12/17] gnu: Add r-tokenizers.
Date: Mon, 29 Jun 2020 13:50:37 +0800
* gnu/packages/cran.scm (r-tokenizers): New variable.
---
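A possible way to try the package once this patch is applied is a manifest like the sketch below. The file name `manifest.scm` and the choice to include `r` alongside the package are assumptions for illustration and are not part of this patch.

```scheme
;; manifest.scm (hypothetical file name, not part of this patch)
;; Request R itself plus the package added here, so that
;; library(tokenizers) can be loaded in the resulting environment.
(specifications->manifest
 '("r"
   "r-tokenizers"))
```

Assuming a Guix checkout with the patch applied, something like `./pre-inst-env guix environment --manifest=manifest.scm` should then provide an R session in which the package is available.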
gnu/packages/cran.scm | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/gnu/packages/cran.scm b/gnu/packages/cran.scm
index 0dcf8d20f3..26c3c1e562 100644
--- a/gnu/packages/cran.scm
+++ b/gnu/packages/cran.scm
@@ -22670,3 +22670,37 @@ analysis. These novels are \"Sense and Sensibility\",
\"Pride and
Prejudice\", \"Mansfield Park\", \"Emma\", \"Northanger Abbey\", and
\"Persuasion\".")
(license license:expat)))
+
+(define-public r-tokenizers
+ (package
+ (name "r-tokenizers")
+ (version "0.2.1")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (cran-uri "tokenizers" version))
+ (sha256
+ (base32
+ "006xf1vdrmp9skhpss9ldhmk4cwqk512cjp1pxm2gxfybpf7qq98"))))
+ (properties `((upstream-name . "tokenizers")))
+ (build-system r-build-system)
+ (propagated-inputs
+ `(("r-rcpp" ,r-rcpp)
+ ("r-snowballc" ,r-snowballc)
+ ("r-stringi" ,r-stringi)))
+ (native-inputs `(("r-knitr" ,r-knitr)))
+ (home-page
+ "https://lincolnmullen.com/software/tokenizers/")
+ (synopsis
+ "Fast, consistent tokenization of natural language text")
+ (description
+ "This package converts natural language text into tokens. It
+includes tokenizers for shingled n-grams, skip n-grams, words, word
+stems, sentences, paragraphs, characters, shingled characters, lines,
+tweets, Penn Treebank, and regular expressions, as well as functions
+for counting characters, words, and sentences, and a function for
+splitting longer texts into separate documents, each with the same
+number of words. The tokenizers have a consistent interface, and the
+package is built on the @code{stringi} and @code{Rcpp} packages for
+fast yet correct tokenization in UTF-8.")
+ (license license:expat)))
--
2.17.1