* [bug#56386] [PATCH] gnu: Add mecab.
From: Julien Lepiller @ 2022-07-04 19:09 UTC
To: 56386

Hi Guix!

This small series adds mecab and two dictionaries. MeCab is a
morphological analysis engine. I'm not sure what that previous sentence
means (:p) but I use it as a segmenter for Japanese in one of my
projects. In fact, the two patches that follow add two dictionary
sources. You need one of them in the same profile as mecab for it to be
useful (with no dictionaries, it segfaults).

* [bug#56386] [PATCH 1/3] gnu: Add mecab.
From: Julien Lepiller @ 2022-07-04 19:42 UTC
To: 56386

* gnu/packages/language.scm (mecab): New variable.
* gnu/packages/patches/mecab-variable-param.patch: New file.
* gnu/local.mk (dist_patch_DATA): Add it.
---
 gnu/local.mk                                  |  1 +
 gnu/packages/language.scm                     | 51 ++++++++++++++++++-
 .../patches/mecab-variable-param.patch        | 30 +++++++++++
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/patches/mecab-variable-param.patch

diff --git a/gnu/local.mk b/gnu/local.mk
index faad6cc6b2..87fe75082c 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -1490,6 +1490,7 @@ dist_patch_DATA = \
   %D%/packages/patches/libmemcached-build-with-gcc7.patch \
   %D%/packages/patches/libmhash-hmac-fix-uaf.patch \
   %D%/packages/patches/libsigrokdecode-python3.9-fix.patch \
+  %D%/packages/patches/mecab-variable-param.patch \
   %D%/packages/patches/mercurial-hg-extension-path.patch \
   %D%/packages/patches/mesa-opencl-all-targets.patch \
   %D%/packages/patches/mesa-skip-tests.patch \
diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 61c9e682ed..3ffe115b51 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -4,7 +4,7 @@
 ;;; Copyright © 2018 Nikita <nikita@n0.is>
 ;;; Copyright © 2019 Alex Vong <alexvong1995@gmail.com>
 ;;; Copyright © 2020 Ricardo Wurmus <rekado@elephly.net>
-;;; Copyright © 2020 Julien Lepiller <julien@lepiller.eu>
+;;; Copyright © 2020, 2022 Julien Lepiller <julien@lepiller.eu>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -921,3 +921,52 @@ (define-public praat
 analysis (pitch, formant, intensity, ...), speech synthesis, labelling, segmenting
 and manipulation.")
     (license license:gpl2+)))
+
+(define-public mecab
+  (package
+    (name "mecab")
+    (version "0.996")
+    (source (origin
+              (method git-fetch)
+              (uri (git-reference
+                     (url "https://github.com/taku910/mecab")
+                     ;; latest commit
+                     (commit "046fa78b2ed56fbd4fac312040f6d62fc1bc31e3")))
+              (file-name (git-file-name name version))
+              (sha256
+               (base32
+                "1hdv7rgn8j0ym9gsbigydwrbxa8cx2fb0qngg1ya15vvbw0lk4aa"))
+              (patches
+                (search-patches
+                  "mecab-variable-param.patch"))))
+    (build-system gnu-build-system)
+    (native-search-paths
+      (list (search-path-specification
+              (variable "MECAB_DICDIR")
+              (separator #f)
+              (files '("lib/mecab/dic")))))
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'chdir
+           (lambda _
+             (chdir "mecab")))
+         (add-before 'build 'add-mecab-dicdir-variable
+           (lambda _
+             (substitute* "mecabrc.in"
+               (("dicdir = .*")
+                "dicdir = $MECAB_DICDIR"))
+             (substitute* "mecab-config.in"
+               (("echo @libdir@/mecab/dic")
+                "if [ -z \"$MECAB_DICDIR\" ]; then
+  echo @libdir@/mecab/dic
+else
+  echo \"$MECAB_DICDIR\"
+fi")))))))
+    (inputs (list libiconv))
+    (home-page "https://taku910.github.io/mecab")
+    (synopsis "Morphological analysis engine for texts")
+    (description "Mecab is a morphological analysis engine developped as a
+collaboration between the Kyoto university and Nippon Telegraph and Telephone
+Corporation. The engine is independent of any language, dictionary or corpus.")
+    (license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
diff --git a/gnu/packages/patches/mecab-variable-param.patch b/gnu/packages/patches/mecab-variable-param.patch
new file mode 100644
index 0000000000..4457cf3f44
--- /dev/null
+++ b/gnu/packages/patches/mecab-variable-param.patch
@@ -0,0 +1,30 @@
+From 2396e90056706ef897acab3aaa081289c7336483 Mon Sep 17 00:00:00 2001
+From: LEPILLER Julien <julien.lepiller@irisa.fr>
+Date: Fri, 19 Apr 2019 11:48:39 +0200
+Subject: [PATCH] Allow variable parameters
+
+---
+ mecab/src/param.cpp | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/mecab/src/param.cpp b/mecab/src/param.cpp
+index 65328a2..006b1b5 100644
+--- a/mecab/src/param.cpp
++++ b/mecab/src/param.cpp
+@@ -79,8 +79,12 @@ bool Param::load(const char *filename) {
+     size_t s1, s2;
+     for (s1 = pos+1; s1 < line.size() && isspace(line[s1]); s1++);
+     for (s2 = pos-1; static_cast<long>(s2) >= 0 && isspace(line[s2]); s2--);
+-    const std::string value = line.substr(s1, line.size() - s1);
++    std::string value = line.substr(s1, line.size() - s1);
+     const std::string key = line.substr(0, s2 + 1);
++
++    if(value.find('$') == 0) {
++      value = std::getenv(value.substr(1).c_str());
++    }
+     set<std::string>(key.c_str(), value, false);
+   }
+ 
+-- 
+2.20.1
+
-- 
2.36.1

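The `$NAME` expansion that mecab-variable-param.patch adds to `Param::load` assigns the result of `std::getenv` directly to a `std::string`. `std::getenv` returns a null pointer when the variable is unset, and assigning a null `const char *` to a `std::string` is undefined behaviour. Below is a minimal sketch of a null-safe variant. It assumes a hypothetical helper `expand_env_value` and an illustrative `fallback` argument, neither of which is part of MeCab or of this series.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of MeCab or of the patch above).
// If `value` has the form "$NAME", return the content of the environment
// variable NAME, or `fallback` when NAME is unset. Any other value is
// returned unchanged.
std::string expand_env_value(const std::string &value,
                             const std::string &fallback = "") {
  if (value.size() > 1 && value[0] == '$') {
    // std::getenv returns nullptr for an unset variable, so check the
    // pointer before building a std::string from it.
    const char *env = std::getenv(value.c_str() + 1);
    return env != nullptr ? std::string(env) : fallback;
  }
  return value;
}
```

Inside `Param::load` this could be used as `value = expand_env_value(value);`, so that an unset `MECAB_DICDIR` yields an empty `dicdir` instead of a null pointer. Whether an empty value or a compiled-in default is the better fallback is a design choice the thread does not discuss.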
* [bug#56386] [PATCH 2/3] gnu: Add mecab-ipadic.
From: Julien Lepiller @ 2022-07-04 19:42 UTC
To: 56386

* gnu/packages/language.scm (mecab-ipadic): New variable.
---
 gnu/packages/language.scm | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 3ffe115b51..63654c544b 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -970,3 +970,30 @@ (define-public mecab
 collaboration between the Kyoto university and Nippon Telegraph and Telephone
 Corporation. The engine is independent of any language, dictionary or corpus.")
     (license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
+
+(define-public mecab-ipadic
+  (package
+    (name "mecab-ipadic")
+    (version "2.7.0")
+    (source (package-source mecab))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:configure-flags
+       (list (string-append "--with-dicdir=" (assoc-ref %outputs "out")
+                            "/lib/mecab/dic")
+             "--with-charset=utf8")
+       #:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'chdir
+           (lambda _
+             (chdir "mecab-ipadic")))
+         (add-before 'configure 'set-mecab-dir
+           (lambda* (#:key outputs #:allow-other-keys)
+             (setenv "MECAB_DICDIR" (string-append (assoc-ref outputs "out")
+                                                   "/lib/mecab/dic")))))))
+    (native-inputs (list mecab)); for mecab-config
+    (home-page "https://taku910.github.io/mecab")
+    (synopsis "Dictionary data for MeCab")
+    (description "This package contains dictionnary data derived from
+ipadic for use with MeCab.")
+    (license (license:non-copyleft "mecab-ipadic/COPYING"))))
-- 
2.36.1

* [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic.
From: Julien Lepiller @ 2022-07-04 19:42 UTC
To: 56386

* gnu/packages/language.scm (mecab-unidic): New variable.
---
 gnu/packages/language.scm | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 63654c544b..f97b982cb9 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -27,6 +27,7 @@ (define-module (gnu packages language)
   #:use-module (gnu packages autotools)
   #:use-module (gnu packages audio)
   #:use-module (gnu packages base)
+  #:use-module (gnu packages compression)
   #:use-module (gnu packages docbook)
   #:use-module (gnu packages emacs)
   #:use-module (gnu packages freedesktop)
@@ -57,6 +58,7 @@ (define-module (gnu packages language)
   #:use-module (gnu packages xorg)
   #:use-module (guix packages)
   #:use-module (guix build-system cmake)
+  #:use-module (guix build-system copy)
   #:use-module (guix build-system glib-or-gtk)
   #:use-module (guix build-system gnu)
   #:use-module (guix build-system perl)
@@ -997,3 +999,27 @@ (define-public mecab-ipadic
     (description "This package contains dictionnary data derived from
 ipadic for use with MeCab.")
     (license (license:non-copyleft "mecab-ipadic/COPYING"))))
+
+(define-public mecab-unidic
+  (package
+    (name "mecab-unidic")
+    (version "3.1.0")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append "https://clrd.ninjal.ac.jp/unidic_archive/cwj/"
+                                  version "/unidic-cwj-" version ".zip"))
+              (sha256
+               (base32
+                "1z132p2q3bgchiw529j2d7dari21kn0fhkgrj3vcl0ncg2m521il"))))
+    (build-system copy-build-system)
+    (arguments
+     `(#:install-plan
+       '(("." "lib/mecab/dic"
+          #:include-regexp ("\\.bin$" "\\.def$" "\\.dic$" "dicrc")))))
+    (native-inputs (list unzip))
+    (home-page "https://clrd.ninjal.ac.jp/unidic/en/")
+    (synopsis "Dictionary data for MeCab")
+    (description "UniDic for morphological analysis is a dictionary for
+analysis with the morphological analyser MeCab, where the short units exported
+from the database are used as entries (heading terms).")
+    (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))
-- 
2.36.1

* [bug#56386] [PATCH] gnu: Add mecab.
From: Ludovic Courtès @ 2022-07-17 19:33 UTC
To: Julien Lepiller; +Cc: 56386

Hi,

Julien Lepiller <julien@lepiller.eu> wrote:

> +    (synopsis "Dictionary data for MeCab")
> +    (description "UniDic for morphological analysis is a dictionary for
> +analysis with the morphological analyser MeCab, where the short units exported
> +from the database are used as entries (heading terms).")
> +    (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))

Maybe add a comment stating whether this is triple-licensed (at the
user’s choice) or if that means that there are files under each of
these.

Otherwise the whole series LGTM!

Ludo’.

* [bug#56386] [PATCH] gnu: Add mecab.
From: Bruno Victal @ 2023-03-30 22:43 UTC
To: Julien Lepiller; +Cc: 56386

On 2022-07-04 20:09, Julien Lepiller wrote:
> Hi Guix!
>
> This small series adds mecab and two dictionaries. MeCab is a
> morphological analysis engine. I'm not sure what that previous sentence
> means (:p) but I use it as a segmenter for Japanese in one of my
> projects. In fact, the two patches that follow add two dictionary
> sources. You need one of them in the same profile as mecab for it to be
> useful (with no dictionaries, it segfaults).

Any updates regarding this?

Cheers,
Bruno

* bug#56386: [PATCH] gnu: Add mecab.
From: Julien Lepiller @ 2023-04-01 14:43 UTC
To: Bruno Victal; +Cc: 56386-done

On Thu, 30 Mar 2023 23:43:22 +0100,
Bruno Victal <mirai@makinata.eu> wrote:

> On 2022-07-04 20:09, Julien Lepiller wrote:
> > Hi Guix!
> >
> > This small series adds mecab and two dictionaries. MeCab is a
> > morphological analysis engine. I'm not sure what that previous
> > sentence means (:p) but I use it as a segmenter for Japanese in one
> > of my projects. In fact, the two patches that follow add two
> > dictionary sources. You need one of them in the same profile as
> > mecab for it to be useful (with no dictionaries, it segfaults).
>
> Any updates regarding this?
>
> Cheers,
> Bruno

I had forgotten about this. It's a triple license (at the user's
choice), so I added a comment.

Pushed to master as commits 3ab24ba216ce91210b93ec61554b3343fbc3aaab
through 4483296da3e2e1424d12d92d0f56fb428765ca43.

end of thread, ~2023-04-01 14:44 UTC

Thread overview: 7+ messages
2022-07-04 19:09 [bug#56386] [PATCH] gnu: Add mecab Julien Lepiller
2022-07-04 19:42 ` [bug#56386] [PATCH 1/3] " Julien Lepiller
2022-07-04 19:42   ` [bug#56386] [PATCH 2/3] gnu: Add mecab-ipadic Julien Lepiller
2022-07-04 19:42   ` [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic Julien Lepiller
2022-07-17 19:33     ` [bug#56386] [PATCH] gnu: Add mecab Ludovic Courtès
2023-03-30 22:43 ` Bruno Victal
2023-04-01 14:43   ` bug#56386: " Julien Lepiller