unofficial mirror of guix-patches@gnu.org 
* [bug#56386] [PATCH] gnu: Add mecab.
@ 2022-07-04 19:09 Julien Lepiller
  2022-07-04 19:42 ` [bug#56386] [PATCH 1/3] " Julien Lepiller
  2023-03-30 22:43 ` Bruno Victal
  0 siblings, 2 replies; 7+ messages in thread
From: Julien Lepiller @ 2022-07-04 19:09 UTC (permalink / raw)
  To: 56386

Hi Guix!

This small series adds mecab and two dictionaries. MeCab is a
morphological analysis engine. I'm not sure what that previous sentence
means (:p) but I use it as a segmenter for Japanese in one of my
projects. In fact, the two patches that follow add two dictionary
sources. You need one of them in the same profile as mecab for it to be
useful (with no dictionaries, it segfaults).





* [bug#56386] [PATCH 1/3] gnu: Add mecab.
  2022-07-04 19:09 [bug#56386] [PATCH] gnu: Add mecab Julien Lepiller
@ 2022-07-04 19:42 ` Julien Lepiller
  2022-07-04 19:42   ` [bug#56386] [PATCH 2/3] gnu: Add mecab-ipadic Julien Lepiller
  2022-07-04 19:42   ` [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic Julien Lepiller
  2023-03-30 22:43 ` Bruno Victal
  1 sibling, 2 replies; 7+ messages in thread
From: Julien Lepiller @ 2022-07-04 19:42 UTC (permalink / raw)
  To: 56386

* gnu/packages/language.scm (mecab): New variable.
* gnu/packages/patches/mecab-variable-param.patch: New file.
* gnu/local.mk (dist_patch_DATA): Add it.
---
 gnu/local.mk                                  |  1 +
 gnu/packages/language.scm                     | 51 ++++++++++++++++++-
 .../patches/mecab-variable-param.patch        | 30 +++++++++++
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/patches/mecab-variable-param.patch

diff --git a/gnu/local.mk b/gnu/local.mk
index faad6cc6b2..87fe75082c 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -1490,6 +1490,7 @@ dist_patch_DATA =						\
   %D%/packages/patches/libmemcached-build-with-gcc7.patch	\
   %D%/packages/patches/libmhash-hmac-fix-uaf.patch		\
   %D%/packages/patches/libsigrokdecode-python3.9-fix.patch	\
+  %D%/packages/patches/mecab-variable-param.patch		\
   %D%/packages/patches/mercurial-hg-extension-path.patch       \
   %D%/packages/patches/mesa-opencl-all-targets.patch		\
   %D%/packages/patches/mesa-skip-tests.patch			\
diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 61c9e682ed..3ffe115b51 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -4,7 +4,7 @@
 ;;; Copyright © 2018 Nikita <nikita@n0.is>
 ;;; Copyright © 2019 Alex Vong <alexvong1995@gmail.com>
 ;;; Copyright © 2020 Ricardo Wurmus <rekado@elephly.net>
-;;; Copyright © 2020 Julien Lepiller <julien@lepiller.eu>
+;;; Copyright © 2020, 2022 Julien Lepiller <julien@lepiller.eu>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -921,3 +921,52 @@ (define-public praat
 analysis (pitch, formant, intensity, ...), speech synthesis, labelling, segmenting
 and manipulation.")
     (license license:gpl2+)))
+
+(define-public mecab
+  (package
+    (name "mecab")
+    (version "0.996")
+    (source (origin
+              (method git-fetch)
+              (uri (git-reference
+                     (url "https://github.com/taku910/mecab")
+                     ;; latest commit
+                     (commit "046fa78b2ed56fbd4fac312040f6d62fc1bc31e3")))
+              (file-name (git-file-name name version))
+              (sha256
+               (base32
+                "1hdv7rgn8j0ym9gsbigydwrbxa8cx2fb0qngg1ya15vvbw0lk4aa"))
+              (patches
+                (search-patches
+                  "mecab-variable-param.patch"))))
+    (build-system gnu-build-system)
+    (native-search-paths
+      (list (search-path-specification
+              (variable "MECAB_DICDIR")
+              (separator #f)
+              (files '("lib/mecab/dic")))))
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'chdir
+           (lambda _
+             (chdir "mecab")))
+         (add-before 'build 'add-mecab-dicdir-variable
+           (lambda _
+             (substitute* "mecabrc.in"
+               (("dicdir = .*")
+                "dicdir = $MECAB_DICDIR"))
+             (substitute* "mecab-config.in"
+               (("echo @libdir@/mecab/dic")
+                "if [ -z \"$MECAB_DICDIR\" ]; then
+  echo @libdir@/mecab/dic
+else
+  echo \"$MECAB_DICDIR\"
+fi")))))))
+    (inputs (list libiconv))
+    (home-page "https://taku910.github.io/mecab")
+    (synopsis "Morphological analysis engine for texts")
+    (description "MeCab is a morphological analysis engine developed as a
+collaboration between the Kyoto university and Nippon Telegraph and Telephone
+Corporation.  The engine is independent of any language, dictionary or corpus.")
+    (license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
diff --git a/gnu/packages/patches/mecab-variable-param.patch b/gnu/packages/patches/mecab-variable-param.patch
new file mode 100644
index 0000000000..4457cf3f44
--- /dev/null
+++ b/gnu/packages/patches/mecab-variable-param.patch
@@ -0,0 +1,30 @@
+From 2396e90056706ef897acab3aaa081289c7336483 Mon Sep 17 00:00:00 2001
+From: LEPILLER Julien <julien.lepiller@irisa.fr>
+Date: Fri, 19 Apr 2019 11:48:39 +0200
+Subject: [PATCH] Allow variable parameters
+
+---
+ mecab/src/param.cpp | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/mecab/src/param.cpp b/mecab/src/param.cpp
+index 65328a2..006b1b5 100644
+--- a/mecab/src/param.cpp
++++ b/mecab/src/param.cpp
+@@ -79,8 +79,12 @@ bool Param::load(const char *filename) {
+     size_t s1, s2;
+     for (s1 = pos+1; s1 < line.size() && isspace(line[s1]); s1++);
+     for (s2 = pos-1; static_cast<long>(s2) >= 0 && isspace(line[s2]); s2--);
+-    const std::string value = line.substr(s1, line.size() - s1);
++    std::string value = line.substr(s1, line.size() - s1);
+     const std::string key   = line.substr(0, s2 + 1);
++
++    if(value.find('$') == 0) {
++        value = std::getenv(value.substr(1).c_str());
++    }
+     set<std::string>(key.c_str(), value, false);
+   }
+ 
+-- 
+2.20.1
+
-- 
2.36.1
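
The hunk in mecab-variable-param.patch assigns the result of std::getenv
directly to a std::string. std::getenv returns a null pointer when the
variable is unset, and building a std::string from a null pointer is
undefined behaviour. This could be one route to the crash mentioned in the
cover letter, although the thread does not confirm that. Below is a minimal
sketch of a null-safe variant. The helper name expand_value is hypothetical
and is not part of MeCab, and falling back to an empty string is an
assumption about the desired behaviour.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of MeCab): if `value` has the form "$NAME",
// return the content of the environment variable NAME; otherwise return
// `value` unchanged.  An unset variable yields an empty string instead of
// passing a null pointer to std::string.
std::string expand_value(const std::string &value) {
  if (value.size() < 2 || value[0] != '$') {
    return value;  // not a variable reference, keep as-is
  }
  const char *env = std::getenv(value.substr(1).c_str());
  return env ? std::string(env) : std::string();
}
```

In Param::load, the assignment in the patch would then read
value = expand_value(value); the caller still has to decide how to report an
empty dicdir to the user.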






* [bug#56386] [PATCH 2/3] gnu: Add mecab-ipadic.
  2022-07-04 19:42 ` [bug#56386] [PATCH 1/3] " Julien Lepiller
@ 2022-07-04 19:42   ` Julien Lepiller
  2022-07-04 19:42   ` [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic Julien Lepiller
  1 sibling, 0 replies; 7+ messages in thread
From: Julien Lepiller @ 2022-07-04 19:42 UTC (permalink / raw)
  To: 56386

* gnu/packages/language.scm (mecab-ipadic): New variable.
---
 gnu/packages/language.scm | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 3ffe115b51..63654c544b 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -970,3 +970,30 @@ (define-public mecab
 collaboration between the Kyoto university and Nippon Telegraph and Telephone
 Corporation.  The engine is independent of any language, dictionary or corpus.")
     (license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
+
+(define-public mecab-ipadic
+  (package
+    (name "mecab-ipadic")
+    (version "2.7.0")
+    (source (package-source mecab))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:configure-flags
+       (list (string-append "--with-dicdir=" (assoc-ref %outputs "out")
+                            "/lib/mecab/dic")
+             "--with-charset=utf8")
+       #:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'chdir
+           (lambda _
+             (chdir "mecab-ipadic")))
+         (add-before 'configure 'set-mecab-dir
+           (lambda* (#:key outputs #:allow-other-keys)
+             (setenv "MECAB_DICDIR" (string-append (assoc-ref outputs "out")
+                                                   "/lib/mecab/dic")))))))
+    (native-inputs (list mecab)); for mecab-config
+    (home-page "https://taku910.github.io/mecab")
+    (synopsis "Dictionary data for MeCab")
+    (description "This package contains dictionary data derived from
+ipadic for use with MeCab.")
+    (license (license:non-copyleft "mecab-ipadic/COPYING"))))
-- 
2.36.1






* [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic.
  2022-07-04 19:42 ` [bug#56386] [PATCH 1/3] " Julien Lepiller
  2022-07-04 19:42   ` [bug#56386] [PATCH 2/3] gnu: Add mecab-ipadic Julien Lepiller
@ 2022-07-04 19:42   ` Julien Lepiller
  2022-07-17 19:33     ` [bug#56386] [PATCH] gnu: Add mecab Ludovic Courtès
  1 sibling, 1 reply; 7+ messages in thread
From: Julien Lepiller @ 2022-07-04 19:42 UTC (permalink / raw)
  To: 56386

* gnu/packages/language.scm (mecab-unidic): New variable.
---
 gnu/packages/language.scm | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 63654c544b..f97b982cb9 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -27,6 +27,7 @@ (define-module (gnu packages language)
   #:use-module (gnu packages autotools)
   #:use-module (gnu packages audio)
   #:use-module (gnu packages base)
+  #:use-module (gnu packages compression)
   #:use-module (gnu packages docbook)
   #:use-module (gnu packages emacs)
   #:use-module (gnu packages freedesktop)
@@ -57,6 +58,7 @@ (define-module (gnu packages language)
   #:use-module (gnu packages xorg)
   #:use-module (guix packages)
   #:use-module (guix build-system cmake)
+  #:use-module (guix build-system copy)
   #:use-module (guix build-system glib-or-gtk)
   #:use-module (guix build-system gnu)
   #:use-module (guix build-system perl)
@@ -997,3 +999,27 @@ (define-public mecab-ipadic
     (description "This package contains dictionary data derived from
 ipadic for use with MeCab.")
     (license (license:non-copyleft "mecab-ipadic/COPYING"))))
+
+(define-public mecab-unidic
+  (package
+    (name "mecab-unidic")
+    (version "3.1.0")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append "https://clrd.ninjal.ac.jp/unidic_archive/cwj/"
+                                  version "/unidic-cwj-" version ".zip"))
+              (sha256
+               (base32
+                "1z132p2q3bgchiw529j2d7dari21kn0fhkgrj3vcl0ncg2m521il"))))
+    (build-system copy-build-system)
+    (arguments
+     `(#:install-plan
+       '(("." "lib/mecab/dic"
+          #:include-regexp ("\\.bin$" "\\.def$" "\\.dic$" "dicrc")))))
+    (native-inputs (list unzip))
+    (home-page "https://clrd.ninjal.ac.jp/unidic/en/")
+    (synopsis "Dictionary data for MeCab")
+    (description "UniDic for morphological analysis is a dictionary for
+analysis with the morphological analyser MeCab, where the short units exported
+from the database are used as entries (heading terms).")
+    (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))
-- 
2.36.1






* [bug#56386] [PATCH] gnu: Add mecab.
  2022-07-04 19:42   ` [bug#56386] [PATCH 3/3] gnu: Add mecab-unidic Julien Lepiller
@ 2022-07-17 19:33     ` Ludovic Courtès
  0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2022-07-17 19:33 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: 56386

Hi,

Julien Lepiller <julien@lepiller.eu> writes:

> +    (synopsis "Dictionary data for MeCab")
> +    (description "UniDic for morphological analysis is a dictionary for
> +analysis with the morphological analyser MeCab, where the short units exported
> +from the database are used as entries (heading terms).")
> +    (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))

Maybe add a comment stating whether this is triple-licensed (at the
user’s choice) or if that means that there are files under each of
these.

Otherwise the whole series LGTM!

Ludo’.





* [bug#56386] [PATCH] gnu: Add mecab.
  2022-07-04 19:09 [bug#56386] [PATCH] gnu: Add mecab Julien Lepiller
  2022-07-04 19:42 ` [bug#56386] [PATCH 1/3] " Julien Lepiller
@ 2023-03-30 22:43 ` Bruno Victal
  2023-04-01 14:43   ` bug#56386: " Julien Lepiller
  1 sibling, 1 reply; 7+ messages in thread
From: Bruno Victal @ 2023-03-30 22:43 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: 56386

On 2022-07-04 20:09, Julien Lepiller wrote:
> Hi Guix!
> 
> This small series adds mecab and two dictionaries. MeCab is a
> morphological analysis engine. I'm not sure what that previous sentence
> means (:p) but I use it as a segmenter for Japanese in one of my
> projects. In fact, the two patches that follow add two dictionary
> sources. You need one of them in the same profile as mecab for it to be
> useful (with no dictionaries, it segfaults).
> 
> 
> 

Any updates regarding this?


Cheers,
Bruno





* bug#56386: [PATCH] gnu: Add mecab.
  2023-03-30 22:43 ` Bruno Victal
@ 2023-04-01 14:43   ` Julien Lepiller
  0 siblings, 0 replies; 7+ messages in thread
From: Julien Lepiller @ 2023-04-01 14:43 UTC (permalink / raw)
  To: Bruno Victal; +Cc: 56386-done

On Thu, 30 Mar 2023 23:43:22 +0100,
Bruno Victal <mirai@makinata.eu> wrote:

> On 2022-07-04 20:09, Julien Lepiller wrote:
> > Hi Guix!
> > 
> > This small series adds mecab and two dictionaries. MeCab is a
> > morphological analysis engine. I'm not sure what that previous
> > sentence means (:p) but I use it as a segmenter for Japanese in one
> > of my projects. In fact, the two patches that follow add two
> > dictionary sources. You need one of them in the same profile as
> > mecab for it to be useful (with no dictionaries, it segfaults).
> > 
> > 
> >   
> 
> Any updates regarding this?
> 
> 
> Cheers,
> Bruno

I had forgotten about this. It's a triple license (at the user's
choice), so I added a comment. Pushed to master as
3ab24ba216ce91210b93ec61554b3343fbc3aaab to
4483296da3e2e1424d12d92d0f56fb428765ca43.






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
