From mboxrd@z Thu Jan 1 00:00:00 1970
From: Taiju HIGASHI
Newsgroups: gmane.emacs.devel
Subject: [PATCH] Add an option to not reduce vocabulary of the Japanese
Date: Fri, 03 Jun 2022 12:16:06 +0900
Message-ID: <87r146o2rt.fsf@taiju.info>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: emacs-devel@gnu.org
Cc: higashi@taiju.info
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:290588

--=-=-=
Content-Type: text/plain; charset=iso-2022-jp

Hi,

The Japanese dictionary bundled with Emacs has a small vocabulary. For
example, to convert "なごや" to "名古屋" (Nagoya) in Kanji, I have to
enter "なご" and convert it to "名古", then enter "や" and convert it to
"屋", because the bundled dictionary does not contain "名古屋".

The skkdic-convert function in the ja-dic-cnv package generates the
Japanese dictionary, but its logic includes a step that reduces the
dictionary's vocabulary. I have therefore created a patch that adds an
option to skip this reduction step.

I would be happy to receive your review and feedback.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0001-Add-an-option-to-not-reduce-vocabulary-of-the-Japane.patch

>From 8afafacf87af38ef0fd3193d5133cf06de365c65 Mon Sep 17 00:00:00 2001
From: Taiju HIGASHI
Date: Thu, 2 Jun 2022 23:24:13 +0900
Subject: [PATCH] Add an option to not reduce vocabulary of the Japanese dictionary.

* configure.ac: Add "with-ja-dic-reduction" configure argument.
* leim/Makefile.in: Add "JA_DIC_NO_REDUCTION_OPTION" variable.
* lisp/international/ja-dic-cnv.el (skkdic-convert-okuri-nasi): Add
"no-reduction" argument.
(skkdic-convert): Add "no-reduction" optional argument.
(batch-skkdic-convert): Add "--no-reduction" command line argument.
---
 configure.ac                     |  7 +++++++
 leim/Makefile.in                 |  4 +++-
 lisp/international/ja-dic-cnv.el | 26 ++++++++++++++++++--------
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/configure.ac b/configure.ac
index ed8ec890ac..e28715ad43 100644
--- a/configure.ac
+++ b/configure.ac
@@ -491,6 +491,7 @@ OPTION_DEFAULT_ON([threads],[don't compile with elisp threading support])
 OPTION_DEFAULT_OFF([native-compilation],[compile with Emacs Lisp native compiler support])
 OPTION_DEFAULT_OFF([cygwin32-native-compilation],[use native compilation on 32-bit Cygwin])
 OPTION_DEFAULT_ON([xinput2],[don't use version 2 of the X Input Extension for input])
+OPTION_DEFAULT_ON([ja-dic-reduction],[don't reduce the Japanese dictionary])
 
 AC_ARG_WITH([file-notification],[AS_HELP_STRING([--with-file-notification=LIB],
  [use a file notification library (LIB one of: yes, inotify, kqueue, gfile, w32, no)])],
@@ -6491,6 +6492,7 @@ AS_ECHO([" Does Emacs use -lXaw3d? ${HAVE_XAW3D
   Which dumping strategy does Emacs use? ${with_dumping}
   Does Emacs have native lisp compiler? ${HAVE_NATIVE_COMP}
   Does Emacs use version 2 of the the X Input Extension? ${HAVE_XINPUT2}
+  Does Emacs reduce the Japanese dictionary? ${with_ja_dic_reduction}
 "])
 
 if test -n "${EMACSDATA}"; then
@@ -6589,6 +6591,11 @@ SUBDIR_MAKEFILES_IN=`echo " ${SUBDIR_MAKEFILES}" | sed -e 's| | $(srcdir)/|g' -e
 
 AC_SUBST(SUBDIR_MAKEFILES_IN)
 
+if test "$with_ja_dic_reduction" = "no"; then
+  JA_DIC_NO_REDUCTION_OPTION=--no-reduction
+fi
+AC_SUBST([JA_DIC_NO_REDUCTION_OPTION])
+
 dnl You might wonder (I did) why epaths.h is generated by running make,
 dnl rather than just letting configure generate it from epaths.in.
 dnl One reason is that the various paths are not fully expanded (see above);
diff --git a/leim/Makefile.in b/leim/Makefile.in
index 3b4216c0b8..f1a476a035 100644
--- a/leim/Makefile.in
+++ b/leim/Makefile.in
@@ -32,6 +32,8 @@ leimdir = ${srcdir}/../lisp/leim
 
 EXEEXT = @EXEEXT@
 
+JA_DIC_NO_REDUCTION_OPTION = @JA_DIC_NO_REDUCTION_OPTION@
+
 -include ${top_builddir}/src/verbose.mk
 
 # Prevent any settings in the user environment causing problems.
@@ -134,7 +136,7 @@ generate-ja-dic: ${leimdir}/ja-dic/ja-dic.el
 ${leimdir}/ja-dic/ja-dic.el: $(srcdir)/SKK-DIC/SKK-JISYO.L
 	$(AM_V_GEN)$(RUN_EMACS) -batch -l ja-dic-cnv \
 	    --eval "(setq max-specpdl-size 5000)" \
-	    -f batch-skkdic-convert -dir "$(leimdir)/ja-dic" "$<"
+	    -f batch-skkdic-convert -dir "$(leimdir)/ja-dic" $(JA_DIC_NO_REDUCTION_OPTION) "$<"
 
 ${srcdir}/../lisp/language/pinyin.el: ${srcdir}/MISC-DIC/pinyin.map
 	$(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert $< $@
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index 704f1a1ae6..7d3103fd8d 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -295,7 +295,7 @@
       (setq skkdic-okuri-nasi-entries-count (length skkdic-okuri-nasi-entries))
       (progress-reporter-done progress))))
 
-(defun skkdic-convert-okuri-nasi (skkbuf buf)
+(defun skkdic-convert-okuri-nasi (skkbuf buf no-reduction)
   (with-current-buffer buf
     (insert ";; Setting okuri-nasi entries.\n"
             "(skkdic-set-okuri-nasi\n")
@@ -311,7 +311,9 @@
           (setq count (1+ count))
           (progress-reporter-update progress count)
           (if (setq candidates
-                    (skkdic-reduced-candidates skkbuf kana candidates))
+                    (if no-reduction
+                        candidates
+                      (skkdic-reduced-candidates skkbuf kana candidates)))
               (progn
                 (insert "\"" kana)
                 (while candidates
@@ -322,10 +324,12 @@
       (progress-reporter-done progress))
     (insert ")\n\n")))
 
-(defun skkdic-convert (filename &optional dirname)
+(defun skkdic-convert (filename &optional dirname no-reduction)
   "Generate Emacs Lisp file from Japanese dictionary file FILENAME.
 The format of the dictionary file should be the same as SKK dictionaries.
-Saves the output as `ja-dic-filename', in directory DIRNAME (if specified)."
+Saves the output as `ja-dic-filename', in directory DIRNAME (if specified).
+When NO-REDUCTION is t, then the dictionary is not reduced.
+"
   (interactive "FSKK dictionary file: ")
   (let* ((skkbuf (get-buffer-create " *skkdic-unannotated*"))
          (buf (get-buffer-create "*skkdic-work*")))
@@ -389,7 +393,7 @@ Saves the output as `ja-dic-filename', in directory DIRNAME (if specified)."
         (skkdic-collect-okuri-nasi)
 
         ;; Convert okuri-nasi general entries.
-        (skkdic-convert-okuri-nasi skkbuf buf)
+        (skkdic-convert-okuri-nasi skkbuf buf no-reduction)
 
         ;; Postfix
         (with-current-buffer buf
@@ -427,15 +431,21 @@ To get complete usage, invoke:
         (message "To convert SKK-JISYO.L into skkdic.el:")
         (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert SKK-JISYO.L")
         (message "To convert SKK-JISYO.L into DIR/ja-dic.el:")
-        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert -dir DIR SKK-JISYO.L"))
-    (let (targetdir filename)
+        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert -dir DIR SKK-JISYO.L")
+        (message "To convert SKK-JISYO.L into skkdic.el without reduction:")
+        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert --no-reduction SKK-JISYO.L"))
+    (let (targetdir filename no-reduction)
       (if (string= (car command-line-args-left) "-dir")
           (progn
             (setq command-line-args-left (cdr command-line-args-left))
             (setq targetdir (expand-file-name (car command-line-args-left)))
             (setq command-line-args-left (cdr command-line-args-left))))
+      (if (string= (car command-line-args-left) "--no-reduction")
+          (progn
+            (setq no-reduction t)
+            (setq command-line-args-left (cdr command-line-args-left))))
       (setq filename (expand-file-name (car command-line-args-left)))
-      (skkdic-convert filename targetdir)))
+      (skkdic-convert filename targetdir no-reduction)))
   (kill-emacs 0))
 
 
-- 
2.36.1

--=-=-=
Content-Type: text/plain; charset=iso-2022-jp

By the way, to be honest, I would prefer to remove this reduction
process altogether.

"名古屋" (Nagoya) [0] is the name of one of Japan's major cities and is a
proper noun. I don't think most people, myself included, perceive the
word as a compound of "名古" and "屋". I am Japanese, so my sense may be
different, but I recognize "New York" as one word and "Spider-man" as
one word. In other words, instead of converting "名古" and "屋"
separately, we want to convert "名古屋" as a whole.

It is stressful when the words I have in my head must be split
differently for Kanji conversion. I would like that to happen at least
a little less often.

The skkdic-reduced-candidates function mechanically eliminates words
that can be entered by combining other words, but it does not judge the
importance of words, so even frequently used words like "名古屋" are
eliminated. That is very inconvenient.

My concern is that Emacs' standard Kanji conversion engine will be
regarded as useless. Although it is based on a dictionary with a
sufficient vocabulary (SKK-JISYO.L), the reduction process turns that
into an inconvenient dictionary.

Most of the people who have rated Emacs' standard Kanji conversion
engine as useless are probably unaware of this fact. I also rated it as
useless, because I did not know that fact. When I learned the facts,
however, I realized that this was a misunderstanding and that I had been
disrespectful toward Emacs. The bad reputation is simply due to a
misunderstanding.

Reducing the dictionary shrinks the file size by less than half. While
significant, how important is this in today's computing environment?

In my personal opinion, reducing the vocabulary of the dictionary has
more disadvantages than advantages.

My English is not very good, so I apologize if I did not convey my
intentions.
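
For illustration, the kind of reduction discussed above can be sketched
as follows. This is only a simplified toy and not the actual logic of
skkdic-reduced-candidates: the demo- names and the three-entry
dictionary are hypothetical, and only splits into two parts are
considered.

```elisp
;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun demo-composable-p (kana word dict)
  "Return non-nil if WORD (read as KANA) concatenates two DICT entries.
DICT is an alist mapping a kana string to a list of candidate words.
Hypothetical helper for illustration only."
  (cl-loop for i from 1 below (length kana)
           thereis
           ;; Try every candidate for the first I characters of KANA ...
           (cl-loop for head in (cdr (assoc (substring kana 0 i) dict))
                    thereis
                    ;; ... and check that the rest of WORD is a candidate
                    ;; for the remaining characters of KANA.
                    (and (string-prefix-p head word)
                         (member (substring word (length head))
                                 (cdr (assoc (substring kana i) dict)))))))

(defun demo-reduce (dict)
  "Return DICT without the words that `demo-composable-p' accepts.
Entries left with no candidates are dropped."
  (cl-loop for (kana . words) in dict
           for kept = (cl-remove-if
                       (lambda (w) (demo-composable-p kana w dict))
                       words)
           when kept collect (cons kana kept)))

(defvar demo-dict
  '(("なご" "名古") ("や" "屋" "矢") ("なごや" "名古屋"))
  "Toy dictionary; the real data comes from SKK-JISYO.L.")

(demo-reduce demo-dict)
;; => (("なご" "名古") ("や" "屋" "矢"))
```

In this toy, "名古屋" disappears because it can be assembled from "名古"
and "屋", regardless of how commonly the word is used.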

[0]: https://en.wikipedia.org/wiki/Nagoya

Best Regards,
-- 
Taiju

--=-=-=--