From mboxrd@z Thu Jan 1 00:00:00 1970
From: Taiju HIGASHI
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Add an option to not reduce vocabulary of the Japanese
Date: Tue, 07 Jun 2022 09:47:31 +0900
Message-ID: <87v8td710c.fsf@taiju.info>
References: <87r142ypnq.fsf@gnu.org> <878rqa9cm8.fsf@taiju.info>
 <87k09tubfa.fsf@gnus.org> <837d5t98qq.fsf@gnu.org>
 <87h74x96em.fsf@taiju.info> <834k0x93ra.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)
Cc: larsi@gnus.org, handa@gnu.org, emacs-devel@gnu.org
To: Eli Zaretskii
In-Reply-To: <834k0x93ra.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 06 Jun 2022
 19:05:13 +0300")
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:290818

--=-=-=
Content-Type: text/plain

>> Based on the totality of the discussion so far, would the following
>> policy be the best?
>>
>> 1. make the build-time option selectable to install a dictionary with a
>>    reduced vocabulary
>> 2. install dictionaries without reduced vocabulary by default
>> 3. make it possible to regenerate dictionaries from make or Emacs
>>    command.
>
> I think 2+3 is the best.

I attached the v3 patch.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment;
 filename=v3-0001-The-vocabulary-in-ja-dic.el-should-not-be-reduced.patch
Content-Description: v3 patch

>From 615f07c0a6c565b70838834d319ff97407d1e59c Mon Sep 17 00:00:00 2001
From: Taiju HIGASHI
Date: Tue, 7 Jun 2022 09:21:10 +0900
Subject: [PATCH v3] The vocabulary in ja-dic.el should not be reduced by default.

* configure.ac: Add "with-small-ja-dic" configure option.
* leim/Makefile.in (${leimdir}/ja-dic/ja-dic.el): Change the build
method depending on whether or not the with-small-ja-dic option is
specified.
* lisp/international/ja-dic-cnv.el (skkdic-convert-okuri-nasi): Add
"with-reduction" argument and change "skkdic-reduced-candidates" not
to be called if the "with-reduction" argument is unspecified.
(breaking changes)
(skkdic-convert): Add "with-reduction" optional argument.
(batch-skkdic-convert): Add "--with-reduction" command line argument.
---
 configure.ac | 5 +++++
 leim/Makefile.in | 8 +++++++-
 lisp/international/ja-dic-cnv.el | 26 ++++++++++++++++++--------
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/configure.ac b/configure.ac
index 313a1436b5..3e6eab94f8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -491,6 +491,7 @@ OPTION_DEFAULT_ON([threads],[don't compile with elisp threading support])
 OPTION_DEFAULT_OFF([native-compilation],[compile with Emacs Lisp native compiler support])
 OPTION_DEFAULT_OFF([cygwin32-native-compilation],[use native compilation on 32-bit Cygwin])
 OPTION_DEFAULT_ON([xinput2],[don't use version 2 of the X Input Extension for input])
+OPTION_DEFAULT_OFF([small-ja-dic],[generate a small-sized Japanese dictionary])
 
 AC_ARG_WITH([file-notification],[AS_HELP_STRING([--with-file-notification=LIB],
   [use a file notification library (LIB one of: yes, inotify, kqueue, gfile, w32, no)])],
@@ -6492,6 +6493,7 @@ AS_ECHO([" Does Emacs use -lXaw3d? ${HAVE_XAW3D
   Which dumping strategy does Emacs use? ${with_dumping}
   Does Emacs have native lisp compiler? ${HAVE_NATIVE_COMP}
   Does Emacs use version 2 of the the X Input Extension? ${HAVE_XINPUT2}
+  Should Emacs use a small-sized Japanese dictionary? ${with_small_ja_dic}
 "])
 
 if test -n "${EMACSDATA}"; then
@@ -6590,6 +6592,9 @@ SUBDIR_MAKEFILES_IN=`echo " ${SUBDIR_MAKEFILES}" | sed -e 's| | $(srcdir)/|g' -e
 
 AC_SUBST(SUBDIR_MAKEFILES_IN)
 
+SMALL_JA_DIC=$with_small_ja_dic
+AC_SUBST(SMALL_JA_DIC)
+
 dnl You might wonder (I did) why epaths.h is generated by running make,
 dnl rather than just letting configure generate it from epaths.in.
 dnl One reason is that the various paths are not fully expanded (see above);
diff --git a/leim/Makefile.in b/leim/Makefile.in
index 3b4216c0b8..a256ca539b 100644
--- a/leim/Makefile.in
+++ b/leim/Makefile.in
@@ -32,6 +32,12 @@ leimdir = ${srcdir}/../lisp/leim
 
 EXEEXT = @EXEEXT@
 
+SMALL_JA_DIC = @SMALL_JA_DIC@
+JA_DIC_REDUCTION_OPTION =
+ifeq ($(SMALL_JA_DIC), yes)
+  JA_DIC_REDUCTION_OPTION = --with-reduction
+endif
+
 -include ${top_builddir}/src/verbose.mk
 
 # Prevent any settings in the user environment causing problems.
@@ -134,7 +140,7 @@ generate-ja-dic: ${leimdir}/ja-dic/ja-dic.el
 ${leimdir}/ja-dic/ja-dic.el: $(srcdir)/SKK-DIC/SKK-JISYO.L
 	$(AM_V_GEN)$(RUN_EMACS) -batch -l ja-dic-cnv \
 	  --eval "(setq max-specpdl-size 5000)" \
-	  -f batch-skkdic-convert -dir "$(leimdir)/ja-dic" "$<"
+	  -f batch-skkdic-convert -dir "$(leimdir)/ja-dic" $(JA_DIC_REDUCTION_OPTION) "$<"
 
 ${srcdir}/../lisp/language/pinyin.el: ${srcdir}/MISC-DIC/pinyin.map
 	$(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert $< $@
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index 7f7c0261dc..7451773912 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -295,7 +295,7 @@
       (setq skkdic-okuri-nasi-entries-count (length skkdic-okuri-nasi-entries))
       (progress-reporter-done progress))))
 
-(defun skkdic-convert-okuri-nasi (skkbuf buf)
+(defun skkdic-convert-okuri-nasi (skkbuf buf with-reduction)
   (with-current-buffer buf
     (insert ";; Setting okuri-nasi entries.\n"
             "(skkdic-set-okuri-nasi\n")
@@ -311,7 +311,9 @@
           (setq count (1+ count))
           (progress-reporter-update progress count)
           (if (setq candidates
-                    (skkdic-reduced-candidates skkbuf kana candidates))
+                    (if with-reduction
+                        (skkdic-reduced-candidates skkbuf kana candidates)
+                      candidates))
               (progn
                 (insert "\"" kana)
                 (while candidates
@@ -322,10 +324,12 @@
       (progress-reporter-done progress))
     (insert ")\n\n")))
 
-(defun skkdic-convert (filename &optional dirname)
+(defun skkdic-convert (filename &optional dirname with-reduction)
   "Generate Emacs Lisp file from Japanese dictionary file FILENAME.
 The format of the dictionary file should be the same as SKK dictionaries.
-Saves the output as `ja-dic-filename', in directory DIRNAME (if specified)."
+Saves the output as `ja-dic-filename', in directory DIRNAME (if specified).
+When WITH-REDUCTION is t, then reduce dictionary vocabulary.
+"
   (interactive "FSKK dictionary file: ")
   (let* ((skkbuf (get-buffer-create " *skkdic-unannotated*"))
          (buf (get-buffer-create "*skkdic-work*")))
@@ -389,7 +393,7 @@ Saves the output as `ja-dic-filename', in directory DIRNAME (if specified)."
       (skkdic-collect-okuri-nasi)
 
       ;; Convert okuri-nasi general entries.
-      (skkdic-convert-okuri-nasi skkbuf buf)
+      (skkdic-convert-okuri-nasi skkbuf buf with-reduction)
 
       ;; Postfix
       (with-current-buffer buf
@@ -427,15 +431,21 @@ To get complete usage, invoke:
         (message "To convert SKK-JISYO.L into skkdic.el:")
         (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert SKK-JISYO.L")
         (message "To convert SKK-JISYO.L into DIR/ja-dic.el:")
-        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert -dir DIR SKK-JISYO.L"))
-    (let (targetdir filename)
+        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert -dir DIR SKK-JISYO.L")
+        (message "To convert SKK-JISYO.L into skkdic.el with reduced dictionary vocabulary:")
+        (message " %% emacs -batch -l ja-dic-cnv -f batch-skkdic-convert --with-reduction SKK-JISYO.L"))
+    (let (targetdir filename with-reduction)
       (if (string= (car command-line-args-left) "-dir")
           (progn
             (setq command-line-args-left (cdr command-line-args-left))
             (setq targetdir (expand-file-name (car command-line-args-left)))
             (setq command-line-args-left (cdr command-line-args-left))))
+      (if (string= (car command-line-args-left) "--with-reduction")
+          (progn
+            (setq with-reduction t)
+            (setq command-line-args-left (cdr command-line-args-left))))
       (setq filename (expand-file-name (car command-line-args-left)))
-      (skkdic-convert filename targetdir)))
+      (skkdic-convert filename targetdir with-reduction)))
   (kill-emacs 0))
-- 
2.36.1


--=-=-=
Content-Type: text/plain


I thought that if reducing vocabulary is an option, some people might
question what reducing vocabulary means.
So, to make it easier to convey the intent, I changed the configure
option to with-small-ja-dic and also changed the description.
Please point out if the original is better.

Thanks,
-- 
Taiju

--=-=-=--
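
For readers who want to see how the configure option ends up on the
converter's command line, here is a rough shell sketch of the conditional
that the patch adds to leim/Makefile.in. It is only an illustration:
`ja_dic_command` is a hypothetical helper that is not part of the patch,
the two paths are placeholders, and the `--eval` argument from the real
recipe is left out for brevity.

```shell
#!/bin/sh
# Sketch only: mirrors the ifeq block the patch adds to leim/Makefile.in.
# ja_dic_command is a hypothetical helper, not part of the patch.
ja_dic_command() {
    small_ja_dic=$1          # stands in for $with_small_ja_dic: "yes" or "no"
    reduction_option=
    if [ "$small_ja_dic" = yes ]; then
        reduction_option=--with-reduction
    fi
    # The flag is placed before the dictionary file because
    # batch-skkdic-convert checks for it before it reads the file name.
    # $reduction_option is deliberately unquoted so it vanishes when empty.
    echo emacs -batch -l ja-dic-cnv -f batch-skkdic-convert \
        -dir lisp/leim/ja-dic $reduction_option leim/SKK-DIC/SKK-JISYO.L
}

ja_dic_command no    # default configure run: full vocabulary
ja_dic_command yes   # after ./configure --with-small-ja-dic
```

With "no" the printed command carries no extra flag, so the full
vocabulary is kept; with "yes" it gains `--with-reduction` right before
the dictionary file.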