From: Eric Abrahamsen
Newsgroups: gmane.emacs.bugs
Subject: bug#34215: 27.0.50; Provide elisp access to Chinese pinyin-to-character mapping
Date: Tue, 29 Jan 2019 09:48:30 -0800
Message-ID: <87va27cq35.fsf@ericabrahamsen.net>
References: <87imyafyts.fsf@ericabrahamsen.net> <83r2cy3y69.fsf@gnu.org> <87a7jmf06v.fsf@ericabrahamsen.net> <83ef8y3r40.fsf@gnu.org> <875zu9gb96.fsf@ericabrahamsen.net> <83a7jl51c5.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: 34215@debbugs.gnu.org
In-Reply-To: <87imyafyts.fsf@ericabrahamsen.net>

--=-=-=
Content-Type: text/plain

Eli Zaretskii writes:

>> From: Eric Abrahamsen
>> Date: Sun, 27 Jan 2019 11:18:29 -0800

I've attached a diff that adds the conversion function itself, but I'm
not familiar with makefiles and so far haven't been able to work out how
to call it. Since pinyin-convert reads its source and destination paths
from `command-line-args-left', they go after the -f option, so I think
the invocation I want looks like this:

    $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert \
      ${srcdir}/MISC-DIC/pinyin.map ${srcdir}/../lisp/language/pinyin.el

where ${srcdir} is the leim directory. What I don't know is how to get
make to actually run that command.

I could also factor the code that py-converter and pinyin-convert have
in common out into a separate defsubst.

>> >> PS: pinyin.map is ancient and is missing a lot of good correspondences.
>> >> Google's pinyin input method uses a much larger map, licensed with
>> >> Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for Gnu
>> >> projects, maybe we could consider switching to that map?
>> >
>> > Maybe. Unfortunately, I don't know enough about these input methods
>> > to tell whether replacing the file is a good idea. I wonder who can
>> > we ask about this.
>>
>> It's more or less a drop-in replacement -- the format of the data would
>> be the same, only a bit more of it.
>
> I understand, but I wonder if someone could try that for a while and
> see if it makes better input method(s), before we decide to import it.

FWIW, that mapping is used by the pyim package, which I believe is the
most popular pinyin-based Chinese input method out there. I also use it
via the system-wide input framework fcitx, and it works very well.

>> I'm not sure who is "in charge" of these files, though.
>
> No one, I'm afraid. Not these days.

That's too bad.

Eric

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=pinyinconvert.diff

diff --git a/lisp/international/titdic-cnv.el b/lisp/international/titdic-cnv.el
index 2ce2c527b9..54d9fc6211 100644
--- a/lisp/international/titdic-cnv.el
+++ b/lisp/international/titdic-cnv.el
@@ -1203,6 +1203,37 @@ batch-miscdic-convert
 	      (miscdic-convert filename dir))))
   (kill-emacs 0))
 
+(defun pinyin-convert ()
+  "Convert text file pinyin.map into an elisp library.
+The library is named pinyin.el, and contains the constant
+`pinyin-character-map'."
+  (let ((src-file (car command-line-args-left))
+        (dst-file (cadr command-line-args-left)))
+    (with-temp-file dst-file
+      (insert ";; This file is automatically generated from pinyin.map,\
+ by the function pinyin-convert.\n")
+      (insert "(defconst pinyin-character-map\n'(")
+      (let ((pos (point)))
+        (insert-file-contents src-file)
+        (goto-char pos)
+        (re-search-forward "^[a-z]")
+        (beginning-of-line)
+        (delete-region pos (point))
+        (while (not (eobp))
+          (insert "(\"")
+          (skip-chars-forward "a-z")
+          (insert "\" \"")
+          (delete-char 1)
+          (end-of-line)
+          (while (= (preceding-char) ?\r)
+            (delete-char -1))
+          (insert "\")")
+          (forward-line 1)))
+      (insert ")\n\"An alist holding correspondences between pinyin syllables\
+ and Chinese characters.\")\n")
+      (insert "(provide 'pinyin)\n"))
+    (kill-emacs 0)))
+
 ;; Prevent "Local Variables" above confusing Emacs.

--=-=-=--
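For reference, here is a rough sketch of the data shape `pinyin-convert' produces and how a caller might consume it, in the spirit of the defsubst refactoring mentioned above. It assumes a pinyin.map data line is a lowercase syllable, one space or tab, then the characters, possibly followed by a stray carriage return. That matches what the patch skips and deletes, but the exact separator is an assumption. All `my-pinyin-*' names are hypothetical, and the two-entry sample map only stands in for the generated `pinyin-character-map', whose entries have the same ("SYLLABLE" "CHARACTERS") shape.

```elisp
;; Hypothetical helper that py-converter and pinyin-convert could
;; share.  Assumes a data line looks like "SYLLABLE<space or tab>CHARS",
;; optionally ending in one or more carriage returns.
(defsubst my-pinyin-parse-line (line)
  "Split LINE into a list (SYLLABLE CHARACTERS), or return nil."
  (let ((case-fold-search nil))         ; keep [a-z] from matching A-Z
    (when (string-match "\\`\\([a-z]+\\)[ \t]\\([^\r]*\\)\r*\\'" line)
      (list (match-string 1 line) (match-string 2 line)))))

;; Stand-in for the generated `pinyin-character-map': same entry
;; shape, but only two syllables.  "\u963f\u554a" is 阿啊 and
;; "\u7231\u54c0" is 爱哀.
(defconst my-pinyin-sample-map
  (delq nil (mapcar #'my-pinyin-parse-line
                    '("a\t\u963f\u554a"
                      "ai\t\u7231\u54c0\r"    ; DOS line ending
                      "# not a data line"))))

(defun my-pinyin-characters (syllable &optional map)
  "Return the characters MAP associates with SYLLABLE, one per string.
MAP defaults to `my-pinyin-sample-map'; return nil if SYLLABLE is absent."
  (mapcar #'char-to-string
          (cadr (assoc syllable (or map my-pinyin-sample-map)))))

;; Example: (my-pinyin-characters "ai") returns ("爱" "哀").
```

Once the generated pinyin.el is loaded with (require 'pinyin), passing `pinyin-character-map' as MAP should give the same kind of result against the full table. Keeping each entry as a two-element list rather than a dotted pair mirrors what the patch writes out.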