From: Eric Abrahamsen
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Building intermediate Chinese language romanization alists
Date: Tue, 15 Jan 2019 13:19:23 -0800
Message-ID: <87h8e94o10.fsf@ericabrahamsen.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Hi,

I would often like access to the correspondences between romanized Chinese and Chinese characters. E.g., in the pinyin romanization method, the string "zhong" can map to any of the characters "中种重众终钟忠衷肿仲锺踵盅冢忪舯螽". This is useful for creating language utilities, and other people have put together their own correspondences for their own purposes[1].

Emacs ships with several of these mappings (though I understand they are not included in the distribution), which are used to build the relevant input methods. In the case of pinyin, the text file ./leim/MISC-DIC/pinyin.map is converted with `miscdic-convert' into the file ./lisp/leim/quail/PY.el. PY.el is automatically generated (by the function `py-converter' in titdic-cnv.el): the mapping in pinyin.map is inserted directly into the generated file, and each entry is then wrapped in quotes and parens to construct a call to `quail-define-rules'.
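
To make that wrapping step concrete, here is a rough sketch in Emacs Lisp. It is not the real `py-converter' code, which edits the text in a buffer. The names `my-pinyin-map-to-alist', `my-pinyin-sample-map' and `my-pinyin-sample-form' are made up for illustration, the sample text is a made-up excerpt rather than the contents of pinyin.map, and the sketch assumes each line is a syllable, some whitespace, then the candidate characters, with no header lines.

```elisp
;; Sketch only: `my-pinyin-map-to-alist' is a hypothetical helper and is
;; not part of titdic-cnv.el.  It assumes every non-empty line of the map
;; has the layout "SYLLABLE<whitespace>CHARACTERS".
(defun my-pinyin-map-to-alist (text)
  "Parse TEXT in the assumed pinyin.map layout into a list of rules.
Each element is a list (SYLLABLE CHARACTERS), both strings."
  (let (rules)
    (dolist (line (split-string text "\n" t))
      ;; Split on runs of spaces or tabs, dropping empty fields.
      (let ((fields (split-string line "[ \t]+" t)))
        ;; Skip anything that is not exactly "syllable characters".
        (when (= (length fields) 2)
          (push (list (car fields) (cadr fields)) rules))))
    (nreverse rules)))

(defvar my-pinyin-sample-map "a\t阿啊\nzhong\t中种重众\n"
  "Made-up two-line excerpt standing in for pinyin.map.")

;; The generated file then amounts to these rules spliced into one form.
;; The form is only built here as data, it is not evaluated:
;;   (quail-define-rules ("a" "阿啊") ("zhong" "中种重众"))
(defvar my-pinyin-sample-form
  (cons 'quail-define-rules
        (my-pinyin-map-to-alist my-pinyin-sample-map)))
```

The list returned by the helper has the same shape as the arguments to `quail-define-rules', which is why a single intermediate value could in principle serve both the input method and other code.
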

I might be able to get the map back out of quail somehow, but since this seems to be something that more than a few people would like access to, I wonder if it would be acceptable to add an intermediary step, creating (for instance) a defconst called `pinyin-map-alist' that holds the contents of pinyin.map, and then changing the `quail-define-rules' call to something along the lines of:

  (eval `(quail-define-rules ,@pinyin-map-alist))

(`quail-define-rules' is a macro, so it can't be passed to `apply' directly.)

The input method wouldn't be affected, but we'd have access to the mapping via the constant, which would be very useful.

Pinyin would be the most useful romanization method to do this for, but it looks like the CTLau and possibly ziranma methods might benefit from similar treatment.

(Another issue is that if the constant is written into PY.el, which isn't a library, it might be a bit difficult to get out again, but perhaps the defconst could be appended to one of ./lisp/language/{chinese.el,china-util.el}. Or PY.el could be made a library.)

I'm not entirely familiar with the language-related build process, but I hope there might be an appropriate stage at which to hang the alist on a variable name.

Thanks,
Eric

[1]: https://github.com/tumashu/pyim/blob/master/pyim-pymap.el