* Introduction

pyim is a Chinese input method that supports quanpin, shuangpin, wubi and cangjie.
More details: https://github.com/tumashu/pyim

This package may be useful to Chinese Emacsers :-)

* GPL paper status:
https://github.com/tumashu/pyim/issues/376

* Need help

The Chinese character pinyin data in pyim-pymap.el
(https://github.com/tumashu/pyim/blob/master/pyim-pymap.el)
comes from rime's luna-pinyin-simple (https://github.com/rime/rime-luna-pinyin),
while luna-pinyin-simple's pinyin data comes from the Android project:

http://android.git.kernel.org/?p=platform/packages/inputmethods/PinyinIME.git

I do not know how to deal with this issue :-\
At 2021-02-22 10:42:30, "tumashu" <tumashu@163.com> wrote:
>* Introduce
>
>pyim is a Chinese input method support quanpin, shuangpin, wubi and cangjie
>More details: https://github.com/tumashu/pyim
>
>this package may be useful to Chinese emacser :-)
>
>* GPL paper status:
>https://github.com/tumashu/pyim/issues/376
>
>* Need help
>
>pyim-pymap.el's (https://github.com/tumashu/pyim/blob/master/pyim-pymap.el)
>Chinese char pinyin info come from rime's luna-pinyin-simple. (https://github.com/rime/rime-luna-pinyin)
>while lun-pinyin-simple's Chinese char pinyin come from android project
>
>http://android.git.kernel.org/?p=platform/packages/inputmethods/PinyinIME.git
>
>I do not know how to deal with this issue :-\

The situation is similar:
http://git.savannah.gnu.org/cgit/emacs.git/tree/leim/MISC-DIC/pinyin.map?id=0d8e15757ed610bbe1833b7540006bbf7363c776
> pyim is a Chinese input method support quanpin, shuangpin, wubi and cangjie
> More details: https://github.com/tumashu/pyim

Yes, it would be a nice addition to GNU ELPA.

> * GPL paper status:
> https://github.com/tumashu/pyim/issues/376

There's a fair bit of that discussion written in a language I sadly
can't read.  What's still missing?

> * Need help
>
> pyim-pymap.el's (https://github.com/tumashu/pyim/blob/master/pyim-pymap.el)
>
> Chinese char pinyin info come from rime's
> luna-pinyin-simple. (https://github.com/rime/rime-luna-pinyin)
>
> while lun-pinyin-simple's Chinese char pinyin come from android project
>
> http://android.git.kernel.org/?p=platform/packages/inputmethods/PinyinIME.git
>
> I do not know how to deal with this issue :-\

This is quite outside of my area of expertise, so I don't really
understand what this is about, but if the issue is to find a pinyin
table, then Emacs already comes with one, so we could both arrange for
Emacs to make it available in a more convenient form (currently it's only
available in the form of an input method, which may be somewhat
inconvenient to work with), and we could create a GNU ELPA package
containing it (or just add it to pyim) for use with older Emacsen.


        Stefan
>There's a fair bit of that discussion written in a language I sadly
>can't read.  What's still missing?

Nothing is missing at the moment, I think :-)

>> * Need help
>>
>> pyim-pymap.el's (https://github.com/tumashu/pyim/blob/master/pyim-pymap.el)
>>
>> Chinese char pinyin info come from rime's
>> luna-pinyin-simple. (https://github.com/rime/rime-luna-pinyin)
>>
>> while lun-pinyin-simple's Chinese char pinyin come from android project
>>
>> http://android.git.kernel.org/?p=platform/packages/inputmethods/PinyinIME.git
>>
>> I do not know how to deal with this issue :-\
>
>This is quite outside of my area of expertise, so I don't really
>understand what this is about, but if the issue is to find a pinyin
>table,

Yes.

>then Emacs already comes with one, so we could both

Yes, the problem is that the pinyin.map in Emacs does not include many Chinese characters.
It is a *small* table :-)  pyim used it before.

>arrange for Emacs to make it available in a more convenient form
>(currently it's only available in the form of an input method, which may
>be somewhat inconvenient to work with), and we could create a GNU ELPA
>package containing it (or just add it to pyim) for use with older Emacsen.
>
>
>        Stefan
>>There's a fair bit of that discussion written in a language I sadly
>>can't read.  What's still missing?
> Not missing at the moment :-) I think.

Even better.

> yes, the problem is that the pinyin.map in Emacs do not include many Chinese Char.
> it is a *small* table :-)

Then we should improve it within Emacs, not just within pyim.

Presumably there's an appropriately licensed pinyin map somewhere that's
more complete, like maybe one used by some part of Gnome, or something?


        Stefan
At 2021-02-22 12:42:59, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:
>>>There's a fair bit of that discussion written in a language I sadly
>>>can't read.  What's still missing?
>> Not missing at the moment :-) I think.
>
>Even better.
>
>> yes, the problem is that the pinyin.map in Emacs do not include many Chinese Char.
>> it is a *small* table :-)
>
>Then we should improve it within Emacs, not just within pyim.
>
>Presumably there's an appropriately licensed pinyin map somewhere that's
>more complete, like maybe one used by some part of Gnome, or something?
>
>
>        Stefan
>

Should we find a GPL-licensed pinyin map?

ibus-libpinyin's table comes from the Android IM too; it is under the Apache License, Version 2.0:
https://github.com/libpinyin/libpinyin/blob/master/data/opengram.license
tumashu <tumashu@163.com> writes:

> At 2021-02-22 12:42:59, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:
>>>>There's a fair bit of that discussion written in a language I sadly
>>>>can't read.  What's still missing?
>>> Not missing at the moment :-) I think.
>>
>>Even better.
>>
>>> yes, the problem is that the pinyin.map in Emacs do not include many Chinese Char.
>>> it is a *small* table :-)
>>
>>Then we should improve it within Emacs, not just within pyim.
>>
>>Presumably there's an appropriately licensed pinyin map somewhere that's
>>more complete, like maybe one used by some part of Gnome, or something?
>>
>>
>>        Stefan
>>
>
> should We find a GPL license pinyin map?
>
> ibus-libpinyin's tabe come from android im too, it is Apache License Version 2.0
> https://github.com/libpinyin/libpinyin/blob/master/data/opengram.license

I went a little ways towards this in #34862, but it fizzled out:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34862

I'd be happy to help pitch in here.
At 2021-02-22 14:31:54, "Eric Abrahamsen" <eric@ericabrahamsen.net> wrote:
>tumashu <tumashu@163.com> writes:
>
>> At 2021-02-22 12:42:59, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:
>>>>>There's a fair bit of that discussion written in a language I sadly
>>>>>can't read. What's still missing?
>>>> Not missing at the moment :-) I think.
>>>
>>>Even better.
>>>
>>>> yes, the problem is that the pinyin.map in Emacs do not include many Chinese Char.
>>>> it is a *small* table :-)
>>>
>>>Then we should improve it within Emacs, not just within pyim.
>>>
>>>Presumably there's an appropriately licensed pinyin map somewhere that's
>>>more complete, like maybe one used by some part of Gnome, or something?
>>>
>>>
>>> Stefan
>>>
>>
>> should We find a GPL license pinyin map?
>>
>> ibus-libpinyin's tabe come from android im too, it is Apache License Version 2.0
>> https://github.com/libpinyin/libpinyin/blob/master/data/opengram.license
>
>I went a little ways towards this in #34862, but it fizzled out:
>
>https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34862
>
>I'd be happy to help pitch in here.
The real question for me is whether it is acceptable for GNU ELPA to import libpinyin's table and convert it for pyim.
If it is acceptable, nothing else needs to be done. If it is not, I have to replace the current pymap.el with something else.
Chinese characters and pinyin are the very basis of the Chinese language; do they really have **copyright issues**? I doubt it.
On 2/22/21 1:49 AM, tumashu wrote:
> the real quesiton for me, is that is it acceptable for GNU elpa by import and convert libpinyin's table to pyim.
Libpinyin seems to be GPLv3; if that is correct, then there is no issue at all and your use is completely acceptable.
At 2021-02-22 16:19:34, "Clément Pit-Claudel" <cpitclaudel@gmail.com> wrote:
>On 2/22/21 1:49 AM, tumashu wrote:
>> the real quesiton for me, is that is it acceptable for GNU elpa by import and convert libpinyin's table to pyim.
>
>Libpinyin seems to be GPLv3; if that is correct, then there is no issue at all and your use is completely acceptable.

Sorry, I remembered wrong:

pyim-pymap.el is converted from pinyin_simp.dict.yaml of
[[https://github.com/rime/rime-pinyin-simp][rime-pinyin-simp]]

rime-pinyin-simp uses the LGPL-3.0 license, while pinyin_simp.dict.yaml has a header like this:

# Rime dictionary
# encoding: utf-8
#
# A minimal Pinyin dictionary for simplified Chinese script
#
# Derived from android open source project:
# http://android.git.kernel.org/?p=platform/packages/inputmethods/PinyinIME.git

-----------------------------------------------------

Below is the conversion script.

(require 'seq)

(defvar pyim-pymap-new nil
  "Pymap alist built by `pyim-pymap-importer'.")

(defun pyim-pymap-importer ()
  "Convert buffer content of the form

你 ni
我 wo

into pyim's pymap format, stored in `pyim-pymap-new'.
Note: the entries in this file are sorted by frequency, from low to high."
  (interactive)
  (let ((hashtable (make-hash-table :size 1000000 :test #'equal))
        alist)
    ;; Walk the buffer line by line, starting at point.
    (while (not (eobp))
      (let* ((begin (line-beginning-position))
             (end (line-end-position))
             (items (split-string (buffer-substring-no-properties begin end) " "))
             (word (nth 0 items))
             (pinyin (nth 1 items)))
        (when (and pinyin word)
          ;; Later (more frequent) words go to the front; any older
          ;; occurrence of WORD is dropped from the existing string first.
          (puthash pinyin
                   (concat word
                           (replace-regexp-in-string
                            word "" (or (gethash pinyin hashtable) "")))
                   hashtable)))
      (forward-line 1))
    (maphash (lambda (key value)
               (push (list key value) alist))
             hashtable)
    (setq pyim-pymap-new
          (seq-sort (lambda (a b)
                      (string< (car a) (car b)))
                    alist))))
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 21 Feb 2021 22:01:09 -0500
> Cc: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
>
> > pyim is a Chinese input method support quanpin, shuangpin, wubi and cangjie
> > More details: https://github.com/tumashu/pyim
>
> Yes, it would be a nice addition to GNU ELPA.
I'd rather prefer to have input methods bundled with Emacs, not on
ELPA. Two reasons:
. input methods are basic keyboard input, users should have them OOTB
. adding an input method currently requires adding it to
leim-list.el and re-dumping Emacs, so AFAIU at least some
infrastructure is currently missing for installing input methods
outside of the build
(Of course, if we are unfortunate enough to bump into legal matters
which cannot be resolved, then this is a moot point.)
Hi, tumashu.

I'm a big fan of the pyim input method, and I would be very happy to see pyim
become a part of GNU ELPA or even Emacs!

I just did some research on the issue we face. We can extract the
bopomofo data from the Unihan database.

For example, the pinyin of the CJK character "爱" (meaning love) is "ai". It can be found at
https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=7231.

The full database can be downloaded at
https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip. The Unihan
database is licensed under the Unicode license, a free license approved by
the FSF (https://www.gnu.org/licenses/license-list.html#Unicode).

We can write a simple program to extract the pinyin of the needed characters
from Unihan_Readings.txt. The Unihan database contains almost all readings of
CJK characters, so we have to limit it to a subset. Maybe refer to
《通用汉字规范表》 (Table of General Standard Chinese Characters)
http://www.gov.cn/gzdt/att/att/site1/20130819/tygfhzb.pdf.
--
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao
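
As a rough illustration of such an extraction program, here is a minimal Emacs Lisp sketch. It assumes that each relevant line of Unihan_Readings.txt has the form "U+XXXX<TAB>kMandarin<TAB>reading", with tone marks in the reading. All `toy-unihan-*' names are hypothetical, and writing ü as v is just the usual input-method convention, not something Unihan prescribes.

```elisp
;; Sketch only: the function names are hypothetical, and the line format
;; "U+XXXX<TAB>kMandarin<TAB>reading" is an assumption about the data file.
(require 'ucs-normalize)

(defun toy-unihan-strip-tones (reading)
  "Return READING without tone marks, writing ü as v."
  (let ((nfd (ucs-normalize-NFD-string reading)))
    ;; After NFD, ü is u followed by COMBINING DIAERESIS (U+0308).
    (setq nfd (replace-regexp-in-string "u\u0308" "v" nfd))
    ;; Remove the remaining combining marks, i.e. the tone marks.
    (replace-regexp-in-string "[\u0300-\u036f]" "" nfd)))

(defun toy-unihan-parse-line (line)
  "Parse one kMandarin LINE; return (CHAR PINYIN...) or nil."
  (when (string-match "\\`U\\+\\([0-9A-F]+\\)\tkMandarin\t\\(.+\\)\\'" line)
    (let ((char (string-to-number (match-string 1 line) 16))
          (readings (split-string (match-string 2 line) " " t)))
      (cons (string char) (mapcar #'toy-unihan-strip-tones readings)))))

(defun toy-unihan-read-file (file)
  "Return a list of (CHAR PINYIN...) for every kMandarin line in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    (let (result)
      (while (not (eobp))
        (let ((entry (toy-unihan-parse-line
                      (buffer-substring-no-properties
                       (line-beginning-position) (line-end-position)))))
          (when entry (push entry result)))
        (forward-line 1))
      (nreverse result))))
```

With these definitions, (toy-unihan-parse-line "U+7231\tkMandarin\tài") returns ("爱" "ai"). Limiting the result to a subset of characters would be a separate filtering step.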
IMO, the CJK characters in the BMP are enough for an input method in daily use.
It contains "CJK Unified Ideographs" (U+4E00 - U+9FFF) and its A
extension (U+3400 - U+4DBF).

It would be perfect if we also supported the B extension (U+20000 - U+2A6D6),
but it is rarely used (and many fonts cannot even display these characters).
--
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao
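
As a sketch of that restriction, a predicate like the following could be used to filter characters. The function name is hypothetical; the ranges are the two BMP ranges quoted above.

```elisp
;; Sketch only: `toy-cjk-bmp-char-p' is a hypothetical helper name.
(defun toy-cjk-bmp-char-p (char)
  "Return non-nil if CHAR is in CJK Unified Ideographs or Extension A."
  (or (<= #x4E00 char #x9FFF)     ; CJK Unified Ideographs
      (<= #x3400 char #x4DBF)))   ; CJK Unified Ideographs Extension A
```

For example, (toy-cjk-bmp-char-p ?爱) is non-nil, while a character from the B extension such as #x20000 is rejected.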
On 2/22/21 11:36 AM, Zhu Zihao wrote:
> IMO, the CJK character in BMP is enough for a input method in daily use.
> It contains "CJK Unified Ideographs"(U+4E00 - U+9FFF) and its A
> extension(U+3400 - U+4DBF).
>
> It'll be perfect if we also support the B extension (U+20000 - U+2A6D6),
> but rarely used (and even many fonts cannot display these characters).

If you're looking for good sources of data, MDBG is under CC-BY-SA 4, which is one-way compatible with GPLv3:
https://www.mdbg.net/chinese/dictionary?page=cc-cedict
>>>>> On Mon, 22 Feb 2021 17:09:19 +0200, Eli Zaretskii <eliz@gnu.org> said:
Eli> . adding an input method currently requires adding it to
Eli> leim-list.el and re-dumping Emacs, so AFAIU at least some
Eli> infrastructure is currently missing for installing input methods
Eli> outside of the build
'quail-define-package' and 'quail-define-rules' work perfectly well to
define private input methods, so Iʼm missing something here.
Robert
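
For concreteness, a private input method defined this way might look like the following minimal sketch of an init-file snippet. The name "toy-pinyin", its title and the three rules are made up for illustration; only `quail-define-package' and `quail-define-rules' come from quail itself.

```elisp
;; Hypothetical init-file snippet; "toy-pinyin" and its rules are
;; illustrative only and have nothing to do with pyim's own tables.
(require 'quail)

(quail-define-package
 "toy-pinyin"        ; input method name (hypothetical)
 "Chinese-GB"        ; language environment it is listed under
 "拼"                ; title shown in the mode line
 t                   ; show guidance while typing
 "Toy pinyin input method with three syllables.")

;; Each rule maps a key sequence to a string whose characters are the
;; candidates offered for that key sequence.
(quail-define-rules
 ("ni" "你尼泥")
 ("wo" "我握")
 ("ai" "爱艾"))
```

Once this has been evaluated, the method can be selected with C-u C-\ toy-pinyin RET, like any other input method.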
> From: Robert Pluim <rpluim@gmail.com>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, tumashu@163.com,
> emacs-devel@gnu.org
> Date: Mon, 22 Feb 2021 18:41:44 +0100
>
> >>>>> On Mon, 22 Feb 2021 17:09:19 +0200, Eli Zaretskii <eliz@gnu.org> said:
>
> Eli> . adding an input method currently requires adding it to
> Eli> leim-list.el and re-dumping Emacs, so AFAIU at least some
> Eli> infrastructure is currently missing for installing input methods
> Eli> outside of the build
>
> 'quail-define-package' and 'quail-define-rules' work perfectly well to
> define private input methods, so Iʼm missing something here.
So you want to do this every session?
>>>>> On Mon, 22 Feb 2021 20:28:02 +0200, Eli Zaretskii <eliz@gnu.org> said:
>> 'quail-define-package' and 'quail-define-rules' work perfectly well to
>> define private input methods, so Iʼm missing something here.
Eli> So you want to do this every session?
Do what every session? They go in .emacs
Robert
> From: Robert Pluim <rpluim@gmail.com>
> Cc: monnier@iro.umontreal.ca, tumashu@163.com, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2021 19:40:44 +0100
>
> >>>>> On Mon, 22 Feb 2021 20:28:02 +0200, Eli Zaretskii <eliz@gnu.org> said:
>
> >> 'quail-define-package' and 'quail-define-rules' work perfectly well to
> >> define private input methods, so Iʼm missing something here.
>
> Eli> So you want to do this every session?
>
> Do what every session? They go in .emacs
Who or what would put them there? We are talking about a package
installation, yes?
And what about register-input-method?
> And what about register-input-method?
pyim does:
;;;###autoload
(register-input-method "pyim" "euc-cn" 'pyim-start (nth 0 pyim-titles))
is that a problem?
Stefan
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Robert Pluim <rpluim@gmail.com>, tumashu@163.com, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2021 17:27:57 -0500
>
> > And what about register-input-method?
>
> pyim does:
>
> ;;;###autoload
> (register-input-method "pyim" "euc-cn" 'pyim-start (nth 0 pyim-titles))
>
> is that a problem?
I was talking about the installation. What would that have to do, and
how would it do it?
>> > And what about register-input-method?
>>
>> pyim does:
>>
>> ;;;###autoload
>> (register-input-method "pyim" "euc-cn" 'pyim-start (nth 0 pyim-titles))
>>
>> is that a problem?
>
> I was talking about the installation. What would that have to do, and
> how would it do it?
I don't understand the question: the above line shows that the package
autoloads this `register-input-method` call. IOW if the user has that
package activated (which is the default for all packages that the user
installed), then the above line will be executed between `early-init.el`
and `init.el`.
Stefan
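
To make the mechanism concrete, here is a minimal sketch of a package file that registers an input method through an autoload cookie. Everything named `toy-im' is hypothetical, and the activation function is a pass-through placeholder rather than what pyim does; only `register-input-method', `input-method-function' and `deactivate-current-input-method-function' are real Emacs facilities.

```elisp
;;; toy-im.el --- hypothetical external input method  -*- lexical-binding: t -*-

;; package.el copies the autoloaded forms below into the package's
;; autoloads file, which is loaded when the package is activated.

;;;###autoload
(register-input-method
 "toy-im"            ; input method name (hypothetical)
 "Chinese-GB"        ; language environment
 'toy-im-activate    ; function called to turn the method on
 "TOY"               ; mode-line title
 "Placeholder input method that passes keys through unchanged.")

(defun toy-im-input-method (key)
  "Return KEY unchanged; a real method would translate it here."
  (list key))

(defun toy-im-deactivate ()
  "Turn the toy input method off in the current buffer."
  (kill-local-variable 'input-method-function))

;;;###autoload
(defun toy-im-activate (_name)
  "Turn the toy input method on in the current buffer."
  (setq-local deactivate-current-input-method-function #'toy-im-deactivate)
  (setq-local input-method-function #'toy-im-input-method))
```

This only shows how the registration reaches `input-method-alist' without touching leim-list.el; whether that is all an input method needs is a separate question.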
At 2021-02-23 00:29:42, "Zhu Zihao" <all_but_last@163.com> wrote:
>Hi, tumashu.
>
>I'm a big fan of pyim input method, I would be very happy to see pyim
>can be a part of GNU ELPA or even Emacs!
>
>I just do some research on the issue we faced. We can extract the
>bopomofo data from unihan database.
>
>For example, the pinyin of CJK character "爱"(means love) is "ai". Can be found at
>https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=7231.
>
>The full database can be downloaded at
>https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip. Unihan
>database is licensed under Unicode license, a free license approved by
>FSF(https://www.gnu.org/licenses/license-list.html#Unicode).
>
>We can write a simple program to extract the pinyin of needed characters
>in Unihan_Readings.txt. Unihan database contains almost all reading of
>CJK character, we have to limit it to a subset. Maybe refer to
>《通用汉字规范表》 (Table of General Standard Chinese Characters)
>http://www.gov.cn/gzdt/att/att/site1/20130819/tygfhzb.pdf.
>--
>Retrieve my PGP public key:
>
> gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F
>
>Zihao
>
Today I switched pyim-pymap.el to use pinyin.map (included in the Emacs repo).
If we find another good resource, we will switch to it :-)
At 2021-02-23 13:07:34, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:
>>> > And what about register-input-method?
>>>
>>> pyim does:
>>>
>>> ;;;###autoload
>>> (register-input-method "pyim" "euc-cn" 'pyim-start (nth 0 pyim-titles))
>>>
>>> is that a problem?
>>
>> I was talking about the installation. What would that have to do, and
>> how would it do it?
>
>I don't understand the question: the above line shows that the package
>autoloads this `register-input-method` call. IOW if the user has that
>package activated (which is the default for all packages that the user
>installed), then the above line will be executed between `early-init.el`
>and `init.el`.
>
>
> Stefan
pyim uses quail, so it has to be registered with quail before use.
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rpluim@gmail.com, tumashu@163.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2021 00:07:34 -0500
>
> >> ;;;###autoload
> >> (register-input-method "pyim" "euc-cn" 'pyim-start (nth 0 pyim-titles))
> >>
> >> is that a problem?
> >
> > I was talking about the installation. What would that have to do, and
> > how would it do it?
>
> I don't understand the question: the above line shows that the package
> autoloads this `register-input-method` call. IOW if the user has that
> package activated (which is the default for all packages that the user
> installed), then the above line will be executed between `early-init.el`
> and `init.el`.
Is that enough to correctly set up the user's language environment?
Also, leim-list.el is not just for register-input-method calls; even
if pyim doesn't need anything else (does it?), input methods in
general will, so if we want to install input methods from elpa's, we
need to support that somehow.
But in general (see my first point) I think input methods should be in
core, available in "emacs -Q" etc.
>> I don't understand the question: the above line shows that the package
>> autoloads this `register-input-method` call.  IOW if the user has that
>> package activated (which is the default for all packages that the user
>> installed), then the above line will be executed between `early-init.el`
>> and `init.el`.
> Is that enough to correctly set up the user's language environment?

I have no idea.  I think I'd have to ask Eli or Handa.

[ Notice how I carefully phrased the above quoted paragraph such that it
  only contains facts I know to be true because of how package.el and
  autoloads work: I know nothing about how input methods are registered
  and how that might interact with the user's language environment (and
  to be honest, I must admit that I don't really know what is a language
  environment either).  ]

> Also, leim-list.el is not just for register-input-method calls; even
> if pyim doesn't need anything else (does it?), input methods in
> general will, so if we want to install input methods from elpa's, we
> need to support that somehow.

I think we should aim to make it possible without too many contortions
to distribute and install input methods as external packages.

> But in general (see my first point) I think input methods should be in
> core, available in "emacs -Q" etc.

I tend to agree, but I don't think it makes it less important to support
external input methods.


        Stefan
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rpluim@gmail.com, tumashu@163.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2021 11:17:57 -0500
>
> >> I don't understand the question: the above line shows that the package
> >> autoloads this `register-input-method` call.  IOW if the user has that
> >> package activated (which is the default for all packages that the user
> >> installed), then the above line will be executed between `early-init.el`
> >> and `init.el`.
> > Is that enough to correctly set up the user's language environment?
>
> I have no idea.  I think I'd have to ask Eli or Handa.

Then let's ask them, by all means.

> > Also, leim-list.el is not just for register-input-method calls; even
> > if pyim doesn't need anything else (does it?), input methods in
> > general will, so if we want to install input methods from elpa's, we
> > need to support that somehow.
>
> I think we should aim to make it possible without too many contortions
> to distribute and install input methods as external packages.

I'm fine with that, but first we need to know what those contortions
are.  So let's ask Eli or Handa about that as well.

> > But in general (see my first point) I think input methods should be in
> > core, available in "emacs -Q" etc.
>
> I tend to agree, but I don't think it makes it less important to support
> external input methods.

I'm fine with having such support, yes.
Eli Zaretskii <eliz@gnu.org> writes:
> But in general (see my first point) I think input methods should be in
> core, available in "emacs -Q" etc.
Especially in case of Chinese input methods, the current builtin
chinese-py input method is quite poor, would love to see that being
updated/replaced by pyim.
--
William