unofficial mirror of emacs-devel@gnu.org 
* Building intermediate Chinese language romanization alists
@ 2019-01-15 21:19 Eric Abrahamsen
  2019-01-16  0:09 ` Karl Fogel
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Eric Abrahamsen @ 2019-01-15 21:19 UTC
  To: emacs-devel

Hi,

I would often like access to the correspondences between
romanized Chinese and Chinese characters. E.g., in the pinyin
romanization method, the string "zhong" can map to any of the characters
"中种重众终钟忠衷肿仲锺踵盅冢忪舯螽". This is useful for creating
language utilities, and other people have put together their own
correspondences for their own purposes[1].

The Emacs sources contain several of these mappings (though I understand
the raw map files are not included in the distribution), which are used
to build the relevant
input methods. In the case of pinyin, the text
file ./leim/MISC-DIC/pinyin.map is converted with `titdic-convert' into
the file ./lisp/leim/quail/PY.el.

PY.el is automatically generated (by the function `py-converter' in
titdic-cnv.el): the mapping in pinyin.map is directly inserted into the
generated file, then wrapped in quotes and parens, to construct a call
to `quail-define-rules'.
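For illustration, the intermediate step could look roughly like this sketch, which parses text in the style of pinyin.map into an alist of (SYLLABLE CHARACTERS) entries, the same two-element shape used by quail rules. It assumes, without having checked titdic-cnv.el, that each mapping line is a syllable, whitespace, then a run of characters, and that lines beginning with "#" are comments; `my-pinyin-parse-map' is a hypothetical name.

```emacs-lisp
;; Sketch only: parse pinyin.map-style TEXT into an alist.
;; ASSUMPTIONS (not taken from titdic-cnv.el): mapping lines look like
;; "SYLLABLE<whitespace>CHARACTERS" and "#" starts a comment line.
;; `my-pinyin-parse-map' is a hypothetical helper.
(defun my-pinyin-parse-map (text)
  "Return a list of (SYLLABLE CHARACTERS) entries parsed from TEXT."
  ;; Match case-sensitively so uppercase header lines are not taken
  ;; for syllables.
  (let ((case-fold-search nil)
        result)
    (dolist (line (split-string text "\n" t))
      ;; Skip comments and lines that don't fit the assumed shape.
      ;; [a-z:] also tolerates a "u:" spelling of u-umlaut, in case
      ;; the file uses one.
      (when (and (not (string-prefix-p "#" line))
                 (string-match "\\`\\([a-z:]+\\)[ \t]+\\(.+?\\)[ \t]*\\'"
                               line))
        (push (list (match-string 1 line) (match-string 2 line))
              result)))
    ;; `push' builds the list backwards, so restore file order.
    (nreverse result)))
```

The resulting list could then be written out under a variable name instead of being spliced directly into the rules.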

I might be able to get the map back out of quail somehow, but since this
seems to be something that more than a few people would like access to,
I wonder if it would be acceptable to add an intermediary step, creating
(for instance) a defconst called `pinyin-map-alist' that holds the
contents of pinyin.map, and then changing the `quail-define-rules' call
to:

(apply #'quail-define-rules pinyin-map-alist)

The input method wouldn't be affected, but we'd have access to the
mapping via the constant, which would be very useful.
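A minimal sketch of the proposed constant and of how other code might read it. `pinyin-map-alist' is the name suggested above, but its contents here are only an illustrative two-entry subset, and `my-pinyin-chars' is a hypothetical helper. One caveat: as far as I can tell from quail.el, `quail-define-rules' is a macro, so `apply' would reject it; a comment in the sketch notes one untested workaround.

```emacs-lisp
;; Sketch only.  Entries are (SYLLABLE CHARACTERS), the same shape as
;; the rules in PY.el; this is an illustrative subset, not real data.
(defconst pinyin-map-alist
  '(("zhong" "中种重众终钟忠衷肿仲锺踵盅冢忪舯螽")
    ("guo" "国果过"))
  "Hypothetical pinyin -> characters mapping.")

;; `quail-define-rules' appears to be a macro, so instead of
;; (apply #'quail-define-rules pinyin-map-alist) a generated file might
;; need something like the following (untested, shown as a comment):
;;   (eval `(quail-define-rules ,@pinyin-map-alist))

(defun my-pinyin-chars (syllable)
  "Return a list of the characters read as SYLLABLE, or nil."
  ;; `string-to-list' on nil (unknown syllable) also returns nil.
  (string-to-list (cadr (assoc syllable pinyin-map-alist))))
```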

Pinyin would be the most useful romanization method to do this for, but
it looks like the CTLau and possibly ziranma methods might benefit from
similar treatment.

(Another issue is that if the constant is written into PY.el, which
isn't a library, it might be a bit difficult to get out again, but
perhaps the defconst could be appended to one
of ./lisp/language/{chinese.el,china-util.el}. Or PY.el could be made a
library.)

I'm not entirely familiar with the language-related build process, but I
hope there might be an appropriate stage at which to hang the alist on a
variable name.

Thanks,
Eric

[1]: https://github.com/tumashu/pyim/blob/master/pyim-pymap.el





* Re: Building intermediate Chinese language romanization alists
  2019-01-15 21:19 Building intermediate Chinese language romanization alists Eric Abrahamsen
@ 2019-01-16  0:09 ` Karl Fogel
  2019-01-16  0:26   ` Eric Abrahamsen
  2019-01-16  0:23 ` Karl Fogel
  2019-01-20  6:01 ` Feng Shu
  2 siblings, 1 reply; 8+ messages in thread
From: Karl Fogel @ 2019-01-16  0:09 UTC
  To: Eric Abrahamsen; +Cc: emacs-devel

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>I might be able to get the map back out of quail somehow, but since this
>seems to be something that more than a few people would like access to,
>I wonder if it would be acceptable to add an intermediary step, creating
>(for instance) a defconst called `pinyin-map-alist' that holds the
>contents of pinyin.map, and then changing the `quail-define-rules' call
>to:

I think I've done the reverse-mapping you want.  In my .emacs (http://svn.red-bean.com/repos/kfogel/trunk/.emacs), start at the function `kf-pinyin-from-char'.  See also the code to build `kf-quail-inverted-map' right above it.

I'm happy to answer any questions, of course.

Best regards and 新年快乐 (Happy New Year),
-Karl

>(apply #'quail-define-rules pinyin-map-alist)
>
>The input method wouldn't be affected, but we'd have access to the
>mapping via the constant, which would be very useful.
>
>Pinyin would be the most useful romanization method to do this for, but
>it looks like the CTLau and possibly ziranma methods might benefit from
>similar treatment.
>
>(Another issue is that if the constant is written into PY.el, which
>isn't a library, it might be a bit difficult to get out again, but
>perhaps the defconst could be appended to one
>of ./lisp/language/{chinese.el,china-util.el}. Or PY.el could be made a
>library.)
>
>I'm not entirely familiar with the language-related build process, but I
>hope there might be an appropriate stage at which to hang the alist on a
>variable name.
>
>Thanks,
>Eric
>
>[1]: https://github.com/tumashu/pyim/blob/master/pyim-pymap.el




* Re: Building intermediate Chinese language romanization alists
  2019-01-15 21:19 Building intermediate Chinese language romanization alists Eric Abrahamsen
  2019-01-16  0:09 ` Karl Fogel
@ 2019-01-16  0:23 ` Karl Fogel
  2019-01-16  0:29   ` Eric Abrahamsen
  2019-01-20  6:01 ` Feng Shu
  2 siblings, 1 reply; 8+ messages in thread
From: Karl Fogel @ 2019-01-16  0:23 UTC
  To: Eric Abrahamsen; +Cc: emacs-devel

>Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>I might be able to get the map back out of quail somehow, but since this
>>seems to be something that more than a few people would like access to,
>>I wonder if it would be acceptable to add an intermediary step, creating
>>(for instance) a defconst called `pinyin-map-alist' that holds the
>>contents of pinyin.map, and then changing the `quail-define-rules' call
>>to:
>
>I think I've done the reverse-mapping you want.  In my .emacs
>(http://svn.red-bean.com/repos/kfogel/trunk/.emacs), start at the function
>`kf-pinyin-from-char'.  See also the code to build `kf-quail-inverted-map'
>right above it.

Ah, I meant to preface that response with "I don't know if this is useful to your purposes, but... [etc]"

Basically I was just saying "Yes, you can get the map back out of quail", by providing you an existence proof :-).  

I totally agree with you that it would be better for Emacs to offer the bidirectional mapping by default.  I don't know enough to comment on whether your proposed method is a good way to do this.  (I wrote that inversion code in 2004, and although I use it often, I haven't had to hack on it since then or explore quail mappings further.)

Best regards,
-Karl

>>(apply #'quail-define-rules pinyin-map-alist)
>>
>>The input method wouldn't be affected, but we'd have access to the
>>mapping via the constant, which would be very useful.
>>
>>Pinyin would be the most useful romanization method to do this for, but
>>it looks like the CTLau and possibly ziranma methods might benefit from
>>similar treatment.
>>
>>(Another issue is that if the constant is written into PY.el, which
>>isn't a library, it might be a bit difficult to get out again, but
>>perhaps the defconst could be appended to one
>>of ./lisp/language/{chinese.el,china-util.el}. Or PY.el could be made a
>>library.)
>>
>>I'm not entirely familiar with the language-related build process, but I
>>hope there might be an appropriate stage at which to hang the alist on a
>>variable name.
>>
>>Thanks,
>>Eric
>>
>>[1]: https://github.com/tumashu/pyim/blob/master/pyim-pymap.el




* Re: Building intermediate Chinese language romanization alists
  2019-01-16  0:09 ` Karl Fogel
@ 2019-01-16  0:26   ` Eric Abrahamsen
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Abrahamsen @ 2019-01-16  0:26 UTC
  To: Karl Fogel; +Cc: emacs-devel

Karl Fogel <kfogel@red-bean.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>I might be able to get the map back out of quail somehow, but since this
>>seems to be something that more than a few people would like access to,
>>I wonder if it would be acceptable to add an intermediary step, creating
>>(for instance) a defconst called `pinyin-map-alist' that holds the
>>contents of pinyin.map, and then changing the `quail-define-rules' call
>>to:
>
> I think I've done the reverse-mapping you want. In my .emacs
> (http://svn.red-bean.com/repos/kfogel/trunk/.emacs), start at the
> function `kf-pinyin-from-char'. See also the code to build
> `kf-quail-inverted-map' right above it.
>
> I'm happy to answer any questions, of course.
>
> Best regards and 新年快乐 (Happy New Year),
> -Karl

Interesting... But really this is just a further argument for providing
access to this alist from elisp to begin with -- then none of us would
have to go to this effort!

It's good to see how one might reverse-engineer quail maps, though --
thank you!

祝好 (Best wishes),
Eric




* Re: Building intermediate Chinese language romanization alists
  2019-01-16  0:23 ` Karl Fogel
@ 2019-01-16  0:29   ` Eric Abrahamsen
  2019-01-18  0:34     ` Karl Fogel
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Abrahamsen @ 2019-01-16  0:29 UTC
  To: emacs-devel

Karl Fogel <kfogel@red-bean.com> writes:

>>Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>>>I might be able to get the map back out of quail somehow, but since this
>>>seems to be something that more than a few people would like access to,
>>>I wonder if it would be acceptable to add an intermediary step, creating
>>>(for instance) a defconst called `pinyin-map-alist' that holds the
>>>contents of pinyin.map, and then changing the `quail-define-rules' call
>>>to:
>>
>>I think I've done the reverse-mapping you want. In my .emacs
>>(http://svn.red-bean.com/repos/kfogel/trunk/.emacs), start at the
>>function `kf-pinyin-from-char'. See also the code to build
>>`kf-quail-inverted-map' right above it.
>
> Ah, I meant to preface that response with "I don't know if this is useful to your purposes, but... [etc]"
>
> Basically I was just saying "Yes, you can get the map back out of quail", by providing you an existence proof :-).  
>
> I totally agree with you that it would be better for Emacs to offer
> the bidirectional mapping by default. I don't know enough to comment
> on whether your proposed method is a good way to do this. (I wrote
> that inversion code in 2004, and although I use it often, I haven't
> had to hack on it since then or explore quail mappings further.)

My proposal would create a pinyin-to-character alist, that's all. If you
wanted character-to-pinyin you'd still have to reverse that, but it
would still be quite a bit simpler than pulling open the quail-map yourself!
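Reversing such an alist would indeed be short. Here is a sketch that builds a character -> pinyin hash table, assuming (SYLLABLE CHARACTERS) entries; `my-pinyin-invert' is a hypothetical name, and unlike Karl's code it starts from the alist rather than from the quail map.

```emacs-lisp
;; Sketch only: invert (SYLLABLE CHARACTERS) entries into a hash table
;; from each character to the list of syllables it can be read as.
;; `my-pinyin-invert' is a hypothetical helper.
(defun my-pinyin-invert (alist)
  "Return a hash table mapping characters in ALIST to their syllables."
  (let ((table (make-hash-table)))   ; default `eql' test suits characters
    (dolist (entry alist)
      (let ((syllable (car entry)))
        (dolist (char (string-to-list (cadr entry)))
          ;; A character may have several readings (重 is both "zhong"
          ;; and "chong"), so keep a list, in the order first seen.
          (let ((readings (gethash char table)))
            (unless (member syllable readings)
              (puthash char (append readings (list syllable)) table))))))
    table))
```

For example, (gethash ?重 (my-pinyin-invert '(("zhong" "中重") ("chong" "重冲")))) returns ("zhong" "chong").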





* Re: Building intermediate Chinese language romanization alists
  2019-01-16  0:29   ` Eric Abrahamsen
@ 2019-01-18  0:34     ` Karl Fogel
  0 siblings, 0 replies; 8+ messages in thread
From: Karl Fogel @ 2019-01-18  0:34 UTC
  To: Eric Abrahamsen; +Cc: emacs-devel

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>My proposal would create a pinyin-to-character alist, that's all. If
>you wanted character-to-pinyin you'd still have to reverse that, but it
>would still be quite a bit simpler than pulling open the quail-map
>yourself!

Agreed, and I think it's a great idea.  (I guess I just don't need it personally, because I already did it the hard way :-), but it would be a fine thing if it were in Emacs.)




* Re: Building intermediate Chinese language romanization alists
  2019-01-15 21:19 Building intermediate Chinese language romanization alists Eric Abrahamsen
  2019-01-16  0:09 ` Karl Fogel
  2019-01-16  0:23 ` Karl Fogel
@ 2019-01-20  6:01 ` Feng Shu
  2019-01-27  5:50   ` Eric Abrahamsen
  2 siblings, 1 reply; 8+ messages in thread
From: Feng Shu @ 2019-01-20  6:01 UTC
  To: Eric Abrahamsen; +Cc: emacs-devel


>
> (apply #'quail-define-rules pinyin-map-alist)
>
> The input method wouldn't be affected, but we'd have access to the
> mapping via the constant, which would be very useful.
>

I agree with this idea very much. Three years ago, pyim did not have
pyim-pymap.el; it used regexps to parse PY.el to get the
(CJK char -> pinyin) map. That worked, but I think it was an ugly
approach, so I created pyim-pymap.el based on PY.el.

In my opinion, PY.el is nearly useless as a CJK input method,
because it is very, very hard to use. But the pinyin -> CJK char map
and the CJK char -> pinyin map are very useful; for example, we could
create a search tool that finds CJK characters by their pinyin.
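The search tool Feng Shu mentions could start from something like this sketch, which turns a list of syllables into a regexp matching any characters with those readings. It assumes (SYLLABLE CHARACTERS) entries; the sample alist and `my-pinyin-search-regexp' are hypothetical, and the sketch does not split a run-together string like "zhongguo" into syllables.

```emacs-lisp
;; Sketch only: build a regexp that finds Chinese text from its pinyin.
;; The sample data and function name are hypothetical.
(require 'regexp-opt)

(defvar my-pinyin-sample-alist
  '(("zhong" "中种重钟")
    ("guo" "国果过"))
  "Illustrative subset of a pinyin -> characters alist.")

(defun my-pinyin-search-regexp (syllables alist)
  "Return a regexp matching a character sequence read as SYLLABLES.
SYLLABLES is a list of strings; ALIST maps each syllable to a string
of characters."
  (mapconcat
   (lambda (syllable)
     (let ((chars (cadr (assoc syllable alist))))
       (if chars
           ;; Any one of the characters with this reading.
           (regexp-opt (mapcar #'char-to-string chars))
         ;; Unknown syllable: fall back to matching it literally.
         (regexp-quote syllable))))
   syllables ""))
```

The result could then be passed to `re-search-forward' to find, say, 中国 in a buffer by typing only "zhong" and "guo".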


> Eric
>
> [1]: https://github.com/tumashu/pyim/blob/master/pyim-pymap.el

-- 





* Re: Building intermediate Chinese language romanization alists
  2019-01-20  6:01 ` Feng Shu
@ 2019-01-27  5:50   ` Eric Abrahamsen
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Abrahamsen @ 2019-01-27  5:50 UTC
  To: emacs-devel

"Feng Shu" <tumashu@163.com> writes:

>>
>> (apply #'quail-define-rules pinyin-map-alist)
>>
>> The input method wouldn't be affected, but we'd have access to the
>> mapping via the constant, which would be very useful.
>>
>
> I agree with this idea very much. Three years ago, pyim did not have
> pyim-pymap.el; it used regexps to parse PY.el to get the
> (CJK char -> pinyin) map. That worked, but I think it was an ugly
> approach, so I created pyim-pymap.el based on PY.el.
>
> In my opinion, PY.el is nearly useless as a CJK input method,
> because it is very, very hard to use. But the pinyin -> CJK char map
> and the CJK char -> pinyin map are very useful; for example, we could
> create a search tool that finds CJK characters by their pinyin.

I've put a possible patch up as bug #34215.

Do note that the pyim-pymap.el mapping is actually much better than the
pinyin.map that comes with Emacs!

Eric





end of thread

Thread overview: 8+ messages
2019-01-15 21:19 Building intermediate Chinese language romanization alists Eric Abrahamsen
2019-01-16  0:09 ` Karl Fogel
2019-01-16  0:26   ` Eric Abrahamsen
2019-01-16  0:23 ` Karl Fogel
2019-01-16  0:29   ` Eric Abrahamsen
2019-01-18  0:34     ` Karl Fogel
2019-01-20  6:01 ` Feng Shu
2019-01-27  5:50   ` Eric Abrahamsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
