unofficial mirror of emacs-devel@gnu.org 
* a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
@ 2003-05-14 20:03 Hin-Tak Leung
  2003-05-14 20:55 ` Jason Rumney
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-14 20:03 UTC
  Cc: Richard Stallman

This is not meant to be a flame - Mr Stallman asked me why I would rather
continue to use emacs 19.34 (and fix its problems) in combination
with an old cemacs elisp script instead of using MULE, and I am writing
this in the hope that MULE will satisfy my editing needs one day. I
am native Chinese, and can also do a small amount of Japanese, so
my experience is probably quite representative.

(1) Associations: the ability to let the user choose the next possible
associated characters. In English, "Search" is often followed by "engine",
"for" or "through". In Chinese, when I type "Leung" (not in common sentence
vocabulary, but it is a common surname), it is almost certain that I would
follow with the rest of my name. Another example in Chinese: the simplest
one-character word, "one", is often followed by a few specific characters
to make up small compound units which mean "definitely", "a few",
"all", "generally", "once", "unified", "together", "one sided", etc. It
saves a lot of typing, by a factor of 2 or 3 in Chinese - it is 3 letters
(for "Leung") then "1" (for choice), instead of the 3+5+4 for the
shortest method (ChangJie). The input of many common phrases, which
would normally require 10+ key strokes, can be shortened by associations.
The facility is available in localised versions of MacOS, in English MacOS's
CJK add-ons (I have used the latter, and seen a Japanese friend using
the former), and has been available in 3rd party add-ons to English MS Windows
for many years. This would most certainly require extending MULE with
the ability to load dictionaries of commonly used phrases in
various languages, and would make the leim package a lot bigger.
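
As a rough illustration of the association idea (a minimal sketch in
Python, not Emacs or MULE code; the phrase list and the function names
are made up for the example), a phrase dictionary can be turned into a
table from the word just typed to a numbered list of likely followers:

```python
from collections import defaultdict


def build_associations(phrases):
    """Map each leading word to the words seen after it, in first-seen order."""
    table = defaultdict(list)
    for first, following in phrases:
        if following not in table[first]:
            table[first].append(following)
    return table


def suggest(table, last_input):
    """Return (choice number, candidate) pairs for what may follow last_input."""
    return list(enumerate(table.get(last_input, []), start=1))


# Hypothetical phrase dictionary, taken from the English example above.
PHRASES = [("search", "engine"), ("search", "for"), ("search", "through")]
ASSOCIATIONS = build_associations(PHRASES)

print(suggest(ASSOCIATIONS, "search"))
```

The user would then press a single digit to pick a follower, which is
where the saving in keystrokes comes from. A real implementation would
need one such dictionary per language.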

(2) Hints: quite similar to (1), e.g. sometimes I can't quite
remember the code for "Leung", but vaguely know it is "e*f" in
ChangJie (it is actually 'eif'). On just about any other system
(MacOS's CJK extensions, etc), the full list is displayed and it
narrows down as the user types, so the user can select the correct
one visually if he can't remember the exact code. On Cxterm,
one can do 'e?f' or 'e??f' to obtain a list of matches.

(3) new input methods, and per-user input methods: adding new input
mapping methods on a per-user basis and making that the default. I know
this is possible in MULE - but the procedure is not in the obvious places.
There are new methods coming out, e.g. new ones developed in view of
handhelds, which only require the key-pad numeric keys. And there
are personal per-user needs, e.g. I might like an enhanced ChangJie
method, which includes a special short-hand for my own name, to
over-ride the system one.

(4) The inability to process part of a file in one encoding and
save it as a binary stream: This might be possible in MULE, but
I can't work out how - MULE seems to insist that I save or
convert documents into its internal representation. e.g. I have a file,
part of which is in GB2312, part in JIS-euc, and part BIG5, separated
by clear ASCII markers. I would like to edit the different parts
individually, and without breaking the others, and without converting
to MULE's internal format or a common one like UTF-8. I don't think
it is difficult to implement, but it is more like the MULE developers
think they know my needs better than I do, and insist that I do
things their way.

As for the portability problem with the elisp script cemacs.el
(the origin of the whole thread) to current emacs versions, I have been
looking deeper into the documentation of current emacs and just
found my answer; starting current emacs with the --unibyte option
would do. I also looked a bit further - apparently the --unibyte option
was not available before emacs 20.3, so cemacs.el had indeed been
broken for about 3 years between 1995 and 1998. As a result, cemacs's
README contains a warning that it doesn't run under emacs 20+,
and a later fork+enhancement of it even contains code that checks the
version and aborts under emacs 20+. I'll write
to the author of cemacs to rectify that.

And lastly, I don't need to keep on porting emacs 19.34 forward anymore -
However, the answer, adding LDFLAGS='-z nocombreloc', should be applied
to the current version of emacs's configure to stop it from being broken
by the recent change to default 'combreloc' in GNU ld.

I would like to thank everybody for the work on emacs and the other
works from FSF.

-------- Original Message --------
Subject: Re: Re: emacs-19.34 segfauls when built with Xfree 4.3.0 (glibc 2.3.x,gcc 3.2)
Date: Wed, 14 May 2003 09:48:09 -0400
From: Richard Stallman <rms@gnu.org>
Reply-To: rms@gnu.org

<snipped>
In contrast, we might be interested in trying to fix this problem

     I want to use an elisp script called cemacs (for Chinese inputs)
     but unfortunately the inclusion of MULE (Multi-lingual Extension)
     since version 20 has broken it.

if you send a bug report with a precise complete test case.

We are certainly interested in improving MULE.  Could you write to
emacs-devel@gnu.org and tell us why specifically MULE is not as good
as cemacs for your usage?


* Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
  2003-05-14 20:03 a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Hin-Tak Leung
@ 2003-05-14 20:55 ` Jason Rumney
  2003-05-14 22:05   ` a few MULE criticisms Hin-Tak Leung
  2003-05-14 21:55 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stefan Monnier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Jason Rumney @ 2003-05-14 20:55 UTC
  Cc: emacs-devel

Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

Your first two comments look like they could be handled by writing new
input methods. Because they are likely to require big dictionaries,
they don't necessarily have to be bundled with Emacs. They could exist
as an external package just as cemacs does now, but taking advantage
of the leim framework that is in place now.

> (3) new input methods, and per-user input methods: adding new input
> mapping methods on a per-user basis and making that the default. I know
> this is possible in MULE - but the procedure is not in the obvious places.
> There are new methods coming out, e.g. new ones developed in view of
> handhelds, which only require the key-pad numeric keys. And there
> are personal per-user needs, e.g. I might like an enhanced ChangJie
> method, which includes a special short-hand for my own name, to
> over-ride the system one.

There is no reason why you can't add new input methods to your
site-lisp directory, I think.

As for minor personal modifications to existing input methods,
abbrev-mode provides functionality in this direction already. If it
is not good enough, maybe it could be tied in more closely with
leim/quail somehow.

> (4) The inability to process part of a file in one encoding and
> save it as a binary stream: This might be possible in MULE, but
> I can't work out how

You'd have to initially read in the buffer as binary, and do the
en/decoding conversion into an indirect buffer. I think Gnus already
does something similar to handle multiple text parts in MIME messages
with different encodings.

> it is more like the MULE developers think they know my needs better
> than I do, and insist that I do things their way.

The usual case is that files are saved in a single encoding. The MULE
developers have done a very good job of making that work, but I doubt
they have encountered this use-case before outside of Gnus, or there
would already be a way to handle it.

> And lastly, I don't need to keep on porting emacs 19.34 forward anymore -
> However, the answer, adding LDFLAGS='-z nocombreloc', should be applied
> to the current version of emacs's configure to stop it from being broken
> by the recent change to default 'combreloc' in GNU ld.

As Stefan replied in gnu.emacs.bug, this exact change was made almost
two years ago, so if there are bugs remaining in the current release,
there must be something wrong with the way the change was made that
makes it ineffective for you.  Can you confirm that you still need to
make this change yourself in 21.3 or the latest CVS?


* Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
  2003-05-14 20:03 a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Hin-Tak Leung
  2003-05-14 20:55 ` Jason Rumney
@ 2003-05-14 21:55 ` Stefan Monnier
  2003-05-15  2:03   ` a few MULE criticisms Hin-Tak Leung
  2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
  2003-05-15  7:03 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stephen J. Turnbull
  3 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2003-05-14 21:55 UTC
  Cc: emacs-devel

> This is not meant to be a flame - Mr Stallman asked me why I would rather
> continue to use emacs 19.34 (and fix its problems) in combination
> with an old cemacs elisp script instead of using MULE, and I am writing
> this in the hope that MULE will satisfy my editing needs one day. I
> am native Chinese, and can also do a small amount of Japanese, so
> my experience is probably quite representative.

I have no knowledge of any non-latin script, so could you explain
to me how Emacs-19.34 can be used to edit in a character-set larger
than 256 chars ?

> (4) The inability to process part of a file in one encoding and
> save it as a binary stream: This might be possible in MULE, but
> I can't work out how - MULE seems to insist that I save or
> convert documents into its internal representation.

That is true but not fundamentally bothersome, when you think about it:
the way the file is internally represented is not important.
What matters is what you see on screen and what you get when you
save the file (i.e. mostly that the unedited parts of the file are
preserved bit-for-bit).
Emacs-21 is much better at "preserving bytes that we don't understand".

> e.g. I have a file,
> part of which is in GB2312, part in JIS-euc, and part BIG5, separated
> by clear ASCII markers. I would like to edit the different parts

This is indeed not well supported.  But you can try to read it
once with GB2312 (at which point the BIG5 part will look like
garbage, but it should be preserved AFAIK, if not you might want
to report it as a bug).
You can later re-read the file with a BIG5 encoding (at which
point the GB2312 part will look like garbage, ...).

> individually, and without breaking the others, and without converting
> to MULE's internal format or a common one like UTF-8. I don't think
> it is difficult to implement, but it is more like the MULE developers
> think they know my needs better than I do, and insist that I do
> things their way.

Emacs could support the situation you describe, but nobody has bothered
to write the code for it AFAIK.  It can all be done in elisp, tho.
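
The behaviour asked for (decode one part, leave the other parts
untouched byte-for-byte) can be sketched outside Emacs as follows. This
is only an illustration in Python, not elisp and not anyone's actual
implementation; the "=== codec ===" marker format, the sample text and
the function names are assumptions made for the example, and the file
is assumed to start with a marker line.

```python
import re

# Hypothetical ASCII marker line: "=== <python codec name> ===".
MARKER = re.compile(rb"^=== ([A-Za-z0-9_-]+) ===\n", re.MULTILINE)


def split_sections(data):
    """Split raw bytes into (codec name, raw bytes) pairs, one per marker."""
    matches = list(MARKER.finditer(data))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)
        sections.append((m.group(1).decode("ascii"), data[m.end():end]))
    return sections


def join_sections(sections):
    """Rebuild the file; sections that were not edited come back unchanged."""
    return b"".join(
        b"=== " + name.encode("ascii") + b" ===\n" + raw for name, raw in sections
    )


def edit_section(sections, index, edit):
    """Decode only section `index`, apply `edit` to its text, re-encode it."""
    name, raw = sections[index]
    result = list(sections)
    result[index] = (name, edit(raw.decode(name)).encode(name))
    return result


# Sample file with three parts in three different encodings.
SAMPLE = (
    b"=== gb2312 ===\n" + "\u4e2d\u6587\n".encode("gb2312")
    + b"=== euc_jp ===\n" + "\u65e5\u672c\u8a9e\n".encode("euc_jp")
    + b"=== big5 ===\n" + "\u4e2d\u6587\n".encode("big5")
)

sections = split_sections(SAMPLE)
edited = edit_section(sections, 1, lambda text: text + "x")
print(join_sections(sections) == SAMPLE)
```

Only the edited section is ever decoded, so the other sections are
never interpreted in the wrong encoding.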

> As for the portability problem with the elisp script cemacs.el
> (the origin of the whole thread) to current emacs versions, I have been
> looking deeper into the documentation of current emacs and just
> found my answer; starting current emacs with the --unibyte option
> would do. I also looked a bit further - apparently the --unibyte option
> was not available before emacs 20.3, so cemacs.el had indeed been
> broken for about 3 years between 1995 and 1998. As a result, cemacs's
> README contains a warning that it doesn't run under emacs 20+,
> and a later fork+enhancement of it even contains code that checks the
> version and aborts under emacs 20+. I'll write
> to the author of cemacs to rectify that.

Emacs-20.1 also had an option to run in unibyte mode, although
it didn't have the --unibyte argument.  etc/NEWS for Emacs-20.1 says:

    [...]

    You can disable multibyte character support as follows:

      (setq-default enable-multibyte-characters nil)

    [....]

Note that Emacs-20.1 and 20.2 turned out to have many problems,
so very few people are still using them (much fewer than 20.3
or 19.34).


	Stefan


* Re: a few MULE criticisms
  2003-05-14 20:55 ` Jason Rumney
@ 2003-05-14 22:05   ` Hin-Tak Leung
  0 siblings, 0 replies; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-14 22:05 UTC
  Cc: emacs-devel

Jason Rumney wrote:
> Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> Your first two comments look like they could be handled by writing new
> input methods. Because they are likely to require big dictionaries,
> they don't necessarily have to be bundled with Emacs. They could exist
> as an external package just as cemacs does now, but taking advantage
> of the leim framework that is in place now.
>

No, the first two aren't really new methods; they are ways of displaying
alternatives, anticipating and asking the user which of the alternatives he
might like, based on (1) what he had just inserted in the current buffer,
or (2) explicit user-initiated regular-expression matches within the current
input method. But both of them address the problem that the user may not
know or remember the precise key-strokes for invoking the desired input.

(1) certainly needs big dictionaries, for each locale. Continuing my
examples, to anticipate that the user may want "for", "engine" or
"through" after he had inputted the word "search", because "search for",
"search through", "search engine" are commonly used phrases.

(2) is probably more like an enhancement or change in
window/buffer management. At the moment, under MULE, if I don't know
the exact input key-strokes for a particular character, there is very
little chance of arriving at the result.

It is probably a bit like a "spell-checker" within the current input method:
e.g. I vaguely know the character I want needs "pronou*", and I would like
to be able to type 'pronou*ion' or 'pronou?' and get emacs to suggest to me that
it could be "pronoun", "pronounciation", etc. It doesn't require a
dictionary (unlike this English example), but it requires
(a) some standard for specifying wild cards, (b) being able
to scan the current input method table for matches, and (c) displaying
the list of matches from the current input method for me to
choose from. So it requires some kind of searching mechanism within
the current input method, and a way of displaying an auxiliary buffer
with an enumerated list of all such matches in it, and some glue
between this buffer and the main buffer.
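
Points (a) to (c) can be sketched roughly as below. This is a minimal
Python illustration, not leim/quail code: it borrows shell-style
wildcards ('?' for one key, '*' for any run of keys) as the standard for
(a), and the table is a made-up stand-in for an input method's
keystroke-to-character map. Only the 'eif' entry comes from the example
above; the other codes and the CHAR-n names are placeholders.

```python
from fnmatch import fnmatchcase

# Hypothetical input method table: keystroke code -> character it produces.
TABLE = {
    "eif": "Leung",
    "eaf": "CHAR-1",
    "eabf": "CHAR-2",
    "ebc": "CHAR-3",
}


def match_codes(table, pattern):
    """Scan the table and return an enumerated list of wildcard matches."""
    codes = sorted(code for code in table if fnmatchcase(code, pattern))
    return [(number, code, table[code]) for number, code in enumerate(codes, start=1)]


for number, code, char in match_codes(TABLE, "e?f"):
    print(number, code, char)
```

Because the pattern is matched against the whole code, matches by
ending ('*f') or by beginning and ending ('e*f') work the same way as
matches by beginning. The cost is a full scan of the table per query.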


* Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
  2003-05-14 20:03 a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Hin-Tak Leung
  2003-05-14 20:55 ` Jason Rumney
  2003-05-14 21:55 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stefan Monnier
@ 2003-05-15  1:18 ` Kenichi Handa
  2003-05-15  1:39   ` Luc Teirlinck
  2003-05-15  3:29   ` a few MULE criticisms Hin-Tak Leung
  2003-05-15  7:03 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stephen J. Turnbull
  3 siblings, 2 replies; 18+ messages in thread
From: Kenichi Handa @ 2003-05-15  1:18 UTC
  Cc: emacs-devel


In article <3EC2A0FA.1040007@yahoo.co.uk>, Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> This is not meant to be a flame - Mr Stallman asked me why I would rather
> continue to use emacs 19.34 (and fix its problems) in combination
> with an old cemacs elisp script instead of using MULE, and I am writing
> this in the hope that MULE will satisfy my editing needs one day. I
> am native Chinese, and can also do a small amount of Japanese, so
> my experience is probably quite representative.

Thank you for the report.

First, as far as I know, cemacs.el is a small program
written by <simpson@math.psu.edu> that does just these
things:
  o make emacs 19 use standard-display-8bit
  o rebind forward-char, delete-char, etc to functions that
    pay attention to 8-bit chars.
And it works only in tty mode under a Chinese terminal,
e.g. cxterm.

Correct?  Or are we talking about a different program?
If so, please send me your version of cemacs.el.

If we are talking about the same program, the current Emacs
doesn't need it.   You can start emacs under cxterm by:
% LANG=zh emacs -nw

Or, start emacs with "-nw" and C-x RET t euc-cn RET C-x RET
k euc-cn RET.

Now, Emacs should display GB2312 characters correctly, and
you can also use the input methods of cxterm.

If you are using cxterm with Big5 mode, do this instead:
% LANG=zh_CN.big5 emacs -nw

> (1) Associations: the ability to let the user choose the next possible
> associated characters.
[...]
> for many years. This would most certainly require extending MULE with
> the ability to load dictionaries of commonly used phrases in
> various languages, and would make the leim package a lot bigger.

I think this should be implemented by extending abbrev-mode
so that it can associate an abbreviation with multiple
words/phrases (is it possible already?)

> (2) Hints: quite similar to (1), e.g. sometimes I can't quite
> remember the code for "Leung", but vaguely know it is "e*f" in
> ChangJie (it is actually 'eif'). On just about any other system
> (MacOS's CJK extensions, etc), the full list is displayed and it
> narrows down as the user types, so the user can select the correct
> one visually if he can't remember the exact code. On Cxterm,
> one can do 'e?f' or 'e??f' to obtain a list of matches.

When you type TAB while you are using an input method, Emacs
shows the full list.  But the method used in cxterm is not
implemented; it's not easy.

> (3) new input methods, and per-user input methods: adding new input
> mapping methods on a per-user basis and making that the default. I know
> this is possible in MULE - but the procedure is not in the obvious places.
> There are new methods coming out, e.g. new ones developed in view of
> handhelds, which only require the key-pad numeric keys. And there
> are personal per-user needs, e.g. I might like an enhanced ChangJie
> method, which includes a special short-hand for my own name, to
> over-ride the system one.

??? Why is it difficult?   You just have to create a new
leim file, and load it.

> (4) The inability to process part of a file in one encoding and
> save it as a binary stream: This might be possible in MULE, but
> I can't work out how - MULE seems to insist that I save or
> convert documents into its internal representation. e.g. I have a file,
> part of which is in GB2312, part in JIS-euc, and part BIG5, separated
> by clear ASCII markers. I would like to edit the different parts
> individually, and without breaking the others, and without converting
> to MULE's internal format or a common one like UTF-8. I don't think
> it is difficult to implement, but it is more like the MULE developers
> think they know my needs better than I do, and insist that I do
> things their way.

There are two ways for dealing with such a situation.
  o use file-name-handler
  o make a special coding system
The attached is a quick hack for the latter.  When you load
mixed-coding.el, you can read the file sample.txt by C-x RET
c mixed-coding RET C-x C-f sample.txt RET, and save that
file simply by C-x C-s.  Separator lines are highlighted in
the buffer.

---
Ken'ichi HANDA
handa@m17n.org


[-- Attachment #2: mixed-coding.tar.gz --]
[-- Type: application/octet-stream, Size: 1015 bytes --]



* Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
  2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
@ 2003-05-15  1:39   ` Luc Teirlinck
  2003-05-15  3:29   ` a few MULE criticisms Hin-Tak Leung
  1 sibling, 0 replies; 18+ messages in thread
From: Luc Teirlinck @ 2003-05-15  1:39 UTC
  Cc: emacs-devel

Kenichi Handa wrote:

   I think this should be implemented by extending abbrev-mode
   so that it can associate an abbreviation with multiple
   words/phrases (is it possible already?)

I am not sure if I understand the last question correctly.  If I do,
the answer is yes.  (For instance, just give a numeric argument to 
C-x a g or C-x a l.)  On the other hand, only word-constituent characters
can be part of the abbrev itself.

Sincerely,

Luc.


* Re: a few MULE criticisms
  2003-05-14 21:55 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stefan Monnier
@ 2003-05-15  2:03   ` Hin-Tak Leung
  2003-05-15  6:55     ` Jason Rumney
  0 siblings, 1 reply; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-15  2:03 UTC
  Cc: emacs-devel

Stefan Monnier wrote:
> I have no knowledge of any non-latin script, so could you explain
> to me how Emacs-19.34 can be used to edit in a character-set larger
> than 256 chars ?

It is a very old system (tradition? dating back to emacs-18).
The principle is essentially putting emacs in -nw and 8-bit display mode
inside a customized terminal emulator, and letting the terminal emulator
handle how keystrokes are translated into characters inserted
into the emacs buffer, and what fonts are used for the display.
This mechanism isn't exactly specific to emacs; it works with any editor
or indeed any application (pine, lynx, etc) that would be happy
within such an environment.

There are other less important subtleties, like adjusting the
movements of cursor in steps of suitable character units, rather
than bytes, which made emacs the better choice for such "embedded"
use.

> Emacs-20.1 also had an option to run in unibyte mode, although
> it didn't have the --unibyte argument.  etc/NEWS for Emacs-20.1 says:
> 
>     [...]
> 
>     You can disable multibyte character support as follows:
> 
>       (setq-default enable-multibyte-characters nil)
> 
>     [....]
> 
> Note that Emacs-20.1 and 20.2 turned out to have many problems,
> so very few people are still using them (much fewer than 20.3
> or 19.34).

Thanks for pointing this out (this is still a very good learning
experience even though it is about 8 years late...). For a few
years the general(?) consensus was either (a) don't upgrade, or (b)
switch to other X-based mechanisms.

I haven't had any experience with i18n support in X nor
localised X, but I heard that they are somewhat usable. The
advantage of that approach is that it is available to all
X-clients (browsers, terminal emulators, editors). As I
said previously, my main criticism of using MULE is that
it can't suggest the next character by association or by
fuzzy match. The same criticism doesn't seem to apply to
i18n X support - some l10n X or i18n X systems indeed ship
large dictionaries with the system. I am not sure about
Chinese support, but Wnn and Canna (both quite well-known
and quite big Japanese dictionaries) are shipped with some
commercial and even open-source unix/unix-like systems.


* Re: a few MULE criticisms
  2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
  2003-05-15  1:39   ` Luc Teirlinck
@ 2003-05-15  3:29   ` Hin-Tak Leung
  2003-05-15 10:06     ` Hin-Tak Leung
  1 sibling, 1 reply; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-15  3:29 UTC
  Cc: emacs-devel

Kenichi Handa wrote:
> First, as far as I know, cemacs.el is a small program
> written by <simpson@math.psu.edu> that does just these
> things:
>   o make emacs 19 use standard-display-8bit
>   o rebind forward-char, delete-char, etc to functions that
>     pay attention to 8-bit chars.
> And it works only in tty mode under a Chinese terminal,
> e.g. cxterm.

That's correct.

<a lot of C-x RET, etc snipped>

Is there a central organized repository for these tips/etc
associated with leim? e.g. how to create a new leim file?
They aren't in the obvious places (i.e. the info files
that came with emacs).

>>(1) Associations: the ability to let the user choose the next possible
>>associated characters.
> 
> [...]
> 
>>for many years. This would most certainly require extending MULE with
>>the ability to load dictionaries of commonly used phrases in
>>various languages, and would make the leim package a lot bigger.
> 
> 
> I think this should be implemented by extending abbrev-mode
> so that it can associate an abbreviation with multiple
> words/phrases (is it possible already?)

At this point it is probably good to switch to Japanese examples...
What I meant is the ability to anticipate something like:

(a) if I input "Handa" (two characters in Japanese Kanji),
and because it is a common surname, the system somehow asks
whether I want to follow it with "san" (two characters in Japanese
kana - the honorific suffix, similar to the "Mr" prefix in English),
or the 2nd half of the name of famous historical figures
with that surname.

(b) Many verbs in Japanese consist of one
or two Kanji characters, e.g. "carry", "bring", followed
by verb modifiers meaning "to", "did not", "forbidden to",
"please do", "please do not", etc which could be 5-6
characters long. The list of characters commonly used for
verbs is quite well defined (a few hundreds? still small by
comparison to the whole character set of a few thousands),
and the list of commonly used verb modifiers is even shorter
(maybe about 10?). Any of those in the 1st list is likely
to be followed by those in the 2nd list.

(c) "Ken" (one Japanese character) is often followed by
one specific character to form the phrase for "health".
(in addition to "Ken-ichi", a rather common first name),
and "ichi" (one character) is often followed by "ban"
(one character) to form "ichi-ban" (meaning "the best").

These might sound very demanding and critical - but association
can dramatically improve the speed of typing
by a factor of two or three... and association has been available
with some CJK input mechanisms on either unix or other platforms
for years.

And it is not just about the speed of typing - sometimes
one just can't remember the precise keystrokes corresponding
to a certain lesser-used character, so one would rely on
association from more frequently-used ones (and delete
the more frequently-used one afterwards).

> When you type TAB while you are using an input method, Emacs
> shows the full list.  But the method used in cxterm is not
> implemented; it's not easy.

I have figured that out. However, a match by beginning and ending ("a*b")
or by ending ("*b") is quite important for Chinese inputs. TAB
(match by the beginning portion "a*") probably works alright
for Japanese, because a native Japanese speaker most probably
knows how the character is pronounced (or at least knows the
first one or two syllables of the phrase). But many of the
Chinese input methods (other than the Pinyin method,
"by pronunciation") function by character shapes, and the
distinctive/memorable part is often the right-hand side of
the character, in other words the ending of the keystroke
sequence (or the 2nd half of the sequence), because the keystroke
sequence is usually coded according to how a native writer
writes the different portions of a character (top-left,
bottom-left, top-right, bottom-right).


* Re: a few MULE criticisms
  2003-05-15  2:03   ` a few MULE criticisms Hin-Tak Leung
@ 2003-05-15  6:55     ` Jason Rumney
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Rumney @ 2003-05-15  6:55 UTC
  Cc: Stefan Monnier

Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

> I haven't had any experience with i18n support in X nor
> localised X, but I heard that they are somewhat usable. The
> advantage of that approach is that it is available to all
> X-clients (browsers, terminal emulators, editors). As I
> said previously, my main criticism of using MULE is that
> it can't suggest the next character by association or by
> fuzzy match. The same criticism doesn't seem to apply to
> i18n X support - some l10n X or i18n X systems indeed ship
> large dictionaries with the system.

Then Emacs can also take advantage of these, since it supports X
input methods, as well as running under cxterm and similar terminal
emulators. So you can forget about leim, it is just a fallback for
when better input methods are not already provided by the environment
you are running in.


* Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
  2003-05-14 20:03 a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Hin-Tak Leung
                   ` (2 preceding siblings ...)
  2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
@ 2003-05-15  7:03 ` Stephen J. Turnbull
  3 siblings, 0 replies; 18+ messages in thread
From: Stephen J. Turnbull @ 2003-05-15  7:03 UTC
  Cc: emacs-devel

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> I am writing this in the hope that MULE will satisfy my
    Hin-Tak> editing needs one day.

Mule documentation has historically been pretty minimal outside of
Japanese usage.  The features you need are probably there, or needing
only a bit of Lisp wrapper.  They can be hard to find if you don't
read the code.

    Hin-Tak> I am native Chinese, and can also do a small amount of
    Hin-Tak> Japanese, so my experience is probably quite
    Hin-Tak> representative.

Japanese is my daily language (although English is my mother tongue),
and I can do a small amount of Korean and Spanish.  My experience has
been the opposite: I really don't want to go anywhere without Mule.

    Hin-Tak> (1) Associations: the ability to let the user choose the
    Hin-Tak> next possible associated characters. In English, "Search"
    Hin-Tak> is often followed by "engine", "for" or "through". In
    Hin-Tak> Chinese, when I type "Leung" (not in common sentence
    Hin-Tak> vocabulary, but it is a common surname), it is almost
    Hin-Tak> certain that I would follow with the rest of my name.

Similar facilities, which not only allow you to define associations,
but will automatically learn them as you type, have been available in
Mule (at least for Japanese) for ten years or so, with third-party
input methods such as Wnn and Canna.  Wnn also supports Chinese.

    Hin-Tak> This would most certainly require extending MULE with the
    Hin-Tak> ability to load dictionaries of commonly used phrases
    Hin-Tak> in various languages.

I think it is preferable to delegate this to the system input methods,
which already have such dictionaries and autolearning.

A feature which you imply, and is (to my great annoyance, as it makes
a lot of bad choices) implemented in Mac kotoeri, is auto-completion.
That is, input that you did type is completed with words that
typically follow it, but you have not yet typed.  This could be done,
I think, but (at least for Japanese romakana input) it seems hard to
do it conveniently.  YMMV, and if you want that feature, to my
knowledge it is unimplemented in Mule input methods now available.
(Some commercial methods are accessible via X Input Methods.)

    Hin-Tak> (2) Hints: quite similar to (1), e.g. sometimes I can't
    Hin-Tak> quite remember the code for "Leung", but vaguely know it
    Hin-Tak> is "e*f" in ChangJie. (it is actually 'eif'). On just
    Hin-Tak> about any other systems (MacOS's CJK extensions, etc),
    Hin-Tak> the full list is displayed and it narrows down as the
    Hin-Tak> user types, so the user can select the correct one
    Hin-Tak> visually if he can't remember the exact code. On Cxterm,
    Hin-Tak> one can do 'e?f' or 'e??f' to obtain a list of matches.

Hints should be easy to do.  "Narrowing the list as the user types"
would be harder, because it has to take account of the case where
there is no GUI available.  Even with GUI, I find the (often long)
tables are normally annoying and obtrusive.  I'd really prefer that
they were available on request.  More complexity ....  It might
require a special data structure to be efficient, too.  (Otherwise
you'd end up with something like repeatedly matching a regexp against
the candidate strings.)  But it should be doable.
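
The naive "match a regexp against every candidate" approach can be
sketched in a few lines of Python.  This is only an illustration: the
table contents and candidate labels are made up, and treating '?' as
exactly one key and '*' as zero or more keys is my assumption about the
cxterm-style wildcards described above.

```python
import re

# Hypothetical code table: keystroke sequence -> candidate labels.
CODE_TABLE = {
    "eif": ["cand-1"],
    "eaf": ["cand-2"],
    "eabf": ["cand-3"],
    "ef": ["cand-4"],
    "abf": ["cand-5"],
}


def compile_pattern(pattern):
    """Translate '?' (one key) and '*' (zero or more keys) into a regexp."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def lookup(pattern, table=CODE_TABLE):
    """Return (code, candidates) pairs whose whole code matches the pattern."""
    regexp = compile_pattern(pattern)
    return [(code, table[code]) for code in sorted(table)
            if regexp.fullmatch(code)]
```

Every lookup scans the whole table, which is the inefficiency mentioned
above; it is fine for a toy table but is the part a smarter data
structure would replace.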

    Hin-Tak> (4) The inability to process part of a file in one
    Hin-Tak> encoding and save it as a binary stream: This might be
    Hin-Tak> possible in MULE, but I can't work out how

Use `encode-coding-region' and `decode-coding-region'.  They are not
currently defined as interactive commands.  This has always annoyed
me, too, but not enough to change it.  (The facility would need a
bunch of wrapping to make it user-friendly; simply adding an
interactive declaration isn't good enough.)

    Hin-Tak> - MULE seems to insist that I save or convert documents
    Hin-Tak> into its internal representation.

Saving in internal representation is severely discouraged; in the Mule
implementation I use it's not possible without using a special
debugging build.  Using the internal representation in the working
buffer is the only way that the file can be displayed as Chinese or
Japanese while editing.  Isn't it sufficient as long as (a) you can
read it as ordinary glyphs and (b) the file is saved in the format you
want?

    Hin-Tak> e.g. I have a file, part of which is in GB2312, part in
    Hin-Tak> JIS-euc, and part BIG5, separated by clear ASCII markers.

My, you do like to make life interesting for yourself!

    Hin-Tak> I would like to edit the different parts individually,
    Hin-Tak> and without breaking the others, and without converting
    Hin-Tak> to MULE's internal format or a common one like UTF-8. I
    Hin-Tak> don't think it is difficult to implement, but it is more
    Hin-Tak> like the MULE developers think they know my needs better
    Hin-Tak> than I do, and insist that I do things their way.

It would be very easy to implement for a single file.  It's extremely
difficult to do generally.  However, the Mule developers have already
done a comprehensive and robust implementation of exactly this
functionality, by implementing ISO 2022.  This is incompatible with
Big5, of course, but that was a deliberate design decision by the Big5
implementers.  Big5, like Shift JIS, is not interoperable with
anything else.  Mule developers can hardly be expected to spend effort
on trying to make deliberately unfriendly systems cooperate!
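
To make the "very easy for a single file" case concrete, here is a
minimal Python sketch, outside of Mule entirely.  The marker format
(an ASCII line such as "<<gb2312>>") and the function names are
hypothetical; the codec names gb2312, euc_jp and big5 are those of
Python's standard codecs.  Only the segment being edited is decoded;
the other segments are carried through as untouched bytes.

```python
import re

# Hypothetical marker format: an ASCII line such as b"<<gb2312>>\n" announces
# the encoding of the bytes that follow, up to the next marker.
MARKER = re.compile(rb"<<([a-z0-9_]+)>>\n")


def split_segments(data):
    """Split raw bytes into (codec_name, raw_bytes) pairs, one per marker."""
    pieces = MARKER.split(data)
    # pieces[0] is whatever precedes the first marker (assumed empty here);
    # after that, codec names and segment bodies alternate.
    names, bodies = pieces[1::2], pieces[2::2]
    return [(name.decode("ascii"), body) for name, body in zip(names, bodies)]


def join_segments(segments):
    """Inverse of split_segments: re-emit each marker followed by its bytes."""
    return b"".join(b"<<" + name.encode("ascii") + b">>\n" + body
                    for name, body in segments)


def edit_segment(data, index, edit):
    """Decode only segment `index`, apply `edit` to its text, re-encode it."""
    segments = split_segments(data)
    name, body = segments[index]
    segments[index] = (name, edit(body.decode(name)).encode(name))
    return join_segments(segments)
```

The hard part in general is everything this sketch assumes away:
agreeing on markers, handling text before the first marker, and coping
with segments that are not valid in their declared encoding.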

ISTR there is a facility for reading and writing files containing Big5
mixed with incompatible coding systems, but those files are only
readable by Mule because they use ISO 2022 private character sets to
represent the Big5 characters.  (Ie, Mule knows they are Big5, and if
you cut them out and save them separately, they can be saved as Big5.
But in the mixed format, ordinary Big5 applications will not see "a
bunch of garbage with sensible Big5 mixed in", they'll see "all
garbage".)

I think the bottom line is that most of the features you want are
fairly easily supported in Mule.  Some (an output format containing
portions in "true" Big5) you'd have to pay handsomely to get a self-
respecting programmer to implement.[1]  The rest are mostly already
available in Japanese, because they've been contributed by the
Japanese community (including the core Mule developers, of course).  I
would guess that the lack in Chinese is due to lack of effective
interest from users, ie, contribution of code, implementation
suggestions, and data (eg, the conversion dictionaries).  The Mule
developers have done well by Chinese as far as I can tell: two
separate built-in coding systems (GB2312 and Big5) with provision for
others (there are built-in charsets for CNS 11643), a dozen or more
input methods, etc.  But to refine those "raw materials" requires
input from users.


Footnotes: 
[1]  Ie, I sympathize with your needs, but I know that if I
implemented it, you'd be back to ask me for support on a regular
basis, because it's an inherently unstable concept.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: a few MULE criticisms
  2003-05-15  3:29   ` a few MULE criticisms Hin-Tak Leung
@ 2003-05-15 10:06     ` Hin-Tak Leung
  2003-05-15 15:51       ` Stephen J. Turnbull
  0 siblings, 1 reply; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-15 10:06 UTC (permalink / raw)
  Cc: Kenichi Handa



Hin-Tak Leung wrote:
> Kenichi Handa wrote:

>> When you type TAB while you are using an input method, Emacs
>> shows the full list.  But, the method used in cxterm is not
>> implemented, it's not easy.
> 
> 
> I have figured that out. However a match by beginning and ending ("a*b")
> or by ending ("*b") is quite important for Chinese inputs. TAB
> (match by the beginning portion "a*") probably works alright
> for Japanese, because a native Japanese speaker most probably
> knows how the character is pronounced (or at least knows the
> first one or two syllables of the phrase). But many of the
> Chinese input methods (other than the Pinyin method,
> "by pronunciation") function by character shapes, and the
> distinctive/memorable part is often the right-hand side of
> the character, in other words the ending of the keystroke
> sequence (or the 2nd half of the sequence), because the keystroke
> sequence is usually coded according to how a native writer
> writes the different portions of a character (top-left,
> bottom-left, top-right, bottom-right).
> 

I would actually like to quantify this a bit further. In fact,
matching by ending is the more useful alternative if
one has to choose between matching by one end or the other.
In Japanese, most of the input methods map by pronunciation.
This is possible and quite convenient, because Japanese
Kanji are multi-syllabic (2 or 3 syllables), so that's a few hundred
pronunciation combinations mapped to a few thousand characters,
and it is possible to choose among fewer than 10 variants
for the same 4-7 keystroke sequence in a pronunciation-based mapping.

On the other hand, all Chinese characters are mono-syllabic;
the human tongue uses only about <100 such sounds
(consonant+vowel+consonant) - so every pronunciation
typically corresponds to about >100 characters. Couple that
with regional variations and dialects (remember that China
stretches a similar distance as London/UK to Cairo/Egypt, or
New York to Mexico), and almost no sane person types Chinese
by pronunciation if he needs to type any sizable paragraphs,
unless he is truly desperate - because we are
talking about "input 3 keystrokes, scroll a list of 100 to pick a
character, input another 3 key strokes, scroll another list of 100
to pick another character, etc" to churn each character out.

The more preferred methods are either pronunciation+intonation
(which is probably "input 4 keystrokes, scroll a list of 20"),
which still suffers a lot from the dialect/regional problems, or by
shape ("input 4-5 keystrokes, scroll a list of 5") to be
able to access nearly 10,000 characters. There are a few methods
which match by shape (i.e. how a character is written),
but as I explained, the right-hand side of a Chinese character
is usually the more distinct side, yet the right half is
usually written last; if one maps characters by how they are usually written,
it normally makes the 2nd half of the keystroke sequence more
useful as a matching criterion, if one is not sure about the
precise sequence and needs to make a guess and ask the computer
for a list of alternatives (as short as possible).
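
One simple way to make matching by ending cheap is to index the codes by
their reversed spelling, so that codes sharing an ending sit next to
each other and can be found by binary search. A minimal Python sketch,
with made-up codes and a hypothetical function name (this is not what
cxterm or MULE actually do):

```python
from bisect import bisect_left

# Hypothetical keystroke codes; a real table would come from an input method.
CODES = ["abcd", "xbcd", "abxd", "qcd", "abce"]

# Sort the codes by their reversed spelling so that all codes sharing an
# ending sit next to each other.
REVERSED = sorted(code[::-1] for code in CODES)


def codes_ending_with(ending):
    """Return all codes that end with `ending`, found by binary search."""
    key = ending[::-1]
    start = bisect_left(REVERSED, key)
    matches = []
    for rev in REVERSED[start:]:
        if not rev.startswith(key):
            break
        matches.append(rev[::-1])
    return matches
```

With this table, codes_ending_with("cd") finds "abcd", "xbcd" and "qcd"
without scanning the codes that end differently.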

Also, it is not uncommon to switch between input methods frequently
to arrive at different characters, say 5-10 times within a medium-size
sentence. Binding the switch to function keys to enable fast switching
is quite necessary to type at any reasonable speed. I know MULE can
be customized to behave differently, but input method switching
is buried in the 3rd or the 4th sub-menu in X :-).
(And Japanese users probably don't switch input methods as
frequently as Chinese users do...)


* Re: a few MULE criticisms
  2003-05-15 10:06     ` Hin-Tak Leung
@ 2003-05-15 15:51       ` Stephen J. Turnbull
  2003-05-15 19:49         ` Hin-Tak Leung
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen J. Turnbull @ 2003-05-15 15:51 UTC (permalink / raw)
  Cc: Kenichi Handa

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> The more preferred methods are either
    Hin-Tak> pronunciation+intonation (which is probably "input 4
    Hin-Tak> keystrokes, scroll a list of 20")

Quail already offers at least one of these.

    Hin-Tak> There are a few methods which match by shape (i.e. how

Quail offers a couple of these, too.

    Hin-Tak> a character is written), but as I explained, the right-
    Hin-Tak> hand-side of a chinese character is usually the more
    Hin-Tak> distinct side but the right half is usually written last;

I suppose it wouldn't help much for input methods to simply reverse
the order.  Then you'd still need wildcards for the (less frequent,
but not so rare) case where the left side is more distinctive, right?

    Hin-Tak> Also, it is not uncommon to switch between input methods
    Hin-Tak> frequently to arrive at different characters, say 5-10
    Hin-Tak> times within a medium-size sentence. Binding the switch
    Hin-Tak> to function keys to enable fast-switching is quite
    Hin-Tak> necessary to type at any reasonable speed.  (And Japanese
    Hin-Tak> users probably don't switch input methods as frequently
    Hin-Tak> as Chinese users would do...)

Note that in certain applications, such as programming code that
produces Japanese strings (eg, XML or TeX), the input method may be
toggled on and off many times in a medium sized "sentence".  But it
sounds like you mean several different methods, not (for example)
switching from geometric to phonetic and back several times.  So you'd
need several keybindings, instead of just one for the toggle.

Also it sounds like which methods are preferred varies a lot by user.
Is the number of commonly used methods small enough (say 5 or 6) that
all can be bound to function keys at once?  Or are there enough that
each user should be able to configure his own preferences to reduce
the number of hot-keys required?

In fact the server-based input methods for Japanese usually do provide
function key access to methods like a special list of symbols, input
via JIS code, user dictionary, etc.



* Re: a few MULE criticisms
  2003-05-15 15:51       ` Stephen J. Turnbull
@ 2003-05-15 19:49         ` Hin-Tak Leung
  2003-05-15 21:29           ` Kevin Rodgers
  2003-05-16  7:09           ` Stephen J. Turnbull
  0 siblings, 2 replies; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-15 19:49 UTC (permalink / raw)
  Cc: Kenichi Handa

Stephen J. Turnbull wrote:
>>>>>>"Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> 
>     Hin-Tak> The more prefered methods are either
>     Hin-Tak> pronounciation+intonation (which is probably "input 4
>     Hin-Tak> keystrokes, scroll a list of 20")
> 
> Quail already offers at least one of these.
> 
>     Hin-Tak> There are a few methods which matches by shape (i.e. how
> 
> Quail offers a couple of these, too.

Yes, I am aware of that. (I think the Emacs documentation mentioned
that many of the Chinese-related input method tables were copied
from cxterm, which was the most popular [the only?] way of doing
Chinese at that time, i.e. before 1994.)

But having just the input tables is not quite enough...

>     Hin-Tak> a character is written), but as I explained, the right-
>     Hin-Tak> hand-side of a chinese character is usually the more
>     Hin-Tak> distinct side but the right half is usually written last;
> 
> I suppose it wouldn't help much for input methods to simply reverse
> the order.  Then you'd still need wildcards for the (less frequent,
> but not so rare) case where the left side is more distinctive, right?

Indeed not. A majority (I think) have a left-right division where
the right part is more distinctive, but as you suggest, a sizable
part is the reverse; there are also ones which have a top-bottom division
(for which, again, I believe the bottom part is often the more
distinctive half), and still others which don't have any obvious internal
divisions.

The more popular methods tend to be ones in which the choices are narrowed
down quickly and evenly as one more keystroke is added to the sequence.

>     Hin-Tak> Also, it is not uncommon to switch between input methods
>     Hin-Tak> frequently to arrive at different characters, say 5-10
>     Hin-Tak> times within a medium-size sentence. Binding the switch
>     Hin-Tak> to function keys to enable fast-switching is quite
>     Hin-Tak> necessary to type at any reasonable speed.  (And Japanese
>     Hin-Tak> users probably don't switch input methods as frequently
>     Hin-Tak> as Chinese users would do...)
> 
> Note that in certain applications, such as programming code that
> produces Japanese strings (eg, XML or TeX), the input method may be
> toggled on and off many times in a medium sized "sentence".  But it
> sounds like you mean several different methods, not (for example)
> switching from geometric to phonetic and back several times.  So you'd
> need several keybindings, instead of just one for the toggle.

Yes, for Japanese usage, the predominant way is some variant of
phonetic (by pronunciation) mapping, with a quick toggle to
ASCII when that is needed. I personally use ChangJie (a shape-based
mapping) most of the time, but I do use a few others: one of the
pronunciation-based ones and, if I am really desperate, even the
one for English translation (i.e. typing "apple" and expecting
the Chinese character for that fruit!).

> Also it sounds like which methods are preferred varies a lot by user.
> Is the number of commonly used methods small enough (say 5 or 6) that
> all can be bound to function keys at once?  Or are there enough that
> each user should be able to configure his own preferences to reduce
> the number of hot-keys required?

Indeed. I have touched upon this before. Take the
pronunciation-based ones in particular: due to the size of the Chinese-speaking
region (e.g. a fictitious non-stop commercial flight touching
Beijing-Taiwan-Hong Kong-Singapore would take 8-12 hours),
people pronounce the same character differently according to
where they come from - that's already 4 *popular* pronunciation-based
systems :-), and we haven't started on the shape-based ones yet...

> In fact the server-based input methods for Japanese usually do provide
> function key access to methods like a special list of symbols, input
> via JIS code, user dictionary, etc.

Yes, much to my envy ... population-wise, the Chinese community is so much
bigger, and yet in overall computer localization
the Japanese are so much more advanced. The civil war and the political
turmoil within China until the late 80's have done much harm to
general education and technological advancement (in addition to other
social/economic problems).


* Re: a few MULE criticisms
  2003-05-15 19:49         ` Hin-Tak Leung
@ 2003-05-15 21:29           ` Kevin Rodgers
  2003-05-16  7:09           ` Stephen J. Turnbull
  1 sibling, 0 replies; 18+ messages in thread
From: Kevin Rodgers @ 2003-05-15 21:29 UTC (permalink / raw)


Hin-Tak Leung wrote:

> Yes, to much of my envy ... population-wise, the Chinese is so much
> bigger, and yet in the issue of computer over-all localization
> the Japanese is so much more advanced. The civil war and the political
> turmoils within China until the late 80's has done much harm to the
> general education and technology advances (in addition to other 
> social/economical problems).

Several slogans come to mind:

Get busy!
Solve the problem.
Sweep your side of the street.

-- 
Kevin Rodgers <kevin.rodgers@ihs.com>


* Re: a few MULE criticisms
  2003-05-15 19:49         ` Hin-Tak Leung
  2003-05-15 21:29           ` Kevin Rodgers
@ 2003-05-16  7:09           ` Stephen J. Turnbull
  2003-05-16 11:43             ` Hin-Tak Leung
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen J. Turnbull @ 2003-05-16  7:09 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> The more popular methods tend to be ones in which the
    Hin-Tak> choices are narrowed down quickly and evenly as one more
    Hin-Tak> keystroke is added to the sequence.

An explicit list would help.  Emacs could offer them in order of
popularity, at least to the extent that they are available in free
versions.

    Hin-Tak> Yes, to much of my envy ... population-wise, the Chinese
    Hin-Tak> is so much bigger, and yet in the issue of computer
    Hin-Tak> over-all localization the Japanese is so much more
    Hin-Tak> advanced.  The civil war and the political turmoils
    Hin-Tak> within China until the late 80's has done much harm to
    Hin-Tak> the general education and technology advances (in
    Hin-Tak> addition to other social/economical problems).

I'm in no position to judge what might have "held China back."
However, it's no accident that Japan is advanced in localization.
Japan has a strong culture of all users criticizing and improving
their own tools and working environment, each making a few incremental
improvements.  Often it is formalized in industry in the practice of
"quality circles," but it works just as well informally.  It is
perfectly adapted to producing good localization, not to mention being
closely related to the practices of free software.



* Re: a few MULE criticisms
  2003-05-16  7:09           ` Stephen J. Turnbull
@ 2003-05-16 11:43             ` Hin-Tak Leung
  2003-05-17  7:32               ` Stephen J. Turnbull
  0 siblings, 1 reply; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-16 11:43 UTC (permalink / raw)
  Cc: emacs-devel

Stephen J. Turnbull wrote:
>>>>>>"Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> 
>     Hin-Tak> The more popular methods tend to be ones in which the
>     Hin-Tak> choices are narrowed down quickly and evenly as one more
>     Hin-Tak> keystroke is added to the sequence.
> 
> An explicit list would help.  Emacs could offer them in order of
> popularity, at least to the extent that they are available in free
> versions.

That's a somewhat difficult task. I mentioned 4 major
pronunciation/dialect systems (Mainland China, Taiwan,
Hong Kong, Singapore); and unlike in Japan, with its
centralized/standardized educational system, the popular shape-based
input methods also differ according to which of these 4 major
political/educational systems one learned computing under at
school, or uses computers under at the government level,
for example; so we are talking about 10+ input methods,
each of which is in quite wide circulation, and depending on whom you
ask, you will get a list in a totally different order. There are also
up-and-coming methods designed for numeric-key-pad/handheld/mobile-phone
use which are very popular among the younger generations for
doing their SMS messaging, which some might like to use on their
computers (and which aren't yet in leim). There is almost
no consensus across the 4 dialect/political/educational systems.
(And indeed, sometimes they deliberately disagree for the sake of it...)

The Chinese (ethnic) can't even agree on what constitutes the
Big5 character set :-) - Big5 was a "convention" and
there is a 5-10% difference between e.g. Hong Kong and Taiwan.
This is in contrast to the Japanese JIS level 1/level 2 sets,
which are government controlled and revised centrally.


* Re: a few MULE criticisms
  2003-05-16 11:43             ` Hin-Tak Leung
@ 2003-05-17  7:32               ` Stephen J. Turnbull
  2003-05-17 19:40                 ` Hin-Tak Leung
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen J. Turnbull @ 2003-05-17  7:32 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> Stephen J. Turnbull wrote:

    >>>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk>
    >>>>>>> writes:

    Hin-Tak> The more popular methods tend to be ones in which the
    Hin-Tak> choices are narrowed down quickly and evenly as one more
    Hin-Tak> keystroke is added to the sequence.

    >> An explicit list would help.  Emacs could offer them in order
    >> of popularity, at least to the extent that they are available
    >> in free versions.

    Hin-Tak> That's a somewhat difficult task. I mentioned 4 major
    Hin-Tak> pronunciation/dialect systems (Mainland China, Taiwan,
    Hin-Tak> Hong Kong, Singapore); and unlike the Japanese

I'm aware that there's a huge variety, and that it's impossible to
give a definitive list.

But what's really different about the Japanese is that they won't
hesitate to answer such a question.  (Of course the educational system
and the Myth of Japanese Homogeneity are important factors in the
willingness, but the important thing for the implementation is getting
an answer, for whatever reason.)  Usually the first such answer is
dead wrong, of course, but such lists are easily changed and easily
worked around or customized.  So by the time you've iterated over ten
Japanese, you're doing pretty well.

If you want, you can specify "where you're coming from" (in this case,
quite literally) and we can start to differentiate the Chinese
"sublocales".  And be greedy.  Ask for the list _you_ want.  If there
are more than a billion ethnic Chinese, I bet there are 100 million
who mostly agree with you.  Anything that makes 100 million people a
little happier with Emacs is a GoodThang[tm].  :-)  Note that one
effect of having an "official list" is that those who would want
something else will have something to point at and say "Gaak! that's
dead wrong!", and they'll offer their own lists.  Does that mean your
list goes away?  No, it means that to start with we add a customizable
variable like

Chinese Input Method Priority:
  [X] Leung's list (where you come from: im1, im2, im3, ...)
  [ ] Lee's list (Hong Kong: im3, im2, im1, ...)
  [ ] Custom list [..................................]

and then as we know more about how that correlates with other aspects
of Emacs usage, we start to build up the "sublocale" concept.
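
Just to show how little machinery such a setting needs, here is a
minimal Python sketch of the selection logic (in Emacs it would be a
customizable variable, not Python). The preset names, the function name
and the `available` filter are hypothetical; im1, im2 and im3 are the
placeholders from the mock-up above.

```python
# Hypothetical preset orderings, keyed by a "sublocale" label. The method
# names im1, im2, im3 are the placeholders used in the mock-up above.
PRESETS = {
    "leung": ["im1", "im2", "im3"],
    "lee": ["im3", "im2", "im1"],
}


def input_method_priority(choice, custom=None, available=None):
    """Return the ordered input methods for a preset name or a custom list."""
    order = custom if choice == "custom" else PRESETS[choice]
    if order is None:
        raise ValueError("choice 'custom' needs an explicit list")
    if available is not None:
        # Keep the user's order but drop methods that are not installed.
        order = [im for im in order if im in available]
    return list(order)
```

Adding somebody else's list is then one more entry in PRESETS, and
nobody's existing list goes away.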

    Hin-Tak> The Chinese (ethnic) can't even agree on what constitutes
    Hin-Tak> the Big5 character set :-) - Big5 was a "convention" and
    Hin-Tak> there is a 5-10% difference between e.g. Hong Kong and
    Hin-Tak> Taiwan.  This is in contrast to the Japanese JIS level
    Hin-Tak> 1/level 2 sets, which are government controlled and
    Hin-Tak> revised centrally.

You mean the same Japanese who on the next to last major revision of
the standard still managed to leave out a couple of officially
recognized name characters?  Yes, the Chinese have bigger problems and
more various needs.  But localization is hard for everybody.  The
sooner we start, the better.



* Re: a few MULE criticisms
  2003-05-17  7:32               ` Stephen J. Turnbull
@ 2003-05-17 19:40                 ` Hin-Tak Leung
  0 siblings, 0 replies; 18+ messages in thread
From: Hin-Tak Leung @ 2003-05-17 19:40 UTC (permalink / raw)
  Cc: emacs-devel

Stephen J. Turnbull wrote:
<snipped>
>     >> An explicit list would help.  Emacs could offer them in order
>     >> of popularity, at least to the extent that they are available
>     >> in free versions.
<snipped>
> If you want, you can specify "where you're coming from" (in this case,
> quite literally) and we can start to differentiate the Chinese
> "sublocales".  And be greedy.  Ask for the list _you_ want.
<snipped>

My own Chinese editing needs/experience are maybe somewhat atypical. My
typing habits are, in decreasing frequency:
(1) the equivalent of "tsang-b5.el",
(2) doing "ab?cd", "?bcd" etc under "tsang-b5.el"
    (some other thread of this conversation seems to indicate that this
    cannot be done under MULE at the moment),
(3) the equivalent of "quick-b5.el" (it is the same as "a*b" under
    "tsang-b5.el" in (1), and "a???b"+"a??b"+"a?b"+"ab" in (2)),
(4) English (i.e. typing "apple" and getting the characters for that
    fruit; no equivalent in MULE?),
(5) trying one of the pronunciation-based ones. The closest equivalent is
    probably 'CTLau-b5.el'?

I do use association quite heavily, i.e. take the latter half
of a 2/3/4-character phrase after typing the 1st character.
Also, often when I reach (3) or (4) and the character I want
is known to be the latter half of a common phrase, I type
the first half using (1) or (2) and delete it afterwards.
Inputting 4-5 key strokes for the first character, selecting from a
small list of 2-3 by association, then deleting, is still sometimes
preferable to scrolling a much longer list of 20-30 in (3).

My Japanese typing habit is somewhat heavily customized - I
have two private input methods (one based on tsang's, the other
pronunciation-based, which I compiled myself from
ftp://ftp.funet.fi/pub/culture/japan/info/jis1detl.lst, according
to my notes). I probably compiled my own input methods because
I learned Japanese as a 2nd language.
