Re: a few MULE criticisms

unofficial mirror of emacs-devel@gnu.org 

* Re: a few MULE criticisms
  2003-05-14 20:55 ` Jason Rumney
@ 2003-05-14 22:05   ` Hin-Tak Leung
  0 siblings, 0 replies; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-14 22:05 UTC (permalink / raw)
  Cc: emacs-devel

Jason Rumney wrote:
> Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> Your first two comments look like they could be handled by writing new
> input methods. Because they are likely to require big dictionaries,
> they don't necessarily have to be bundled with Emacs. They could exist
> as an external package just as cemacs does now, but taking advantage
> of the leim framework that is in place now.
>

No, the first two aren't really new methods; they are ways of displaying
alternatives, anticipating and asking the user which of the alternatives he
might like, based on (1) what he has just inserted in the current buffer,
or (2) explicit user-initiated regular-expression matches within the current
input method. Both of them address the problem that the user may not
know or remember the precise key-strokes for invoking the desired input.

(1) certainly needs big dictionaries, one for each locale. Continuing my
example, the aim is to anticipate that the user may want "for", "engine" or
"through" after he has input the word "search", because "search for",
"search through" and "search engine" are commonly used phrases.

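A minimal sketch of the association idea in (1), assuming a tiny
hypothetical phrase list (a real one would be a large per-locale
dictionary, and nothing here is part of leim). Python is used only for
illustration:

```python
from collections import defaultdict

# Hypothetical phrase list standing in for a per-locale dictionary.
PHRASES = ["search for", "search through", "search engine", "fire engine"]

def build_associations(phrases):
    """Map each first word to the words that commonly follow it."""
    table = defaultdict(list)
    for phrase in phrases:
        first, _, rest = phrase.partition(" ")
        if rest and rest not in table[first]:
            table[first].append(rest)
    return table

def suggest_next(table, last_word):
    """Return follow-up candidates for the word just inserted."""
    return list(table.get(last_word, []))

ASSOCIATIONS = build_associations(PHRASES)
print(suggest_next(ASSOCIATIONS, "search"))  # ['for', 'through', 'engine']
```

The candidates come back in dictionary order here; a real system would
presumably rank them by frequency.
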
(2) is probably more like an enhancement or change in
window/buffer management. At the moment, under MULE, if I don't know
the exact input key-strokes for a particular character, there is very
little chance of arriving at the result.

It is probably a bit like a "spell-checker" within the current input method:
e.g. I vaguely know the character I want needs "pron*", and I would like
to type "pron*ion" or "pronou?" and get emacs to suggest to me that
it could be "pronunciation", "pronoun", etc. It doesn't require a
dictionary (unlike this English example), but it requires
(a) some standard way of specifying wild cards, (b) being able
to scan the current input method table for matches, and (c) displaying
the list of matches from the current input method for me to
choose from. So it requires some kind of searching mechanism within
the current input method, a way of displaying an auxiliary buffer
with an enumerated list of all such matches in it, and some glue
between this buffer and the main buffer.
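
Steps (a) to (c) could look roughly like the sketch below. It is not
quail code: the table, its keys and the "<char-N>" placeholders are
invented for illustration, and the wildcard syntax is simply the shell
style offered by Python's fnmatch module (which also treats "[...]"
specially).

```python
from fnmatch import fnmatchcase

# Hypothetical input-method table: key sequence -> candidate outputs.
TABLE = {
    "pronoun": ["<char-1>"],
    "pronounce": ["<char-2>", "<char-3>"],
    "pronunciation": ["<char-4>"],
    "pronto": ["<char-5>"],
}

def match_keys(table, pattern):
    """Return (key, candidates) pairs whose key matches the wildcard pattern."""
    return [(key, table[key]) for key in sorted(table)
            if fnmatchcase(key, pattern)]

def format_choices(matches):
    """Build the enumerated lines an auxiliary buffer could display."""
    return [f"{n}. {key} -> {' '.join(cands)}"
            for n, (key, cands) in enumerate(matches, 1)]

for line in format_choices(match_keys(TABLE, "pronou*")):
    print(line)
```

A linear scan like this is fine for a small table; whether it is fast
enough for a dense input method table is a separate question.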

* Re: a few MULE criticisms
  2003-05-14 21:55 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stefan Monnier
@ 2003-05-15  2:03   ` Hin-Tak Leung
  2003-05-15  6:55     ` Jason Rumney
  0 siblings, 1 reply; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-15  2:03 UTC (permalink / raw)
  Cc: emacs-devel

Stefan Monnier wrote:
> I have no knowledge of any non-latin script, so could you explain
> to me how Emacs-19.34 can be used to edit in a character-set larger
> than 256 chars ?

It is a very old system (tradition? dating back to emacs-18).
The principle is essentially to put emacs in -nw and 8-bit display mode
inside a customized terminal emulator, and let the terminal emulator
handle how keystrokes are translated into characters inserted
into the emacs buffer, and what fonts are used for the display.
This mechanism isn't specific to emacs; it works for any editor
or indeed any application (pine, lynx, etc) that is happy
within such an environment.

There are other less important subtleties, like adjusting the
movement of the cursor in steps of suitable character units, rather
than bytes, which made emacs the better choice for such "embedded"
use.

> Emacs-20.1 also had an option to run in unibyte mode, although
> it didn't have the --unibyte argument.  etc/NEWS for Emacs-20.1 says:
> 
>     [...]
> 
>     You can disable multibyte character support as follows:
> 
>       (setq-default enable-multibyte-characters nil)
> 
>     [....]
> 
> Note that Emacs-20.1 and 20.2 turned out to have many problems,
> so very few people are still using them (much fewer than 20.3
> or 19.34).

Thanks for pointing this out (this is still a very good learning
experience even though it is about 8 years late...). For a few
years the general(?) consensus was either (a) don't upgrade, or (b)
switch to some other X-based mechanism.

I haven't had any experience with i18n support in X or with
localised X, but I have heard that they are somewhat usable. The
advantage of that approach is that it is available to all
X clients (browsers, terminal emulators, editors). As I
said previously, my main criticism of MULE is that
it can't suggest the next character by association or by
fuzzy match. The same criticism doesn't seem to apply to
i18n X support - some l10n X or i18n X systems indeed ship
large dictionaries with the system. I am not sure about
Chinese support, but Wnn and Canna (both quite well-known
and quite big Japanese dictionaries) are shipped with some
commercial and even open-source unix/unix-like systems.

* Re: a few MULE criticisms
  2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
@ 2003-05-15  3:29   ` Hin-Tak Leung
  2003-05-15 10:06     ` Hin-Tak Leung
  0 siblings, 1 reply; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-15  3:29 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa wrote:
> At first, as far as I know, cemacs.el is a small program
> written by <simpson@math.psu.edu> that does just these
> things:
>   o make emacs 19 use standard-display-8bit
>   o rebind forward-char, delete-char, etc to functions that
>     pay attention to 8-bit chars.
> And, it works only in tty mode under a chinese terminal,
> e.g. cxterm.

That's correct.

<a lot of C-x RET, etc snipped>

Is there a central organized repository for these tips/etc
associated with leim? e.g. how to create a new leim file?
They aren't in the obvious places (i.e. the info files
that came with emacs).

>>(1) Associations: the ability to let the user choose the next possible
>>associated characters.
> 
> [...]
> 
>>for many years. This would most certainly require extending MULE with
>>the ability of loading distionaries of commonly used phrases in
>>various languages. And will make the leim package a lot bigger.
> 
> 
> I think this should be implemented by extending abbrev-mode
> so that it can associate an abbreviation with multiple
> words/phrases (is it possible already?)

At this point it is probably good to switch to Japanese examples...
What I meant is the ability to anticipate something like:

(a) if I input "Handa" (two characters in Japanese Kanji),
then because it is a common surname, the system somehow asks
whether I want to follow it with "san" (two characters in Japanese
kana - the honorific suffix, similar to the "Mr" prefix in English),
or the 2nd half of the name of a famous historical figure
with that surname.

(b) Many verbs in Japanese consist of one
or two Kanji characters, e.g. "carry", "bring", followed
by verb modifiers meaning "to", "did not", "forbidden to",
"please do", "please do not", etc which could be 5-6
characters long. The list of characters commonly used for
verbs is quite well defined (a few hundred? still small by
comparison to the whole character set of a few thousand),
and the list of commonly used verb modifiers is even shorter
(maybe about 10?). Any of those in the 1st list is likely
to be followed by one of those in the 2nd list.

(c) "Ken" (one Japanese character) is often followed by
one specific character to form the phrase for "health".
(in addition to "Ken-ichi", a rather common first name),
and "ichi" (one character) is often followed by "ban"
(one character) to form "ichi-ban" (meaning "the best").

These might sound very demanding and critical - but association
can dramatically improve the speed of typing,
by a factor of two or three... and association has been available
with some CJK input mechanisms on either unix or other platforms
for years.

And it is not just about the speed of typing - sometimes
one just can't remember the precise keystrokes corresponding
to a certain lesser-used character, so one would rely on
association from more frequently-used ones (and delete
the more frequently-used one afterwards).

> When you type TAB while you are using an input method, Emacs
> shows the full list.  But, the method used in cxterm is not
> implemented, it's not easy.

I have figured that out. However a match by beginning and ending ("a*b")
or by ending ("*b") is quite important for Chinese inputs. TAB
(match by the beginning portion "a*") probably works alright
for Japanese, because a native Japanese speaker most probably
knows how the character is pronounced (or at least knows the
first one or two syllables of the phrase). But many of the
Chinese input methods (other than the Pinyin method,
"by pronunciation") work by character shape, and the
distinctive/memorable part is often the right-hand side of
the character, in other words the ending of the keystroke
sequence (or the 2nd half of the sequence), because the keystroke
sequence is usually coded according to how a native writer
writes the different portions of a character (top-left,
bottom-left, top-right, bottom-right).
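
One possible way to make "*b" and "a*b" lookups cheap, sketched under
the assumption that the key sequences of an input method are available
as plain strings: keep a sorted list of reversed keys, so that a match
by ending becomes a match by beginning. The key sequences below are
made up, and this is not how quail stores its maps.

```python
from bisect import bisect_left

# Hypothetical key sequences of a shape-based input method.
KEYS = ["abcd", "abxd", "bcd", "xbcd", "abce"]

def build_suffix_index(keys):
    """Sorted list of reversed keys: a suffix query becomes a prefix query."""
    return sorted(key[::-1] for key in keys)

def keys_ending_with(index, suffix):
    """Emulate "*suffix": all keys that end with the given keystrokes."""
    rev = suffix[::-1]
    found = []
    # Entries sharing the reversed prefix are contiguous in the sorted list.
    for entry in index[bisect_left(index, rev):]:
        if not entry.startswith(rev):
            break
        found.append(entry[::-1])
    return sorted(found)

def keys_matching_ends(index, head, tail):
    """Emulate "head*tail": keys starting with head and ending with tail."""
    return [key for key in keys_ending_with(index, tail)
            if key.startswith(head) and len(key) >= len(head) + len(tail)]

INDEX = build_suffix_index(KEYS)
print(keys_ending_with(INDEX, "cd"))         # ['abcd', 'bcd', 'xbcd']
print(keys_matching_ends(INDEX, "a", "d"))   # ['abcd', 'abxd']
```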

* Re: a few MULE criticisms
  2003-05-15  2:03   ` a few MULE criticisms Hin-Tak Leung
@ 2003-05-15  6:55     ` Jason Rumney
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Rumney @ 2003-05-15  6:55 UTC (permalink / raw)
  Cc: Stefan Monnier

Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

> I haven't had any experience with i18n support in X nor
> localised X, but I heard that they are somewhat usable. The
> advantage of that approach is that it is available to all
> X-clients (browsers, terminal emulators, editors). As I
> previously said out, my main criticism of using MULE is that
> it can't suggest the next character by association or by
> fuzzy match. The same criticism doesn't seem to apply to
> i18n X support - some l10n X or i18n X system indeed ships
> large dictionaries with the system.

Then Emacs can also take advantage of these, since it supports X
input methods, as well as running under cxterm and similar terminal
emulators. So you can forget about leim, it is just a fallback for
when better input methods are not already provided by the environment
you are running in.

* Re: a few MULE criticisms
  2003-05-15  3:29   ` a few MULE criticisms Hin-Tak Leung
@ 2003-05-15 10:06     ` Hin-Tak Leung
  2003-05-15 15:51       ` Stephen J. Turnbull
  0 siblings, 1 reply; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-15 10:06 UTC (permalink / raw)
  Cc: Kenichi Handa

Hin-Tak Leung wrote:
> Kenichi Handa wrote:

>> When you type TAB while you are using an input method, Emacs
>> shows the full list.  But, the method used in cxterm is not
>> implemented, it's not easy.
> 
> 
> I have figured that out. However a match by beginning and ending ("a*b")
> or by ending ("*b") is quite important for Chinese inputs. TAB
> (match by the beginning portion "a*") probably works alright
> for Japanese, because a native Japanese speaker most probably
> know how the character is pronounced (or at least know the
> first one or two syllables of the phrase). But many of the
> Chinese input methods (other than the Pinyi method,
> "by pronounciation") function by character shapes, and the
> distinctive/memorable part is often the right-hand side of
> the character. In other words, the ending of the keystroke
> sequence (or the 2nd half of the sequence), because the keystroke
> sequence is usually coded according to how native writer
> writes the different portion of a character (top-left,
> bottom-left, top-right, bottom-right).
> 

I would actually like to quantify this a bit further. In fact
matching by ending is the more useful alternative if
one has to choose between the two alternatives of matching by either end.
In Japanese, most of the input methods map by pronunciation.
This is possible and quite convenient, because Japanese
Kanji are multisyllabic (2 or 3 syllables), so a few hundred
pronunciation combinations are mapped to a few thousand characters,
and it is possible to choose among fewer than about 10 variants
for the same 4-7 keystroke sequence in a pronunciation-based mapping.

On the other hand, all chinese characters are mono-syllabic;
the human tongue uses only about 100 or fewer such sounds
(consonant+vowel+consonant) - so each pronunciation
typically corresponds to about 100 or more characters. Couple that
with regional variations and dialects (remember that China
stretches a distance similar to that from London/UK to Cairo/Egypt, or
from New York to Mexico), and almost no sane person types chinese
by pronunciation if he needs to type any sizable paragraph,
unless he is truly desperate - because we are
talking about "input 3 keystrokes, scroll a list of 100 to pick a
character, input another 3 key strokes, scroll another list of 100
to pick another character, etc" to churn each character out.

The more preferred methods are either pronunciation+intonation
(which is probably "input 4 keystrokes, scroll a list of 20"),
which still suffers a lot from the dialect/regional problems, or by
shape ("input 4-5 keystrokes, scroll a list of 5") to be
able to access nearly 10,000 characters. There are a few methods
which match by shape (i.e. how a character is written),
but as I explained, the right-hand side of a chinese character
is usually the more distinctive side, yet the right half is
usually written last; if one maps characters by how they are usually written,
this normally makes the 2nd half of the key-stroke sequence more
useful as a matching criterion, if one is not sure about the
precise sequence and needs to make a guess and ask the computer
for a (as short as possible) list of alternatives.
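
As a rough back-of-envelope comparison of the three kinds of method,
using the keystroke and list-length figures quoted above. The cost
model is purely an assumption made for illustration: one unit per
keystroke, plus scanning half the candidate list on average.

```python
# Figures quoted in this message: (keystrokes per character, list length).
METHODS = {
    "pronunciation": (3.0, 100),
    "pronunciation+intonation": (4.0, 20),
    "shape": (4.5, 5),  # "4-5 keystrokes" taken as 4.5
}

def expected_effort(keystrokes, list_length):
    """Keystrokes plus an assumed average scan of half the candidate list."""
    return keystrokes + list_length / 2

for name, (keys, candidates) in METHODS.items():
    print(f"{name}: {expected_effort(keys, candidates):.1f}")
```

Under that assumption the three come out at 53.0, 14.0 and 7.0 units
per character, which is at least consistent with the preference order
described here.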

Also, it is not uncommon to switch between input methods frequently
to arrive at different characters, say 5-10 times within a medium-size
sentence. Binding the switch to function keys to enable fast switching
is quite necessary to type at any reasonable speed. I know MULE can
be customized to behave differently, but input method switching
is buried down in the 3rd or 4th sub-menus in X :-).
(And Japanese users probably don't switch input methods as
frequently as Chinese users would do...)

* Re: a few MULE criticisms
  2003-05-15 10:06     ` Hin-Tak Leung
@ 2003-05-15 15:51       ` Stephen J. Turnbull
  2003-05-15 19:49         ` Hin-Tak Leung
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen J. Turnbull @ 2003-05-15 15:51 UTC (permalink / raw)
  Cc: Kenichi Handa

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> The more prefered methods are either
    Hin-Tak> pronounciation+intonation (which is probably "input 4
    Hin-Tak> keystrokes, scroll a list of 20")

Quail already offers at least one of these.

    Hin-Tak> There are a few methods which matches by shape (i.e. how

Quail offers a couple of these, too.

    Hin-Tak> a character is written), but as I explained, the right-
    Hin-Tak> hand-side of a chinese character is usually the more
    Hin-Tak> distinct side but the right half is usually written last;

I suppose it wouldn't help much for input methods to simply reverse
the order.  Then you'd still need wildcards for the (less frequent,
but not so rare) case where the left side is more distinctive, right?

    Hin-Tak> Also, it is not uncommon to switch between input methods
    Hin-Tak> frequently to arrive at different characters, say 5-10
    Hin-Tak> times within a medium-size sentence. Binding the switch
    Hin-Tak> to function keys to enable fast-switching is quite
    Hin-Tak> necessary to type at any reasonable speed.  (And Japanese
    Hin-Tak> users probably don't switch input methods as frequently
    Hin-Tak> as Chinese users would do...)

Note that in certain applications, such as programming code that
produces Japanese strings (eg, XML or TeX), the input method may be
toggled on and off many times in a medium sized "sentence".  But it
sounds like you mean several different methods, not (for example)
switching from geometric to phonetic and back several times.  So you'd
need several keybindings, instead of just one for the toggle.

Also it sounds like which methods are preferred varies a lot by user.
Is the number of commonly used methods small enough (say 5 or 6) that
all can be bound to function keys at once?  Or are there enough that
each user should be able to configure his own preferences to reduce
the number of hot-keys required?

In fact the server-based input methods for Japanese usually do provide
function key access to methods like a special list of symbols, input
via JIS code, user dictionary, etc.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: a few MULE criticisms
  2003-05-15 15:51       ` Stephen J. Turnbull
@ 2003-05-15 19:49         ` Hin-Tak Leung
  2003-05-15 21:29           ` Kevin Rodgers
  2003-05-16  7:09           ` Stephen J. Turnbull
  0 siblings, 2 replies; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-15 19:49 UTC (permalink / raw)
  Cc: Kenichi Handa

Stephen J. Turnbull wrote:
>>>>>>"Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> 
>     Hin-Tak> The more prefered methods are either
>     Hin-Tak> pronounciation+intonation (which is probably "input 4
>     Hin-Tak> keystrokes, scroll a list of 20")
> 
> Quail already offers at least one of these.
> 
>     Hin-Tak> There are a few methods which matches by shape (i.e. how
> 
> Quail offers a couple of these, too.

Yes, I am aware of that. (I think the emacs documentation mentions
that many of the Chinese-related input method tables were copied
from cxterm, which was the most popular [the only?] way of doing
Chinese at that time, before 1994).

But having just the input tables is not quite enough...

>     Hin-Tak> a character is written), but as I explained, the right-
>     Hin-Tak> hand-side of a chinese character is usually the more
>     Hin-Tak> distinct side but the right half is usually written last;
> 
> I suppose it wouldn't help much for input methods to simply reverse
> the order.  Then you'd still need wildcards for the (less frequent,
> but not so rare) case where the left side is more distinctive, right?

Indeed, no. A majority (I think?) have a left-right division where
the right part is more distinctive, but as you suggest, a sizable
part is the reverse; and there are ones which have a top-bottom division
(for which again, the bottom part I believe is often the more
distinctive half); and still others which don't have any obvious internal
divisions.

The more popular methods tend to be ones in which the choices are narrowed
down quickly and evenly as one more keystroke is added to the sequence.

>     Hin-Tak> Also, it is not uncommon to switch between input methods
>     Hin-Tak> frequently to arrive at different characters, say 5-10
>     Hin-Tak> times within a medium-size sentence. Binding the switch
>     Hin-Tak> to function keys to enable fast-switching is quite
>     Hin-Tak> necessary to type at any reasonable speed.  (And Japanese
>     Hin-Tak> users probably don't switch input methods as frequently
>     Hin-Tak> as Chinese users would do...)
> 
> Note that in certain applications, such as programming code that
> produces Japanese strings (eg, XML or TeX), the input method may be
> toggled on and off many times in a medium sized "sentence".  But it
> sounds like you mean several different methods, not (for example)
> switching from geometric to phonetic and back several times.  So you'd
> need several keybindings, instead of just one for the toggle.

Yes, for Japanese usage, the predominant way is some variant of
phonetic (by pronunciation) mapping, with a quick toggle to
ASCII when it is needed. I personally use ChangJie (a shape-based
mapping) most of the time, but I do use a few others: one of the
pronunciation-based ones and, if I am really desperate, even the
one for English translation (i.e. typing "apple" and expecting
the chinese character for that fruit!).

> Also it sounds like which methods are preferred varies a lot by user.
> Is the number of commonly used methods small enough (say 5 or 6) that
> all can be bound to function keys at once?  Or are there enough that
> each user should be able to configure his own preferences to reduce
> the number of hot-keys required?

Indeed. I have touched upon this before. Take the
pronunciation-based ones in particular: due to the size of the
Chinese-speaking region (e.g. a fictitious non-stop commercial flight
touching Beijing-Taiwan-Hong Kong-Singapore would take 8-12 hours),
people pronounce the same character differently according to
where they come from - that's already 4 *popular* pronunciation-based
systems :-), and we haven't started on the shape-based ones yet...

> In fact the server-based input methods for Japanese usually do provide
> function key access to methods like a special list of symbols, input
> via JIS code, user dictionary, etc.

Yes, much to my envy ... population-wise, the Chinese are so much
more numerous, and yet in overall computer localization
the Japanese are so much more advanced. The civil war and the political
turmoil within China until the late 80's have done much harm to
general education and technological advances (in addition to other
social/economic problems).

* Re: a few MULE criticisms
  2003-05-15 19:49         ` Hin-Tak Leung
@ 2003-05-15 21:29           ` Kevin Rodgers
  2003-05-16  7:09           ` Stephen J. Turnbull
  1 sibling, 0 replies; 13+ messages in thread
From: Kevin Rodgers @ 2003-05-15 21:29 UTC (permalink / raw)


Hin-Tak Leung wrote:

> Yes, to much of my envy ... population-wise, the Chinese is so much
> bigger, and yet in the issue of computer over-all localization
> the Japanese is so much more advanced. The civil war and the political
> turmoils within China until the late 80's has done much harm to the
> general education and technology advances (in addition to other 
> social/economical problems).

Several slogans come to mind:

Get busy!
Solve the problem.
Sweep your side of the street.

-- 
Kevin Rodgers <kevin.rodgers@ihs.com>

* Re: a few MULE criticisms
  2003-05-15 19:49         ` Hin-Tak Leung
  2003-05-15 21:29           ` Kevin Rodgers
@ 2003-05-16  7:09           ` Stephen J. Turnbull
  2003-05-16 11:43             ` Hin-Tak Leung
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen J. Turnbull @ 2003-05-16  7:09 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> The more popular methods tend to be ones in which the
    Hin-Tak> choices are narrowed down quickly and evenly as one more
    Hin-Tak> keystroke is added to the sequence.

An explicit list would help.  Emacs could offer them in order of
popularity, at least to the extent that they are available in free
versions.

    Hin-Tak> Yes, to much of my envy ... population-wise, the Chinese
    Hin-Tak> is so much bigger, and yet in the issue of computer
    Hin-Tak> over-all localization the Japanese is so much more
    Hin-Tak> advanced.  The civil war and the political turmoils
    Hin-Tak> within China until the late 80's has done much harm to
    Hin-Tak> the general education and technology advances (in
    Hin-Tak> addition to other social/economical problems).

I'm in no position to judge what might have "held China back."
However, it's no accident that Japan is advanced in localization.
Japan has a strong culture of all users criticizing and improving
their own tools and working environment, each making a few incremental
improvements.  Often it is formalized in industry in the practice of
"quality circles," but it works just as well informally.  It is
perfectly adapted to producing good localization, not to mention being
closely related to the practices of free software.

* Re: a few MULE criticisms
  2003-05-16  7:09           ` Stephen J. Turnbull
@ 2003-05-16 11:43             ` Hin-Tak Leung
  2003-05-17  7:32               ` Stephen J. Turnbull
  0 siblings, 1 reply; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-16 11:43 UTC (permalink / raw)
  Cc: emacs-devel

Stephen J. Turnbull wrote:
>>>>>>"Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:
> 
> 
>     Hin-Tak> The more popular methods tend to be ones in which the
>     Hin-Tak> choices are narrowed down quickly and evenly as one more
>     Hin-Tak> keystroke is added to the sequence.
> 
> An explicit list would help.  Emacs could offer them in order of
> popularity, at least to the extent that they are available in free
> versions.

That's a somewhat difficult task. I mentioned 4 major
pronunciation/dialect systems (Mainland China, Taiwan,
Hong Kong, Singapore); and unlike in Japan, with its
centralized/standardized educational system, the popular shape-based
input methods also differ according to which of these 4 major
political/educational systems one learned to use computers under at
school, or uses computers under at the government level,
for example; so we are talking about 10+ input methods,
each of which is in quite wide circulation, and depending on who you
ask, you will get a list in a totally different order. There are also
up-and-coming methods designed for numeric-key-pad/handheld/mobile-phone
use which are very popular among the younger generations for
doing their SMS messaging, which some might like to use on their
computers (and which aren't yet in leim). There is almost
no consensus across the 4 dialect/political/educational systems
(and indeed, sometimes they deliberately disagree for the sake of it...).

The Chinese (ethnic) can't even agree on what constitutes the
Big5 character set :-) - Big5 was a "convention" and
there is a 5-10% difference between e.g. Hong Kong and Taiwan.
This is in contrast to the Japanese JIS level 1/level 2 sets,
which are government controlled and revised centrally.

* Re: a few MULE criticisms
  2003-05-16 11:43             ` Hin-Tak Leung
@ 2003-05-17  7:32               ` Stephen J. Turnbull
  2003-05-17 19:40                 ` Hin-Tak Leung
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen J. Turnbull @ 2003-05-17  7:32 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk> writes:

    Hin-Tak> Stephen J. Turnbull wrote:

    >>>>>>> "Hin-Tak" == Hin-Tak Leung <hintak_leung@yahoo.co.uk>
    >>>>>>> writes:

    Hin-Tak> The more popular methods tend to be ones in which the
    Hin-Tak> choices are narrowed down quickly and evenly as one more
    Hin-Tak> keystroke is added to the sequence.

    >> An explicit list would help.  Emacs could offer them in order
    >> of popularity, at least to the extent that they are available
    >> in free versions.

    Hin-Tak> That's a somewhat difficult task. I mentioned 4 major
    Hin-Tak> pronounciation/dialect systems (Mainland China, Taiwan,
    Hin-Tak> Hong Kong, Singapore); and unlike the Japanese

I'm aware that there's a huge variety, and that it's impossible to
give a definitive list.

But what's really different about the Japanese is that they won't
hesitate to answer such a question.  (Of course the educational system
and the Myth of Japanese Homogeneity are important factors in the
willingness, but the important thing to the implementation is getting
an answer, for whatever reason.)  Usually the first such answer is
dead wrong, of course, but such lists are easily changed and easily
worked around or customized.  So by the time you've iterated over ten
Japanese, you're doing pretty well.

If you want, you can specify "where you're coming from" (in this case,
quite literally) and we can start to differentiate the Chinese
"sublocales".  And be greedy.  Ask for the list _you_ want.  If there
are more than a billion ethnic Chinese, I bet there are 100 million
who mostly agree with you.  Anything that makes 100 million people a
little happier with Emacs is a GoodThang[tm].  :-)  Note that one
effect of having an "official list" is that those who would want
something else will have something to point at and say "Gaak! that's
dead wrong!", and they'll offer their own lists.  Does that mean your
list goes away?  No, it means that to start with we add a customizable
variable like

Chinese Input Method Priority:
  [X] Leung's list (where you come from: im1, im2, im3, ...)
  [ ] Lee's list (Hong Kong: im3, im2, im1, ...)
  [ ] Custom list [..................................]

and then as we know more about how that correlates with other aspects
of Emacs usage, we start to build up the "sublocale" concept.
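
A rough sketch of how such a setting might be resolved into an ordered
list of methods. The names "im1", "im2", "im3" are the placeholders
from the mock-up above, not real input methods, and the function and
its parameters are hypothetical. It is written in Python for
illustration only; in Emacs itself this would presumably be a
customizable variable.

```python
# Placeholder names taken from the mock-up; they are not real input methods.
PRESET_LISTS = {
    "Leung's list": ["im1", "im2", "im3"],
    "Lee's list": ["im3", "im2", "im1"],
}

def resolve_priority(choice, custom=None, installed=None):
    """Return the input methods to offer, in priority order.

    choice    -- a key of PRESET_LISTS, or "Custom list"
    custom    -- the user's own ordering, used when choice is "Custom list"
    installed -- optional collection of methods that are actually available
    """
    if choice == "Custom list":
        order = list(custom or [])
    else:
        order = list(PRESET_LISTS[choice])
    if installed is not None:
        # Drop anything the user asked for that is not installed.
        order = [method for method in order if method in installed]
    return order

print(resolve_priority("Lee's list"))  # ['im3', 'im2', 'im1']
```

Adding another user's list is then just one more entry in the table.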

    Hin-Tak> The Chinese (ethnic) can't even agree on what constitutes
    Hin-Tak> the Big5 character set :-) - Big5 was a "convention" and
    Hin-Tak> there is a 5-10% difference between e.g. Hong Kong and
    Hin-Tak> Taiwan.  This is in contrast to the Japanese JIS level
    Hin-Tak> 1/level 2 sets, which are government controlled and
    Hin-Tak> revised centrally.

You mean the same Japanese who on the next to last major revision of
the standard still managed to leave out a couple of officially
recognized name characters?  Yes, the Chinese have bigger problems and
more various needs.  But localization is hard for everybody.  The
sooner we start, the better.

* Re: a few MULE criticisms
  2003-05-17  7:32               ` Stephen J. Turnbull
@ 2003-05-17 19:40                 ` Hin-Tak Leung
  0 siblings, 0 replies; 13+ messages in thread
From: Hin-Tak Leung @ 2003-05-17 19:40 UTC (permalink / raw)
  Cc: emacs-devel

Stephen J. Turnbull wrote:
<snipped>
>     >> An explicit list would help.  Emacs could offer them in order
>     >> of popularity, at least to the extent that they are available
>     >> in free versions.
<snipped>
> If you want, you can specify "where you're coming from" (in this case,
> quite literally) and we can start to differentiate the Chinese
> "sublocales".  And be greedy.  Ask for the list _you_ want.
<snipped>

My own Chinese editing needs/experience are maybe somewhat atypical. My
typing habits are, in decreasing order of frequency:
(1) the equivalent of "tsang-b5.el",
(2) doing "ab?cd", "?bcd" etc. under "tsang-b5.el"
     (some other thread of this conversation seems to indicate that this
      cannot be done under MULE at the moment)
(3) the equivalent of "quick-b5.el" (it is the same as "a*b" under
  "tsang-b5.el" in (1), and "a???b"+"a??b"+"a?b"+"ab" in (2))
(4) English (i.e. typing "apple" and getting the characters for that fruit;
              no equivalent in MULE?)
(5) trying one of the pronunciation-based ones. The closest equivalent is
probably 'CTLau-b5.el'?
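The wildcard lookups in (2) and (3) can be sketched in Emacs Lisp roughly as follows. This is only an illustration, not how quail or tsang-b5.el work: the table `demo-key-table`, its key sequences, and the placeholder candidates are all made up, and `string-match-p` assumes a reasonably recent Emacs. Here `?` stands for exactly one key and `*` for any run of keys.

```elisp
;; Hypothetical table: key sequence -> candidate.  The keys and the
;; placeholder candidates are invented; they are not tsang-b5 data.
(defvar demo-key-table
  '(("abcd" . "cand-1") ("abxcd" . "cand-2")
    ("ab" . "cand-3") ("axyb" . "cand-4")))

(defun demo-pattern-to-regexp (pattern)
  "Turn PATTERN into an anchored regexp.
In PATTERN, ? matches exactly one key and * matches any run of keys."
  (concat "\\`"
          (mapconcat (lambda (ch)
                       (cond ((eq ch ??) ".")
                             ((eq ch ?*) ".*")
                             (t (regexp-quote (char-to-string ch)))))
                     pattern "")
          "\\'"))

(defun demo-wildcard-lookup (pattern)
  "Return candidates in `demo-key-table' whose key sequence matches PATTERN."
  (let ((re (demo-pattern-to-regexp pattern))
        (case-fold-search nil)
        result)
    (dolist (entry demo-key-table (nreverse result))
      (when (string-match-p re (car entry))
        (push (cdr entry) result)))))

(demo-wildcard-lookup "ab?cd")   ; => ("cand-2")
(demo-wildcard-lookup "a*b")     ; => ("cand-3" "cand-4")
```

A linear scan like this is fine for a toy table; a real input method with thousands of entries would presumably need an index rather than a regexp match per entry.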

I do use association quite heavily, i.e. I pick the latter half
of a 2/3/4-character phrase after typing the 1st character.
Also, often when I reach (3) or (4) and the character I want
is known to be the latter half of a common phrase, I type
the first half using (1) or (2) and delete it afterwards.
Inputting 4-5 key strokes for the first character, selecting from a
small list of 2-3 by association, then a delete, is still sometimes
preferable to scrolling a much longer list of 20-30 in (3).
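As a minimal sketch of "association", assume a plain list of common phrases and offer the remainder of every phrase that starts with what was just typed. The names `demo-phrases` and `demo-associations` are hypothetical, and the English phrases from earlier in this thread stand in for multi-character Chinese phrases; `string-prefix-p` assumes a reasonably recent Emacs.

```elisp
;; Hypothetical phrase list; a real one would be a large per-locale dictionary.
(defvar demo-phrases
  '("search for" "search through" "search engine" "input method"))

(defun demo-associations (typed phrases)
  "Return the remainders of those PHRASES that start with TYPED."
  (let (result)
    (dolist (phrase phrases (nreverse result))
      (when (and (> (length phrase) (length typed))
                 (string-prefix-p typed phrase))
        (push (substring phrase (length typed)) result)))))

(demo-associations "search " demo-phrases)
;; => ("for" "through" "engine")
```

The short candidate list is the point: picking from 2-3 continuations is cheaper than scrolling 20-30 loosely matched characters.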

My Japanese typing habits are somewhat heavily customized - I
have two private input methods (one based on tsang's, the other
pronunciation-based, which I compiled myself from
ftp://ftp.funet.fi/pub/culture/japan/info/jis1detl.lst, according
to my notes). I probably compile my own input methods because I
learned Japanese as a 2nd language.


* Re: a few MULE criticisms
@ 2003-05-18  5:23 Stefan Monnier
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Monnier @ 2003-05-18  5:23 UTC (permalink / raw)
  Cc: emacs-devel


Regarding the use of ? in input methods.
I had never looked at the code of quail before, so clearly
this is not working 100% (not even 90%, I'd say), but
maybe someone more knowledgeable can fix it?

It may not be a workable approach for dense quail
maps (as most are, I expect :-( ) but it works OK on
the TeX input method where I can say \righ?????? and get
a right arrow.

Generalizing to * is going to be even less workable, so I think
that a different approach is necessary.
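For readers who want to experiment outside quail, here is a rough, self-contained sketch of the same `?` idea on a toy trie. It is not the patch and does not use quail's data structures: `demo-trie-insert`, `demo-trie-lookup` and the two sample entries are made up for illustration. A node is a cons (VALUE . CHILDREN), where CHILDREN is an alist from key character to node. As in the patch, a literal binding for `?` wins; only when there is none does `?` match every child.

```elisp
(defun demo-trie-insert (trie key value)
  "Destructively store VALUE under the string KEY in TRIE and return TRIE."
  (let ((node trie))
    (dotimes (i (length key))
      (let* ((ch (aref key i))
             (slot (assq ch (cdr node))))
        (unless slot
          (setq slot (list ch nil))            ; (CH . empty-node)
          (setcdr node (cons slot (cdr node))))
        (setq node (cdr slot))))
    (setcar node value)
    trie))

(defun demo-trie-lookup (trie key)
  "Return all values in TRIE reached by KEY; an unbound ? matches any one key."
  (let ((nodes (list trie)))
    (dotimes (i (length key))
      (let ((ch (aref key i)) next)
        (dolist (node nodes)
          (let ((slot (assq ch (cdr node))))
            (cond (slot (push (cdr slot) next))
                  ((eq ch ??)
                   ;; No literal binding for ?: follow every child.
                   (dolist (child (cdr node))
                     (push (cdr child) next))))))
        (setq nodes (nreverse next))))
    (delq nil (mapcar #'car nodes))))

(defvar demo-trie (list nil))
(demo-trie-insert demo-trie "\\rightarrow" "→")
(demo-trie-insert demo-trie "\\leftarrow" "←")

(demo-trie-lookup demo-trie "\\righ??????")   ; => ("→")
(demo-trie-lookup demo-trie "\\????arrow")    ; => ("←")
```

The sketch tracks a set of live nodes instead of merging sub-maps, but it shows the same cost: on a dense map every `?` multiplies the number of nodes to follow, which is why `*` looks even less workable this way.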


	Stefan


--- quail.el.~1.128.~	Tue Apr 15 18:46:10 2003
+++ quail.el	Sun May 18 01:19:08 2003
@@ -1,6 +1,6 @@
 ;;; quail.el --- provides simple input method for multilingual text
 
-;; Copyright (C) 1995, 2000 Electrotechnical Laboratory, JAPAN.
+;; Copyright (C) 1995, 2000, 2003 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 2001, 2002 Free Software Foundation, Inc.
 
@@ -1218,6 +1216,28 @@
    (t
     (error "Invalid object in Quail map: %s" def))))
 
+(defun quail-merge-maps (map1 &rest maps)
+  (if (null maps) map1
+    (let* ((map2 (pop maps))
+	   (h1 (pop map1))
+	   (h2 (pop map2)))
+      (apply 'quail-merge-maps
+	     (cons
+	      (if (and h1 h2)
+		  (vconcat (if (vectorp h1) h1 (vector h1))
+			   (if (vectorp h2) h2 (vector h2)))
+		(or h1 h2))
+	      (let ((tail nil) conflict)
+		(dolist (entry map1)
+		  (setq conflict (assq (car entry) map2))
+		  (push (if (not conflict) entry
+			  (setq map2 (delq conflict map2))
+			  (cons (car entry)
+				(quail-merge-maps (cdr entry) (cdr conflict))))
+			tail))
+		(append map2 tail)))
+	     maps))))
+
 (defun quail-lookup-key (key &optional len)
   "Lookup KEY of length LEN in the current Quail map and return the definition.
 The returned value is a Quail map specific to KEY."
@@ -1236,7 +1256,9 @@
       (setq slot (assq ch (cdr map)))
       (if (and (cdr slot) (symbolp (cdr slot)))
 	  (setcdr slot (funcall (cdr slot) key idx)))
-      (setq map (cdr slot)))
+      (if (and (null slot) (eq ch ??))
+	  (setq map (apply 'quail-merge-maps (mapcar 'cdr (cdr map))))
+	(setq map (cdr slot))))
     (setq def (car map))
     (setq quail-current-translations nil)
     (if (and map (setq translation (quail-get-translation def key len)))


Thread overview: 13+ messages
2003-05-18  5:23 a few MULE criticisms Stefan Monnier
  -- strict thread matches above, loose matches on Subject: below --
2003-05-14 20:03 a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Hin-Tak Leung
2003-05-14 20:55 ` Jason Rumney
2003-05-14 22:05   ` a few MULE criticisms Hin-Tak Leung
2003-05-14 21:55 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Stefan Monnier
2003-05-15  2:03   ` a few MULE criticisms Hin-Tak Leung
2003-05-15  6:55     ` Jason Rumney
2003-05-15  1:18 ` a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld Kenichi Handa
2003-05-15  3:29   ` a few MULE criticisms Hin-Tak Leung
2003-05-15 10:06     ` Hin-Tak Leung
2003-05-15 15:51       ` Stephen J. Turnbull
2003-05-15 19:49         ` Hin-Tak Leung
2003-05-15 21:29           ` Kevin Rodgers
2003-05-16  7:09           ` Stephen J. Turnbull
2003-05-16 11:43             ` Hin-Tak Leung
2003-05-17  7:32               ` Stephen J. Turnbull
2003-05-17 19:40                 ` Hin-Tak Leung
