From: "Stephen J. Turnbull"
To: Eli Zaretskii
Cc: tomas@tuxteam.de, rms@gnu.org, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Input method or help feature needed
Date: Mon, 21 Feb 2011 11:53:20 +0900
Message-ID: <87k4gunlnj.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <83hbbytmvl.fsf@gnu.org>

Eli Zaretskii writes:

 > > (Excluding Korean and Han characters and whatever else ought to be
 > > excluded).
 >
 > Why exclude them?

Because there are 11000 of the former and 21000 (and counting) of the
latter.

The Korean Hangul are precomposed in an algorithmic fashion from about
70 components called "jamo".  It makes very little sense to just have
many pages when you can look up the jamo in smaller lists, and drill
down to exactly the Hangul you want.  Just as it should be possible to
type "i" and get a page of all characters related to "i" including the
Turkish dotless "i" and Greek iota, etc.

Similarly, the Han characters are organized by radical and stroke
count, and it should be possible to look at the (relatively) short
list of 214 radicals, then drill down to an approximate stroke count,
and then page up and down the stroke count.  There are non-radical
components as well, many of which even total Han illiterates would be
likely to recognize.  I don't know if these are listed in the Unicode
tables, but if so they could be combined with the radical and
(optionally) approximate stroke count to drastically prune the search
tree in 90% or more of practical cases.
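A minimal sketch, in Python, of the radical-then-stroke-count drill-down
described above. It is an illustration under stated assumptions, not a
proposal for how Emacs should implement it: the names SAMPLE_RS,
build_index and candidates are hypothetical, and the tiny hand-typed
table only imitates the "radical.residual strokes" form of the Unihan
kRSUnicode field. A real tool would load the full table.

```python
from collections import defaultdict

# Illustrative sample only: character -> "radical.residual_strokes",
# written in the style of the Unihan kRSUnicode field (assumption:
# a real tool would read the complete Unihan data instead).
SAMPLE_RS = {
    "水": "85.0", "永": "85.1", "汁": "85.2",
    "江": "85.3", "池": "85.3", "河": "85.5",
    "木": "75.0", "本": "75.1", "林": "75.4",
}


def build_index(rs_table):
    """Group characters as index[radical][residual_strokes] -> sorted list."""
    index = defaultdict(lambda: defaultdict(list))
    for char, rs in rs_table.items():
        radical, strokes = rs.split(".")
        index[int(radical)][int(strokes)].append(char)
    for by_strokes in index.values():
        for chars in by_strokes.values():
            chars.sort()  # code point order within one stroke count
    return index


def candidates(index, radical, strokes, tolerance=1):
    """Characters under `radical` whose residual stroke count is within
    `tolerance` of `strokes`, ordered by stroke count, then code point."""
    found = []
    by_strokes = index.get(radical, {})
    for count in sorted(by_strokes):
        if abs(count - strokes) <= tolerance:
            found.extend(by_strokes[count])
    return found


if __name__ == "__main__":
    index = build_index(SAMPLE_RS)
    # Radical 85 (water), roughly 3 additional strokes.
    print(" ".join(candidates(index, 85, 3)))
```

The tolerance parameter stands in for the "approximate stroke count"
step: a user who miscounts by one stroke still lands on a page that
contains the character, and can page up or down from there.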
However, a simple list of Hangul or Hanzi would be rather painful to
use, not to mention that if you don't know how to say it (every Hangul
has an algorithmically constructed pronunciation), you're probably not
fluent enough in the language to easily pick the right character out
of an array of, say, 400 (20 x 20 seems like a reasonable size for a
"page" of characters).  The real differences are often subtle, not to
mention that many characters have several variant glyphs, and these
variations tend to confuse the non-native speaker.

A pure list in Unicode order for these characters is better than
*nothing*, true, but it's not really an acceptable answer to Richard's
requirement.
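For reference, the algorithmic construction of precomposed Hangul
mentioned above can be sketched in a few lines of Python. The constants
follow the arithmetic given in the Unicode Standard for Hangul syllable
composition (19 leading consonants, 21 vowels, 27 trailing consonants
plus "none", which is where the roughly 70 jamo and the 11172 syllables
come from). The function names are hypothetical and chosen only for
this illustration.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing jamo"
N_COUNT = V_COUNT * T_COUNT              # syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT              # total precomposed syllables


def decompose_hangul(syllable):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    jamo = [
        chr(L_BASE + s_index // N_COUNT),
        chr(V_BASE + (s_index % N_COUNT) // T_COUNT),
    ]
    t_index = s_index % T_COUNT
    if t_index:  # index 0 means the syllable has no trailing consonant
        jamo.append(chr(T_BASE + t_index))
    return jamo


def compose_hangul(l_index, v_index, t_index=0):
    """Build a precomposed syllable from zero-based jamo indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    syllable = "\ud55c"  # U+D55C
    parts = decompose_hangul(syllable)
    print([f"U+{ord(j):04X}" for j in parts])
    # Cross-check against the canonical decomposition in unicodedata.
    print("".join(parts) == unicodedata.normalize("NFD", syllable))
```

A jamo-based drill-down could use the same arithmetic in reverse: once
the user has picked a leading consonant and a vowel from the short jamo
lists, compose_hangul enumerates the at most 28 syllables that remain.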