From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Unicode Lisp reader escapes
Date: Wed, 10 May 2006 14:37:44 +0900
Cc: alkibiades@gmx.de, emacs-devel@gnu.org
To: rms@gnu.org
In-reply-to: (message from Richard Stallman on Tue, 09 May 2006 23:20:53 -0400)

In article , Richard Stallman writes:

> In addition, the default value of
> utf-translate-cjk-mode is t, and to which CJK charsets Han
> characters of Unicode are decoded depends on these:

> (1) current-language-environment

> What effect does this have?  (Aside from the choice of coding system,
> that is.)

Some Han characters in Unicode can be decoded into any of several CJK
charsets (e.g. chinese-gb2312, chinese-big5-1, japanese-jisx0208).
The value of current-language-environment decides which of them is
used.

> (4) the contents of the hash table ucs-unicode-to-mule-cjk
> (a user can freely reflect one's preference on how to decode
> Unicode characters by modifying this hash table).

> Could you tell me some examples for how users are really expected
> to use this?

I don't know of a concrete example, but I can imagine one.  U+9AD9
is a variant of U+9AD8, but japanese-jisx0208 contains only the
latter.
In fact, none of the legacy CJK charsets contains U+9AD9.  But since
it is just a variant of U+9AD8, a user who only needs to read the
text may want to decode it into japanese-jisx0208.  In such a case,
one can simply do this:

  ;; Decode U+9AD9 as the japanese-jisx0208 character for U+9AD8.
  (puthash #x9AD9 ?高 ucs-unicode-to-mule-cjk)

> Overall:

> With so many different variables that might affect the reading of
> these characters, it is just too inconvenient for every file to
> specify them all.  So I think we need a new feature to make that easy
> to do.

> Here's one idea.

> Add a new "variable" `buffer-coding' which is analogous to `coding'.
> Whereas `coding' specifies the encoding in the file, `buffer-coding'
> specifies the in-buffer encoding to produce in the buffer.  Its value
> could be a list or plist, which would specify the values of all these
> many variables.

> What do you think?  If you think this is a good idea, could
> you try designing the details?

No, it's an incredibly hard and heavy task.  If you read utf-8.el
and ucs-tables.el, you'll soon realize that.  I believe it's just a
waste of time to work on such a thing.  We have already piled up
workarounds for workarounds for workarounds to avoid using Unicode
internally, but there's a limit.  I don't think anyone would be
pleased with the effort of guaranteeing the same *.elc in such a
situation.

Please accept this problem as a misfeature (not a bug), and document
it in etc/PROBLEMS.  Otherwise, please decide to switch to
emacs-unicode right now; that is the right way to solve this problem.

---
Kenichi Handa
handa@m17n.org