From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 23:27:47 +0300
Organization: JURTA
Message-ID: <87od4sti4g.fsf@jurta.org>
References: <868ww3vydn.fsf@lifelogs.com> <87myki6fqp.fsf@jurta.org> <87mykhz6tf.fsf@jurta.org> <87tzeokrku.fsf@jurta.org> <87od4wgg8p.fsf@catnip.gol.com> <86od4vmi5i.fsf@lifelogs.com> <873am6n21q.fsf@jurta.org> <87sku5if8t.fsf_-_@jurta.org>
To: Kenichi Handa
Cc: tzz@lifelogs.com, emacs-devel@gnu.org
In-Reply-To: (Kenichi Handa's message of "Sun, 20 Jul 2008 10:23:57 +0900")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (x86_64-pc-linux-gnu)
Xref: news.gmane.org gmane.emacs.devel:101030

>> > I think it is better to skip these ranges:
>> >   #x3400..#x4dbf -- CJK Ideograph Extension A
>> >   #x4e00..#x9fff -- CJK Ideograph
>> >   #xd800..#xfaFF -- surrogate-pair, private use, CJK COMPATIBILITY IDEOGRAPH
>> >   #x20000..#x2ffff -- CJK Ideograph Extension B
>> > and end the loop at #xeffff (#xf0000.. are for private use)
>
>> Actually there are no Unicode names in these ranges in UnicodeData.txt.
>> It has only lines for the first and the last character in these ranges:
>
> Yes.  But, for CJK chars:
>
>   (get-char-code-property CHAR 'name)
>
> returns a valid name something like "CJK IDEOGRAPH-3400"(*)
> because get-char-code-property not only looks up
> UnicodeData.txt but also compute a proper value if
> necessary.

Thanks, I see now why it is necessary to skip these ranges.

Index: lisp/international/mule-cmds.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/international/mule-cmds.el,v
retrieving revision 1.333
diff -c -r1.333 mule-cmds.el
*** lisp/international/mule-cmds.el	15 Jul 2008 18:15:03 -0000	1.333
--- lisp/international/mule-cmds.el	20 Jul 2008 20:27:21 -0000
***************
*** 2846,2855 ****
  (defvar nonascii-insert-offset 0 "This variable is obsolete.")
  (defvar nonascii-translation-table nil "This variable is obsolete.")
  
  (defun ucs-insert (arg)
    "Insert a character of the given Unicode code point.
  Interactively, prompts for a hex string giving the code."
!   (interactive "sUnicode (hex): ")
    (or (integerp arg)
        (setq arg (string-to-number arg 16)))
    (if (or (< arg 0) (> arg #x10FFFF))
--- 2849,2879 ----
  (defvar nonascii-insert-offset 0 "This variable is obsolete.")
  (defvar nonascii-translation-table nil "This variable is obsolete.")
  
+ (defun read-char-by-name (prompt)
+   "Read a character by its Unicode name or hex number string.
+ Display PROMPT and read a string that represents a character
+ by its Unicode property `name' or `old-name'.  It also accepts
+ a hexadecimal number of Unicode code point.  Returns a character
+ as a number."
+   (let (name names)
+     (dotimes (c #xEFFFF)
+       (unless (or
+                (and (>= c #x3400 ) (<= c #x4dbf )) ; CJK Ideograph Extension A
+                (and (>= c #x4e00 ) (<= c #x9fff )) ; CJK Ideograph
+                (and (>= c #xd800 ) (<= c #xfaff )) ; Private/Surrogate
+                (and (>= c #x20000) (<= c #x2ffff)) ; CJK Ideograph Extension B
+                )
+         (if (setq name (get-char-code-property c 'name))
+             (setq names (cons (cons name c) names)))
+         (if (setq name (get-char-code-property c 'old-name))
+             (setq names (cons (cons name c) names)))))
+     (or (cdr (assoc (setq name (completing-read prompt names)) names))
+         (string-to-number name 16))))
+ 
  (defun ucs-insert (arg)
    "Insert a character of the given Unicode code point.
  Interactively, prompts for a hex string giving the code."
!   (interactive (list (read-char-by-name "Unicode (hex or name): ")))
    (or (integerp arg)
        (setq arg (string-to-number arg 16)))
    (if (or (< arg 0) (> arg #x10FFFF))

-- 
Juri Linkov
http://www.jurta.org/emacs/
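
The range test inside the loop could also be expressed as a small table-driven predicate, which keeps the four skipped ranges in one place. What follows is only a sketch under that assumption: the names `my-ucs-skipped-ranges` and `my-ucs-skipped-p` are hypothetical and appear neither in the patch nor in Emacs; the range boundaries are copied from the patch.

```elisp
;; Hypothetical helper, not part of the patch: the same four inclusive
;; ranges the patch skips, kept as data instead of inline comparisons.
(defconst my-ucs-skipped-ranges
  '((#x3400 . #x4dbf)     ; CJK Ideograph Extension A
    (#x4e00 . #x9fff)     ; CJK Ideograph
    (#xd800 . #xfaff)     ; surrogates, private use, CJK compatibility
    (#x20000 . #x2ffff))  ; CJK Ideograph Extension B
  "Inclusive code point ranges whose names are not collected.")

(defun my-ucs-skipped-p (c)
  "Return non-nil if code point C lies in one of `my-ucs-skipped-ranges'."
  (let ((ranges my-ucs-skipped-ranges)
        (found nil))
    ;; Walk the table until a range contains C or the table is exhausted.
    (while (and ranges (not found))
      (setq found (and (>= c (car (car ranges)))
                       (<= c (cdr (car ranges))))
            ranges (cdr ranges)))
    found))
```

With such a helper the loop body would reduce to `(unless (my-ucs-skipped-p c) ...)`. Whether the extra function call per code point is acceptable in a loop of this size is a separate question that this sketch does not measure.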