From: "James H. Cloos Jr."
Newsgroups: gmane.emacs.devel
Subject: describe-char and unicode data
Date: 09 May 2003 14:31:52 -0400

The describe-char command shows the Unicode code point (in hex) of the character in question, when one exists (some characters do not map to Unicode). Would a patch that extends it to also show the relevant data from UnicodeData.txt be accepted?

Step one would be code that converts UnicodeData.txt into a suitable elisp structure by generating a unicodedata.el file. Given that, the additional logic in describe-char would be trivial.

To give an idea of how much data is available: UnicodeData.txt is a semicolon-separated text database with 15 fields per record, and it currently has 15100 records, so loading all of it may be costly. The related Unihan.txt has up to 78 possible entries for each of 71098 characters.

The name field from UnicodeData.txt, and probably the kDefinition entries from Unihan.txt, would be the most useful additions to describe-char. The rest of the data may, however, be useful elsewhere.

What, then, is the best structure to use for this data?

-JimC
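A minimal sketch of "step one", assuming a standalone Python generator script (the post does not say what language the converter would be written in). It keeps only the name (field 1) and general category (field 2) of each record. Single characters go into a hash table keyed by code point, and the First/Last range records (such as the CJK ideograph block) are kept as ranges instead of being expanded. The names `parse_unicode_data`, `emit_elisp`, `unicodedata-names` and `unicodedata-ranges` are hypothetical.

```python
"""Sketch: turn UnicodeData.txt into a generated unicodedata.el.

All names are hypothetical; only the UnicodeData.txt layout
(15 semicolon-separated fields, First/Last range records) is standard.
"""

from typing import Iterable, List, Optional, Tuple

Entry = Tuple[int, str, str]         # (code point, name, general category)
Range = Tuple[int, int, str, str]    # (first, last, label, general category)


def parse_unicode_data(lines: Iterable[str]) -> Tuple[List[Entry], List[Range]]:
    """Split records into single characters and <..., First>/<..., Last> ranges."""
    entries: List[Entry] = []
    ranges: List[Range] = []
    pending: Optional[Tuple[int, str]] = None  # an open "<..., First>" record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
        code, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            # e.g. "<CJK Ideograph, First>" -> label "CJK Ideograph"
            pending = (code, name[1:-len(", First>")])
        elif name.endswith(", Last>") and pending is not None:
            first, label = pending
            ranges.append((first, code, label, category))
            pending = None
        else:
            entries.append((code, name, category))
    return entries, ranges


def elisp_string(s: str) -> str:
    """Quote S as an Emacs Lisp string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emit_elisp(entries: List[Entry], ranges: List[Range]) -> str:
    """Render the parsed data as the text of unicodedata.el."""
    out = [";;; unicodedata.el --- generated from UnicodeData.txt (sketch)", ""]
    # Single characters: a hash table keyed by code point, built at load time.
    out.append("(defconst unicodedata-names")
    out.append(f"  (let ((table (make-hash-table :test 'eql :size {len(entries)})))")
    out.append("    (dolist (e '(")
    for code, name, category in entries:
        out.append(f"               (#x{code:04X} {elisp_string(name)} . {elisp_string(category)})")
    out.append("               ))")
    out.append("      (puthash (car e) (cdr e) table))")
    out.append("    table)")
    out.append('  "Map a Unicode code point to (NAME . CATEGORY).")')
    out.append("")
    # Blocks listed only by their endpoints, kept as (FIRST LAST LABEL . CATEGORY).
    out.append("(defconst unicodedata-ranges")
    out.append("  '(")
    for first, last, label, category in ranges:
        out.append(f"    (#x{first:04X} #x{last:04X} {elisp_string(label)} . {elisp_string(category)})")
    out.append("    )")
    out.append('  "Code point ranges given as First/Last pairs in UnicodeData.txt.")')
    out.append("")
    out.append("(provide 'unicodedata)")
    return "\n".join(out) + "\n"
```

Calling `emit_elisp(*parse_unicode_data(open("UnicodeData.txt", encoding="utf-8")))` would produce the text of unicodedata.el. describe-char could then look up the code point it already displays with `gethash` in `unicodedata-names`, and fall back to a search of the short `unicodedata-ranges` list. A hash table gives constant-time lookup. An alist would be simpler, but it would need a linear scan over roughly 15000 entries. Keeping only two fields also keeps the generated file small. The remaining fields, or the Unihan data, could go into separate files that are loaded only when needed.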