From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Character literals for Unicode (control) characters
Date: Sun, 6 Mar 2016 10:08:39 -0800
Organization: UCLA Computer Science Department
Message-ID: <56DC7227.10708@cs.ucla.edu>
References: <87r3fsjenn.fsf@gnus.org> <56D8623F.6060806@cs.ucla.edu> <838u1vwqj9.fsf@gnu.org>
To: Philipp Stephani, Eli Zaretskii
Cc: larsi@gnus.org, johnw@gnu.org, emacs-devel@gnu.org

Thanks for taking this on. Some comments:

Why the hash table? Existing Lisp code dealing with Unicode names uses an alist, and it seems to do OK. If a hash table is needed, a hash table should also be used by the existing code elsewhere that does something similar. See the function ucs-names and its callers.

If a hash table is needed, I suggest using a perfect hash function (generated by gperf) and checking its results with get-char-code-property. That avoids the runtime overhead of initialization.
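To illustrate the shape of that suggestion, here is a minimal hand-made sketch in C. Everything in it is hypothetical: the six names, the hash formula, and the identifiers `name_table`, `name_hash` and `lookup_character_name`. A real table would be generated by gperf over the full list of names, whereas this formula happens to be collision-free only for these six. The table is static data, so nothing has to be built at startup. Because a perfect hash still sends *every* input string to some slot, the candidate has to be verified; here that is a plain `strcmp` against the stored name, standing in for the get-char-code-property check suggested above.

```c
#include <string.h>

enum { TABLE_SIZE = 13 };

struct name_entry
{
  const char *name;   /* NULL marks an empty slot.  */
  int code;           /* Unicode code point.  */
};

/* Hypothetical table.  Slot = (length + first byte + last byte)
   % TABLE_SIZE, which is collision-free for these six names only.  */
static const struct name_entry name_table[TABLE_SIZE] =
  {
    [0] = { "LEFT-TO-RIGHT MARK", 0x200E },
    [1] = { "SPACE", 0x0020 },
    [3] = { "SOFT HYPHEN", 0x00AD },
    [5] = { "NO-BREAK SPACE", 0x00A0 },
    [6] = { "ZERO WIDTH SPACE", 0x200B },
    [7] = { "ZERO WIDTH JOINER", 0x200D },
  };

/* Hypothetical hash; LEN must be nonzero.  */
static unsigned int
name_hash (const char *name, size_t len)
{
  return (len + (unsigned char) name[0]
          + (unsigned char) name[len - 1]) % TABLE_SIZE;
}

/* Return the code point named NAME, or -1 if NAME is not in the table.  */
static int
lookup_character_name (const char *name)
{
  size_t len = strlen (name);
  if (len == 0)
    return -1;
  const struct name_entry *e = &name_table[name_hash (name, len)];
  /* Any string lands in some slot, so confirm it is really that name.  */
  if (e->name && strcmp (e->name, name) == 0)
    return e->code;
  return -1;
}
```

For example, `lookup_character_name ("NO-BREAK SPACE")` yields 0x00A0, while the misspelling "NO BREAK SPACE" hashes to the same slot but fails the comparison and yields -1.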
It needs documentation, both in the Emacs Lisp manual and in NEWS.

> +void init_character_names ()
> +{

The usual style is:

  void
  init_character_names (void)
  {

No need for "const" for local variables (cost exceeds benefit).

>           if (c_isspace (c))
>             {
>               if (! whitespace)
>                 {
>                   whitespace = true;
>                   name[length++] = ' ';
>                 }
>             }
>           else
>             {
>               whitespace = false;
>               name[length++] = c;
>             }

This would be a bit easier to follow (and most likely a tiny bit more efficient) as something like this:

          bool ws = c_isspace (c);
          if (ws)
            {
              length -= whitespace;
              c = ' ';
            }
          whitespace = ws;
          name[length++] = c;
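The suggested loop collapses each run of whitespace into one space by backing up over the space it just wrote whenever the previous character was also whitespace (`length -= whitespace` subtracts 1 or 0). A minimal self-contained sketch of that loop in C follows. The wrapper `collapse_whitespace` is hypothetical, and `c_isspace` is a local stand-in for gnulib's c-ctype function (ASCII whitespace only, independent of the locale); neither is taken from the patch under review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for gnulib's c_isspace: the six ASCII whitespace bytes.  */
static bool
c_isspace (int c)
{
  return c == ' ' || c == '\t' || c == '\n'
         || c == '\v' || c == '\f' || c == '\r';
}

/* Hypothetical helper: copy SRC into NAME, collapsing each run of
   whitespace into a single space.  NAME must have room for
   strlen (SRC) + 1 bytes.  Return the length of the result.  */
static size_t
collapse_whitespace (char *name, const char *src)
{
  size_t length = 0;
  bool whitespace = false;   /* Was the previous byte whitespace?  */
  for (const char *p = src; *p; p++)
    {
      int c = (unsigned char) *p;
      bool ws = c_isspace (c);
      if (ws)
        {
          /* If a space was just written, overwrite it rather than
             appending a second one.  */
          length -= whitespace;
          c = ' ';
        }
      whitespace = ws;
      name[length++] = c;
    }
  name[length] = '\0';
  return length;
}
```

For instance, "LATIN  SMALL\tLETTER\n A" becomes "LATIN SMALL LETTER A". Leading or trailing runs are reduced to one space rather than removed, which matches the behavior of the original branching version.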