From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: faster unicode character name completion
Date: Tue, 08 Dec 2009 10:45:56 +0900
To: Stefan Monnier
Cc: cyd@stupidchicken.com, emacs-devel@gnu.org
In article , Stefan Monnier writes:

>>> I don't understand what ucs-name-filter is trying to do.

> > ?? It simply filters out elements that doesn't match with
> > STR from NAMES (alist).

> But then why is it needed?
> Doesn't `completion-table-dynamic' take care of that already?

I don't know.  The info says this:

 -- Function: completion-table-dynamic function
     This function is a convenient way to write a function that can act
     as programmed completion function.  The argument FUNCTION should be
     a function that takes one argument, a string, and returns an alist
     of possible completions of it.  You can think of
     `completion-table-dynamic' as a transducer between that interface
     and the interface for programmed completion functions.

I thought that FUNCTION should return an alist that contains ONLY
valid completions.
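Stefan's question can be illustrated with a small example. In the minibuffer.el sources I know of, the function that `completion-table-dynamic' returns passes FUNCTION's result to `complete-with-action`, and that function does the prefix matching itself. If that holds, FUNCTION may return candidates that do not match STR, and a separate filter such as ucs-name-filter would be redundant. The three-entry alist `my-demo-names` and the table `my-demo-table` below are hypothetical stand-ins for the real `ucs-names` data.

```elisp
;; Hypothetical stand-in for the much larger `ucs-names' alist.
(defvar my-demo-names
  '(("LATIN SMALL LETTER A" . ?a)
    ("LATIN SMALL LETTER B" . ?b)
    ("GREEK SMALL LETTER ALPHA" . #x3B1)))

;; FUNCTION ignores the input string and returns the UNFILTERED alist.
(defvar my-demo-table
  (completion-table-dynamic (lambda (_string) my-demo-names)))

;; The completion machinery still filters by prefix:
(all-completions "LATIN" my-demo-table)
;; => the two "LATIN SMALL LETTER ..." names, not the GREEK one

(try-completion "LATIN" my-demo-table)
;; => "LATIN SMALL LETTER " (the longest common prefix of the matches)
```

Under this assumption, the only job left for FUNCTION is to supply the candidate set cheaply. The cost of building that set is what the rest of this thread addresses.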
> But I have a better idea: most of the time is not spent building the
> completion table, but rather just weeding out all the "chars" that don't
> have names, or should I say, looking for the few rare chars that do
> have a name.
>
> So the patch below seems to be a good compromise: it uses up just about
> 1000K cons cells (i.e. 16KB on 64bit systems) to keep the precomputed
> set of ~34K chars that do have a name, so that building the completion
> table takes only a couple seconds.

Ah, interesting approach.  But I've just found that the
dotimes-with-progress-reporter loop in the original code doesn't
exclude the big unused range U+30000..U+DFFFF (about 75% of the range
currently checked).  Excluding just that part from the original code
achieves almost the same performance as your patch.  Attached is that
simpler version.

---
Kenichi Handa
handa@m17n.org

(defun ucs-names ()
  "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
  (or ucs-names
      ;; Only the ranges listed below are scanned; the commented-out
      ;; ranges are skipped entirely.
      (let ((ranges
             '((#x00000 . #x033FF)
               ;; (#x03400 . #x04DBF) CJK Ideograph Extension A
               (#x04DC0 . #x04DFF)
               ;; (#x04E00 . #x09FFF) CJK Ideograph
               (#x0A000 . #x0D7FF)
               ;; (#x0D800 . #x0FAFF) Surrogate/Private Use/CJK Compatibility
               (#x0FB00 . #x1FFFF)
               ;; (#x20000 . #xDFFFF) CJK Ideograph Extension B, etc., and unused
               (#xE0000 . #xE01EF)))
            c end name names)
        (dolist (range ranges)
          (setq c (car range)
                end (cdr range))
          (while (<= c end)
            ;; Record both the current name and the Unicode 1.0 name
            ;; (`old-name'), when the character has one.
            (if (setq name (get-char-code-property c 'name))
                (push (cons name c) names))
            (if (setq name (get-char-code-property c 'old-name))
                (push (cons name c) names))
            (setq c (1+ c))))
        (setq ucs-names names))))
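Stefan's patch is not reproduced in this message. The idea quoted above is to precompute once which characters actually have a name, and this can be sketched in one way: store that set as runs of consecutive code points, so the name lookup only visits characters known to be named. Both `my-char-runs` and `my-names-from-runs` are hypothetical helpers for illustration only. Neither is Stefan's code nor an existing Emacs function.

```elisp
(defun my-char-runs (from to pred)
  "Return a list of (START . END) runs of chars in FROM..TO satisfying PRED.
Runs are inclusive and returned in ascending order."
  (let ((c from) runs start)
    (while (<= c to)
      (if (funcall pred c)
          ;; Open a new run at the first matching char.
          (unless start (setq start c))
        ;; A non-matching char closes any open run.
        (when start
          (push (cons start (1- c)) runs)
          (setq start nil)))
      (setq c (1+ c)))
    ;; Close a run that extends to the end of the range.
    (when start (push (cons start to) runs))
    (nreverse runs)))

(defun my-names-from-runs (runs)
  "Build a (CHAR-NAME . CHAR-CODE) alist, visiting only chars in RUNS."
  (let (names)
    (dolist (run runs)
      (let ((c (car run)))
        (while (<= c (cdr run))
          (let ((name (get-char-code-property c 'name)))
            (when name (push (cons name c) names)))
          (setq c (1+ c)))))
    names))
```

In real use the runs would be computed once, for example with `(my-char-runs 0 #x10FFFF (lambda (c) (get-char-code-property c 'name)))`, and then kept. Named characters tend to cluster in blocks, so a run list is typically much shorter than a list of individual characters. That is the trade-off between memory and scan time that Stefan's message describes.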
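The arithmetic behind "about 75%" can be checked directly. This needs an assumption that the message does not state: that the original loop scanned U+0000..U+E01EF in a single pass. The helper `my-range-size` is hypothetical. It counts the code points covered by a list of inclusive (FROM . TO) pairs, in the same form as `ranges` in the function above.

```elisp
(defun my-range-size (ranges)
  "Return how many code points RANGES covers.
RANGES is a list of inclusive (FROM . TO) pairs."
  (let ((total 0))
    (dolist (r ranges total)
      (setq total (+ total (1+ (- (cdr r) (car r))))))))

;; Ranges still scanned by the new `ucs-names':
(my-range-size '((#x00000 . #x033FF) (#x04DC0 . #x04DFF) (#x0A000 . #x0D7FF)
                 (#x0FB00 . #x1FFFF) (#xE0000 . #xE01EF)))
;; => 95024

;; The unused block singled out in the message:
(my-range-size '((#x30000 . #xDFFFF)))   ; => 720896

;; Assumed original single-pass scan:
(my-range-size '((#x00000 . #xE01EF)))   ; => 918000
```

Under that assumption, the unused block alone accounts for 720896 / 918000, or roughly 78%, of the original scan. The other excluded ranges (the CJK ideograph blocks and the surrogate/private-use area) bring the work down further, to 95024 code points.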