From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Converting CC Mode's obarrays to hash tables. [Was: new `obarray` type] Date: Mon, 24 Jul 2017 10:06:05 -0400 Message-ID: References: <20170313220335.GA5098@acm> <20170314201414.GA4562@acm> <20170723140310.GA2551@acm> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: blaine.gmane.org 1500905323 29396 195.159.176.226 (24 Jul 2017 14:08:43 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Mon, 24 Jul 2017 14:08:43 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Jul 24 16:08:39 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dZe2G-0007AD-Ru for ged-emacs-devel@m.gmane.org; Mon, 24 Jul 2017 16:08:37 +0200 Original-Received: from localhost ([::1]:55077 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dZe2J-0008FO-Dh for ged-emacs-devel@m.gmane.org; Mon, 24 Jul 2017 10:08:39 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39616) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dZe05-0006hv-Qi for emacs-devel@gnu.org; Mon, 24 Jul 2017 10:06:23 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dZe01-0001lr-AA for emacs-devel@gnu.org; Mon, 24 Jul 2017 10:06:21 -0400 Original-Received: from [195.159.176.226] (port=39166 helo=blaine.gmane.org) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1dZe00-0001kZ-Vz for emacs-devel@gnu.org; Mon, 24 Jul 2017 10:06:17 -0400 Original-Received: from list by blaine.gmane.org with local (Exim 4.84_2) (envelope-from ) id 1dZdzr-0000N9-Ls for emacs-devel@gnu.org; Mon, 24 Jul 2017 16:06:07 +0200 X-Injected-Via-Gmane: http://gmane.org/ Original-Lines: 191 Original-X-Complaints-To: usenet@blaine.gmane.org Cancel-Lock: sha1:e8pEFOLprSu2HHYFwEKEDkhBtjM= X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 195.159.176.226 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:216963 Archived-At: > As for c-keywords-obarray (an obarray which has C (etc.) keywords as its > symbols, whose property lists contain the various keyword categories the > keywords belong to) - what is the benefit of holding collections of > symbols in hash tables rather than obarrays? I'd return the question. What's the benefit of using an extra indirection through symbols to hold plists, instead of just using the plist directly. For reference, here's the relevant part of the patch: diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el index a5ade09791..abddbdf5bf 100644 --- a/lisp/progmodes/cc-engine.el +++ b/lisp/progmodes/cc-engine.el @@ -535,14 +535,14 @@ c-keyword-sym ;; Return non-nil if the string KEYWORD is a known keyword. More ;; precisely, the value is the symbol for the keyword in ;; `c-keywords-obarray'. - (intern-soft keyword c-keywords-obarray)) + (gethash keyword c-keywords-table)) (defsubst c-keyword-member (keyword-sym lang-constant) ;; Return non-nil if the symbol KEYWORD-SYM, as returned by ;; `c-keyword-sym', is a member of LANG-CONSTANT, which is the name ;; of a language constant that ends with "-kwds". If KEYWORD-SYM is ;; nil then the result is nil. - (get keyword-sym lang-constant)) + (plist-get keyword-sym lang-constant)) ;; String syntax chars, suitable for skip-syntax-(forward|backward). (defconst c-string-syntax (if (memq 'gen-string-delim c-emacs-features) diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el index 3b455fc090..8b6562f7df 100644 --- a/lisp/progmodes/cc-langs.el +++ b/lisp/progmodes/cc-langs.el @@ -2749,8 +2747,8 @@ 'c-opt-op-identitier-prefix (setcdr elem (cons lang-const (cdr elem)))))) result-alist)) -(c-lang-defvar c-keywords-obarray - ;; An obarray containing all keywords as symbols. The property list +(c-lang-defvar c-keywords-table + ;; A hash table containing all keywords as symbols. The property list ;; of each symbol has a non-nil entry for the specific `*-kwds' ;; lists it's a member of. ;; @@ -2767,22 +2765,23 @@ 'c-opt-op-identitier-prefix (let* ((alist (c-lang-const c-keyword-member-alist)) kwd lang-const-list - (obarray (make-vector (* (length alist) 2) 0))) + (table (make-hash-table :test #'equal))) (while alist (setq kwd (caar alist) lang-const-list (cdar alist) alist (cdr alist)) - (setplist (intern kwd obarray) - ;; Emacs has an odd bug that causes `mapcan' to fail - ;; with unintelligible errors. (XEmacs works.) - ;; (2015-06-24): This bug has not yet been fixed. - ;;(mapcan (lambda (lang-const) - ;; (list lang-const t)) - ;; lang-const-list) - (apply 'nconc (mapcar (lambda (lang-const) + (puthash kwd + ;; Emacs has an odd bug that causes `mapcan' to fail + ;; with unintelligible errors. (XEmacs works.) + ;; (2015-06-24): This bug has not yet been fixed. + ;;(mapcan (lambda (lang-const) + ;; (list lang-const t)) + ;; lang-const-list) + (apply #'nconc (mapcar (lambda (lang-const) (list lang-const t)) - lang-const-list)))) - obarray)) + lang-const-list)) + table)) + table)) (c-lang-defconst c-regular-keywords-regexp ;; Adorned regexp matching all keywords that should be fontified BTW, having re-looked at the code, I see that those plists are really justs sets, so it could be optimized even further, as in: diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el index a5ade09791..abddbdf5bf 100644 --- a/lisp/progmodes/cc-engine.el +++ b/lisp/progmodes/cc-engine.el @@ -535,14 +535,14 @@ c-keyword-sym ;; Return non-nil if the string KEYWORD is a known keyword. More ;; precisely, the value is the symbol for the keyword in ;; `c-keywords-obarray'. - (intern-soft keyword c-keywords-obarray)) + (gethash keyword c-keywords-table)) (defsubst c-keyword-member (keyword-sym lang-constant) ;; Return non-nil if the symbol KEYWORD-SYM, as returned by ;; `c-keyword-sym', is a member of LANG-CONSTANT, which is the name ;; of a language constant that ends with "-kwds". If KEYWORD-SYM is ;; nil then the result is nil. - (get keyword-sym lang-constant)) + (memq keyword-sym lang-constant)) ;; String syntax chars, suitable for skip-syntax-(forward|backward). (defconst c-string-syntax (if (memq 'gen-string-delim c-emacs-features) diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el index 3b455fc090..8b6562f7df 100644 --- a/lisp/progmodes/cc-langs.el +++ b/lisp/progmodes/cc-langs.el @@ -2749,8 +2747,8 @@ 'c-opt-op-identitier-prefix (setcdr elem (cons lang-const (cdr elem)))))) result-alist)) -(c-lang-defvar c-keywords-obarray - ;; An obarray containing all keywords as symbols. The property list +(c-lang-defvar c-keywords-table + ;; A hash table containing all keywords as symbols. The value ;; of each keyword is a set of symbols indicating presence in ;; each `*-kwds' lists. ;; @@ -2767,22 +2765,23 @@ 'c-opt-op-identitier-prefix (let* ((alist (c-lang-const c-keyword-member-alist)) kwd lang-const-list - (obarray (make-vector (* (length alist) 2) 0))) + (table (make-hash-table :test #'equal))) (while alist (setq kwd (caar alist) lang-const-list (cdar alist) alist (cdr alist)) - (setplist (intern kwd obarray) - ;; Emacs has an odd bug that causes `mapcan' to fail - ;; with unintelligible errors. (XEmacs works.) - ;; (2015-06-24): This bug has not yet been fixed. - ;;(mapcan (lambda (lang-const) - ;; (list lang-const t)) - ;; lang-const-list) - (apply 'nconc (mapcar (lambda (lang-const) (list lang-const t)) - lang-const-list)))) - obarray)) + (puthash kwd (mapcar #'car lang-const-list) + table)) + table)) (c-lang-defconst c-regular-keywords-regexp ;; Adorned regexp matching all keywords that should be fontified An obarray, is just a hash-table indexed by strings where every hash entry holds a 4-field struct (the fields are: `value`, `function`, and `plist`, plus the read-only `name` field) with some restrictions (a given such struct can only be placed in a single obarray; you can remove such a symbol from an obarray, but you can't put it back in). It's really rare for a program to need a hash-table where the values carried by each entry happen to have just the shape of symbols. It's common for the extra restrictions to be satisfied, but it's rare for them to be useful. c-lang-constants is one of the rare cases where you managed to make fairly good use of all 3 fields. Of course, rather than cl-structs, we could use (uninterned) symbols in the hash-table to recover the speed advantage of symbols (due to the fact that symbol-value, get, and symbol-function have their own bytecode, and symbol-plist is also hard-coded in C; so they are faster than cl-lib's struct-field access which uses aref after manually checking that the object has the expected type; and although `aref' also has its own bytecode it's made slower by the need to handle various array types and to check array bounds). > If this makes a program faster, it suggests that the obarray > implementation could be improved to match the speed of the hash > table implementation. Indeed. But in most cases, the source code is made more clear when using the hash-table API rather than the obarray API. I think we'd be better off declaring the second arg of `intern' as obsolete. Or make it accept a hash-table as second argument. > Why don't we store the main Emacs obarray in a hash table, if this > will increase speed? Lack of time/motivation (at least on my part). Stefan