From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Jean Louis Newsgroups: gmane.emacs.help Subject: Re: Which Elisp data structure is fastest for searching? Date: Thu, 26 Dec 2024 14:49:24 +0300 Message-ID: References: <86wmfnc66h.fsf@gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="=_stw1.rcdrun.com-1183760-1735213767-0001-2" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="21023"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mutt/2.2.12 (2023-09-09) Cc: Joel Reicher , Help GNU Emacs To: tomas@tuxteam.de Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Dec 26 12:50:04 2024 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tQmNP-0005IX-Rj for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 26 Dec 2024 12:50:03 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tQmMw-0001qM-Ve; Thu, 26 Dec 2024 06:49:34 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tQmMu-0001pu-Rv for help-gnu-emacs@gnu.org; Thu, 26 Dec 2024 06:49:33 -0500 Original-Received: from stw1.rcdrun.com ([217.170.207.13]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tQmMt-0000NW-2e for help-gnu-emacs@gnu.org; Thu, 26 Dec 2024 06:49:32 -0500 Original-Received: from localhost ([::ffff:41.75.190.33]) (AUTH: PLAIN admin, TLS: TLS1.3,256bits,ECDHE_RSA_AES_256_GCM_SHA384) by stw1.rcdrun.com with ESMTPSA id 000000000007DC8D.00000000676D42C7.00121010; Thu, 26 Dec 2024 04:49:26 -0700 Mail-Followup-To: tomas@tuxteam.de, Joel Reicher , Help GNU Emacs Content-Disposition: inline In-Reply-To: Received-SPF: pass client-ip=217.170.207.13; envelope-from=bugs@gnu.support; helo=stw1.rcdrun.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.help:148998 Archived-At: This is a MIME-formatted message. If you see this text it means that your E-mail software does not support MIME-formatted messages. --=_stw1.rcdrun.com-1183760-1735213767-0001-2 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable * tomas@tuxteam.de [2024-12-26 12:25]: > On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote: >=20 > [Hashes vs. ...] >=20 > > > This is why different data structures exist. There is no "best"; only= "best > > > for..." > >=20 > > As is written in manual that hash table is very fast, I believe yes, > > though my search may not be. >=20 > Never believe such a thing out-of-context. Sure, that is why I am asking possibly those who made better experience with it. (measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" (cd= r item)) item)) my-alist)) =E2=9E=9C "0.004352" (measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" ite= m) item)) my-list)) =E2=9E=9C "0.004252" (measure-time (delq nil (mapcar (lambda (key) (when (string-match "MY QUERY= " (gethash key my-hash)) key)) (hash-table-keys my-hash)))) =E2=9E=9C "0.00= 4506" I have measured list, alist, hash, with bit different functions, for my 1300 page information pieces, I will keep measuring. Though small inconceivable differences matter not. I think hash method is so much easier, I can keep it all in the single hash, and only query those keys which are relevant: Following I would not query: (puthash (intern (format "id-%d-name" id)) name wrs-search-hash) (puthash (intern (format "id-%d-link" id)) link wrs-search-hash) (puthash (intern (format "id-%d-description" id)) description wrs-searc= h-hash) (puthash (intern (format "id-%d-keywords" id)) keywords wrs-search-hash= ))) I would query only those which are numeric, for example. Then I would use `id-123-name' to get the name for numeric key 123. > As others have already pointed out, it depends *a lot* on how you are > searching. If your search always involves *exact* matches, hash tables > might be what you need. I will need to use AND, OR, NOT and there will be little different functions to search within the has depending of the combinations allowed. > If these are more complex matches (think, as > a working example: case-insensitive match), "naked" hashes are pretty > useless, since your code might well end up walking all of the entries > to apply the "custom comparison function" anyway. I will only use values to search name, url, description, summary > Sometimes (again, think our "case-insensitive match" example, a hash > still might be useful: you may map the keys appropriately before > hashing (again, in our example: lower case) and search for the equally > mapped search item. Sometimes this reduces precision and you may get > "false positives" -- but then you can refine your sequential search > over a much reduced set (this is, BTW, the same thing hash tables do, > but I disgress). Okay, thanks.=20 > Another context point is size. For small things, hashes haven't an > advantage over alists. I think 1300 x 4 elements is small thing. > Here's a small code snippet. On my machine, and in the current > moon phase, hashes overtake alists at roughly 10 entries: >=20 (let* ((size 10) ; alist, hash size (runs 100000) ; benchmark runs (alist '()) (hash (make-hash-table :test 'eq :size 1549)) (keys (make-vector size nil)) (count size)) ;; setup (while (> count 0) (setq count (1- count)) (let ((sym (intern (format "SYM%04d%03d" (random 10000) count)))) (aset keys count sym) (setq alist (cons (cons sym "foo") alist)) (puthash sym "foo" hash))) ;; run (cons (benchmark runs '(assq (aref keys (random size)) alist)) (benchmark runs '(gethash (aref keys (random size)) hash))))=20 =E2=9E=9C ("Elapsed time: 0.030115s" . "Elapsed time: 0.028604s") I see no much difference, important is there is no human noticeable =F0=9F= =98=90 too long =E2=8F=B1=EF=B8=8F lag, small =E2=9A=96=EF=B8=8F lags are always t= here =F0=9F=91=8D. Okay thanks, I am advancing. --=20 Jean Louis --=_stw1.rcdrun.com-1183760-1735213767-0001-2 Content-Type: application/pgp-signature; name="signature.asc" Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEv9/jXBKLXfDiHl8IErxRIkudxlwFAmdtQsQACgkQErxRIkud xlwPaQf/aSEI5u8G7t8iAj1Bq6UFOJ5lCv979hO+sPn7trcnLoBLBAlhhc6xeUgA l6waQn7nAORmf4fDZ5C5yMnlZngA0EbAtR8Kf7pxdTX06lhcGp2JLiwAkkgGKM0m Z31U+jibLMPBbjvlIhQ3geaMgeXqB9OTnYPRvpSKnyyku90oZGi2sU/d1vVrBKeX 6TEhAvoVK0BRMeXvYL15ZNFW/ivdemcR33W6VT5++b3AUz2L+fY3zYzq8JnJC3xq nntuxCeRDZmj1UBdiOZRmg/JRJ8L4iiMsn2dBiSM/81iwAiCV20MN1bnoCWqI3u1 oCW7DEMFeeB7Bd3VqD0lvMn/es9Bbw== =uLw2 -----END PGP SIGNATURE----- --=_stw1.rcdrun.com-1183760-1735213767-0001-2--