From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Newsgroups: gmane.emacs.help Subject: Re: Which Elisp data structure is fastest for searching? Date: Thu, 26 Dec 2024 10:23:40 +0100 Message-ID: References: <86wmfnc66h.fsf@gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gqxkuxiBi2U+qzLx" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="24704"; mail-complaints-to="usenet@ciao.gmane.io" To: Joel Reicher , Help GNU Emacs Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Dec 26 10:24:18 2024 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tQk6M-0006Kx-1X for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 26 Dec 2024 10:24:18 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tQk60-0001Dd-Pb; Thu, 26 Dec 2024 04:23:56 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tQk5y-0001Ct-LV for help-gnu-emacs@gnu.org; Thu, 26 Dec 2024 04:23:54 -0500 Original-Received: from mail.tuxteam.de ([5.199.139.25]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tQk5w-00078K-Q3 for help-gnu-emacs@gnu.org; Thu, 26 Dec 2024 04:23:54 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tuxteam.de; s=mail; h=From:In-Reply-To:Content-Type:MIME-Version:References:Message-ID: Subject:To:Date:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=XGIDuGMj6UKDDynHxxJcOTIU/T91faxzonMJ8hbbK90=; b=l4yIVRz9fGQ6rQBev4QFXJNIpg P+lfCyfYWUg6Proi1e2hzyo2dEc64B9vov9bj/7CEBZ5UKwyDWsmj05Wbs4pzqXMM8OSO2OeByVGx X+XHC/C9I30AXASR18lmhNiaBSmO8tkDDaI/i+BhGwYbU4mOBI6jglP5B0Ck2sK2W4s/wllJNWP83 +0QD2DrhDiwXqR821kAsRYpR+yEuTJt6TcBBJow1G5xLHHQ7tl+FTxpr3gkza9yHBUtQryrj8qqfg 8afe2zfx4Ew/RlIx/pVBoCTrtqYBtzEpkQZ82gwytVJXoGVSEl6bGn2g1k4i76eFO32jQXc9PHDp/ CoxS4J6w==; Original-Received: from tomas by mail.tuxteam.de with local (Exim 4.94.2) (envelope-from ) id 1tQk5k-0004WC-Sn; Thu, 26 Dec 2024 10:23:40 +0100 Content-Disposition: inline In-Reply-To: Received-SPF: pass client-ip=5.199.139.25; envelope-from=tomas@tuxteam.de; helo=mail.tuxteam.de X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.help:148991 Archived-At: --gqxkuxiBi2U+qzLx Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote: [Hashes vs. ...] > > This is why different data structures exist. There is no "best"; only "= best > > for..." >=20 > As is written in manual that hash table is very fast, I believe yes, > though my search may not be. Never believe such a thing out-of-context. As others have already pointed out, it depends *a lot* on how you are searching. If your search always involves *exact* matches, hash tables might be what you need. If these are more complex matches (think, as a working example: case-insensitive match), "naked" hashes are pretty useless, since your code might well end up walking all of the entries to apply the "custom comparison function" anyway. Sometimes (again, think our "case-insensitive match" example, a hash still might be useful: you may map the keys appropriately before hashing (again, in our example: lower case) and search for the equally mapped search item. Sometimes this reduces precision and you may get "false positives" -- but then you can refine your sequential search over a much reduced set (this is, BTW, the same thing hash tables do, but I disgress). That's, roughly, how databases do their thing. Another context point is size. For small things, hashes haven't an advantage over alists. Here's a small code snippet. On my machine, and in the current moon phase, hashes overtake alists at roughly 10 entries: (let* ((size 10) ; alist, hash size (runs 100000) ; benchmark runs (alist '()) (hash (make-hash-table :test 'eq :size 1549)) (keys (make-vector size nil)) (count size)) ;; setup (while (> count 0) (setq count (1- count)) (let ((sym (intern (format "SYM%04d%03d" (random 10000) count)))) (aset keys count sym) (setq alist (cons (cons sym "foo") alist)) (puthash sym "foo" hash))) ;; run (cons (benchmark runs '(assq (aref keys (random size)) alist)) (benchmark runs '(gethash (aref keys (random size)) hash)))) Tools, jobs and that. If you enter a carpenter's shop, you never ever will find just one tool hanging of the wall. Cheers --=20 t --gqxkuxiBi2U+qzLx Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iF0EABECAB0WIQRp53liolZD6iXhAoIFyCz1etHaRgUCZ20glQAKCRAFyCz1etHa RglGAJ4wNfzrJ/WMJdlN4i5ZHZPRsL56jwCfWx5m6SduZRHI7pqlju3aTUWrP+s= =t1lm -----END PGP SIGNATURE----- --gqxkuxiBi2U+qzLx--