unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
From: <tomas@tuxteam.de>
To: Joel Reicher <joel.reicher@gmail.com>,
	Help GNU Emacs <help-gnu-emacs@gnu.org>
Subject: Re: Which Elisp data structure is fastest for searching?
Date: Thu, 26 Dec 2024 10:23:40 +0100	[thread overview]
Message-ID: <Z20gnOmHQv08x0FB@tuxteam.de> (raw)
In-Reply-To: <Z20ZksL7H4P0tRA0@lco2>

[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]

On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote:

[Hashes vs. ...]

> > This is why different data structures exist. There is no "best"; only "best
> > for..."
> 
> As is written in manual that hash table is very fast, I believe yes,
> though my search may not be.

Never believe such a thing out-of-context.

As others have already pointed out, it depends *a lot* on how you are
searching. If your search always involves *exact* matches, hash tables
might be what you need. If these are more complex matches (think, as
a working example: case-insensitive match), "naked" hashes are pretty
useless, since your code might well end up walking all of the entries
to apply the "custom comparison function" anyway.

Sometimes (again, think our "case-insensitive match" example, a hash
still might be useful: you may map the keys appropriately before
hashing (again, in our example: lower case) and search for the equally
mapped search item. Sometimes this reduces precision and you may get
"false positives" -- but then you can refine your sequential search
over a much reduced set (this is, BTW, the same thing hash tables do,
but I disgress).

That's, roughly, how databases do their thing.

Another context point is size. For small things, hashes haven't an
advantage over alists.

Here's a small code snippet. On my machine, and in the current
moon phase, hashes overtake alists at roughly 10 entries:

(let* ((size 10) ; alist, hash size
       (runs 100000) ; benchmark runs
       (alist '())
       (hash (make-hash-table :test 'eq :size 1549))
       (keys (make-vector size nil))
       (count size))
  ;; setup
  (while (> count 0)
    (setq count (1- count))
    (let ((sym (intern (format "SYM%04d%03d" (random 10000) count))))
      (aset keys count sym)
      (setq alist (cons (cons sym "foo") alist))
      (puthash sym "foo" hash)))
  ;; run
  (cons
   (benchmark runs '(assq (aref keys (random size)) alist))
   (benchmark runs '(gethash (aref keys (random size)) hash))))

Tools, jobs and that. If you enter a carpenter's shop, you never
ever will find just one tool hanging of the wall.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

  reply	other threads:[~2024-12-26  9:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-25  9:33 Which Elisp data structure is fastest for searching? Jean Louis
2024-12-26  6:24 ` Joel Reicher
2024-12-26  8:53   ` Jean Louis
2024-12-26  9:23     ` tomas [this message]
2024-12-26 11:49       ` Jean Louis
2024-12-26 23:03     ` Joel Reicher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z20gnOmHQv08x0FB@tuxteam.de \
    --to=tomas@tuxteam.de \
    --cc=help-gnu-emacs@gnu.org \
    --cc=joel.reicher@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).