* Which Elisp data structure is fastest for searching?
@ 2024-12-25 9:33 Jean Louis
2024-12-26 6:24 ` Joel Reicher
0 siblings, 1 reply; 6+ messages in thread
From: Jean Louis @ 2024-12-25 9:33 UTC (permalink / raw)
To: Help GNU Emacs
It says in Emacs Lisp manual:
* Hash Tables:: Very fast lookup-tables.
(info "(elisp) Hash Tables")
8 Hash Tables
*************
A hash table is a very fast kind of lookup table, somewhat like an alist
(*note Association Lists::) in that it maps keys to corresponding
values.
The above is not so conclusive and I have not done measurements.
It says "somewhat like an alist", but is the alist faster in searching
through elements or the hash table?
I wish to make website search, something I always had before on
websites, though at the time I was using Perl and shell tools. Now I
am using Emacs Lisp for website applications.
Each piece of information in the data structure such as plist, alist,
hash, will have:
- title
- URL
- description
- summary plain text which is shorter than the actual text
I will search all of the pieces of information.
With alist that may be easier. With hash, I would need to structure
the `value' part again into four different things, not so?
What would be better decision?
Jean Louis
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Which Elisp data structure is fastest for searching?
2024-12-25 9:33 Which Elisp data structure is fastest for searching? Jean Louis
@ 2024-12-26 6:24 ` Joel Reicher
2024-12-26 8:53 ` Jean Louis
0 siblings, 1 reply; 6+ messages in thread
From: Joel Reicher @ 2024-12-26 6:24 UTC (permalink / raw)
To: Jean Louis; +Cc: Help GNU Emacs
Jean Louis <bugs@gnu.support> writes:
[...]
> A hash table is a very fast kind of lookup table, somewhat like
> an alist (*note Association Lists::) in that it maps keys to
> corresponding values.
>
> The above is not so conclusive and I have not done measurements.
>
> It says "somewhat like an alist", but is the alist faster in
> searching through elements or the hash table?
It depends on the kind of search you are doing. As an extreme
example, imagine your search was not based on the hash table key;
you might have to go through every entry in the table to find what
you're looking for.
This is why different data structures exist. There is no "best";
only "best for..."
Regards,
- Joel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Which Elisp data structure is fastest for searching?
2024-12-26 6:24 ` Joel Reicher
@ 2024-12-26 8:53 ` Jean Louis
2024-12-26 9:23 ` tomas
2024-12-26 23:03 ` Joel Reicher
0 siblings, 2 replies; 6+ messages in thread
From: Jean Louis @ 2024-12-26 8:53 UTC (permalink / raw)
To: Joel Reicher; +Cc: Help GNU Emacs
* Joel Reicher <joel.reicher@gmail.com> [2024-12-26 09:25]:
> > A hash table is a very fast kind of lookup table, somewhat like an alist
> > (*note Association Lists::) in that it maps keys to corresponding
> > values.
> >
> > The above is not so conclusive and I have not done measurements.
> >
> > It says "somewhat like an alist", but is the alist faster in searching
> > through elements or the hash table?
>
> It depends on the kind of search you are doing. As an extreme example,
> imagine your search was not based on the hash table key; you might have to
> go through every entry in the table to find what you're looking for.
>
> This is why different data structures exist. There is no "best"; only "best
> for..."
As is written in manual that hash table is very fast, I believe yes,
though my search may not be.
I will use following approach:
- there will be list or hash with values that are to be searched,
those values will have their ID, that information will be prepared
for easier searching, like special symbols removed, only words
remaining, maybe even small words could be removed
- there will be different hash with accurate information about the ID,
such as title, URL, description
That is approach I know. Then I can give relevant information for
website search.
I wonder if Emacs can remain in memory keeping to answer HTTP
requests, so that I do not load it every time.
--
Jean Louis
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Which Elisp data structure is fastest for searching?
2024-12-26 8:53 ` Jean Louis
@ 2024-12-26 9:23 ` tomas
2024-12-26 11:49 ` Jean Louis
2024-12-26 23:03 ` Joel Reicher
1 sibling, 1 reply; 6+ messages in thread
From: tomas @ 2024-12-26 9:23 UTC (permalink / raw)
To: Joel Reicher, Help GNU Emacs
[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]
On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote:
[Hashes vs. ...]
> > This is why different data structures exist. There is no "best"; only "best
> > for..."
>
> As is written in manual that hash table is very fast, I believe yes,
> though my search may not be.
Never believe such a thing out-of-context.
As others have already pointed out, it depends *a lot* on how you are
searching. If your search always involves *exact* matches, hash tables
might be what you need. If these are more complex matches (think, as
a working example: case-insensitive match), "naked" hashes are pretty
useless, since your code might well end up walking all of the entries
to apply the "custom comparison function" anyway.
Sometimes (again, think our "case-insensitive match" example, a hash
still might be useful: you may map the keys appropriately before
hashing (again, in our example: lower case) and search for the equally
mapped search item. Sometimes this reduces precision and you may get
"false positives" -- but then you can refine your sequential search
over a much reduced set (this is, BTW, the same thing hash tables do,
but I disgress).
That's, roughly, how databases do their thing.
Another context point is size. For small things, hashes haven't an
advantage over alists.
Here's a small code snippet. On my machine, and in the current
moon phase, hashes overtake alists at roughly 10 entries:
(let* ((size 10) ; alist, hash size
(runs 100000) ; benchmark runs
(alist '())
(hash (make-hash-table :test 'eq :size 1549))
(keys (make-vector size nil))
(count size))
;; setup
(while (> count 0)
(setq count (1- count))
(let ((sym (intern (format "SYM%04d%03d" (random 10000) count))))
(aset keys count sym)
(setq alist (cons (cons sym "foo") alist))
(puthash sym "foo" hash)))
;; run
(cons
(benchmark runs '(assq (aref keys (random size)) alist))
(benchmark runs '(gethash (aref keys (random size)) hash))))
Tools, jobs and that. If you enter a carpenter's shop, you never
ever will find just one tool hanging of the wall.
Cheers
--
t
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Which Elisp data structure is fastest for searching?
2024-12-26 9:23 ` tomas
@ 2024-12-26 11:49 ` Jean Louis
0 siblings, 0 replies; 6+ messages in thread
From: Jean Louis @ 2024-12-26 11:49 UTC (permalink / raw)
To: tomas; +Cc: Joel Reicher, Help GNU Emacs
[-- Attachment #1: Type: text/plain, Size: 3916 bytes --]
* tomas@tuxteam.de <tomas@tuxteam.de> [2024-12-26 12:25]:
> On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote:
>
> [Hashes vs. ...]
>
> > > This is why different data structures exist. There is no "best"; only "best
> > > for..."
> >
> > As is written in manual that hash table is very fast, I believe yes,
> > though my search may not be.
>
> Never believe such a thing out-of-context.
Sure, that is why I am asking possibly those who made better
experience with it.
(measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" (cdr item)) item)) my-alist)) ➜ "0.004352"
(measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" item) item)) my-list)) ➜ "0.004252"
(measure-time (delq nil (mapcar (lambda (key) (when (string-match "MY QUERY" (gethash key my-hash)) key)) (hash-table-keys my-hash)))) ➜ "0.004506"
I have measured list, alist, hash, with bit different functions, for
my 1300 page information pieces, I will keep measuring. Though small
inconceivable differences matter not.
I think hash method is so much easier, I can keep it all in the single
hash, and only query those keys which are relevant:
Following I would not query:
(puthash (intern (format "id-%d-name" id)) name wrs-search-hash)
(puthash (intern (format "id-%d-link" id)) link wrs-search-hash)
(puthash (intern (format "id-%d-description" id)) description wrs-search-hash)
(puthash (intern (format "id-%d-keywords" id)) keywords wrs-search-hash)))
I would query only those which are numeric, for example. Then I would
use `id-123-name' to get the name for numeric key 123.
> As others have already pointed out, it depends *a lot* on how you are
> searching. If your search always involves *exact* matches, hash tables
> might be what you need.
I will need to use AND, OR, NOT and there will be little different
functions to search within the has depending of the combinations
allowed.
> If these are more complex matches (think, as
> a working example: case-insensitive match), "naked" hashes are pretty
> useless, since your code might well end up walking all of the entries
> to apply the "custom comparison function" anyway.
I will only use values to search name, url, description, summary
> Sometimes (again, think our "case-insensitive match" example, a hash
> still might be useful: you may map the keys appropriately before
> hashing (again, in our example: lower case) and search for the equally
> mapped search item. Sometimes this reduces precision and you may get
> "false positives" -- but then you can refine your sequential search
> over a much reduced set (this is, BTW, the same thing hash tables do,
> but I disgress).
Okay, thanks.
> Another context point is size. For small things, hashes haven't an
> advantage over alists.
I think 1300 x 4 elements is small thing.
> Here's a small code snippet. On my machine, and in the current
> moon phase, hashes overtake alists at roughly 10 entries:
>
(let* ((size 10) ; alist, hash size
(runs 100000) ; benchmark runs
(alist '())
(hash (make-hash-table :test 'eq :size 1549))
(keys (make-vector size nil))
(count size))
;; setup
(while (> count 0)
(setq count (1- count))
(let ((sym (intern (format "SYM%04d%03d" (random 10000) count))))
(aset keys count sym)
(setq alist (cons (cons sym "foo") alist))
(puthash sym "foo" hash)))
;; run
(cons
(benchmark runs '(assq (aref keys (random size)) alist))
(benchmark runs '(gethash (aref keys (random size)) hash))))
➜ ("Elapsed time: 0.030115s" . "Elapsed time: 0.028604s")
I see no much difference, important is there is no human noticeable 😐
too long ⏱️ lag, small ⚖️ lags are always there 👍.
Okay thanks, I am advancing.
--
Jean Louis
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Which Elisp data structure is fastest for searching?
2024-12-26 8:53 ` Jean Louis
2024-12-26 9:23 ` tomas
@ 2024-12-26 23:03 ` Joel Reicher
1 sibling, 0 replies; 6+ messages in thread
From: Joel Reicher @ 2024-12-26 23:03 UTC (permalink / raw)
To: Help GNU Emacs
Jean Louis <bugs@gnu.support> writes:
[...]
> As is written in manual that hash table is very fast, I believe
> yes, though my search may not be.
>
> I will use following approach:
>
> - there will be list or hash with values that are to be
> searched,
> those values will have their ID, that information will be
> prepared for easier searching, like special symbols removed,
> only words remaining, maybe even small words could be removed
This makes me think you don't yet know what your equality
predicate is, and whether your search space is totally or
partially ordered, or unordered.
I don't think you will be able to choose a data structure until
you have these things figured out.
> - there will be different hash with accurate information about
> the ID,
> such as title, URL, description
I am not sure you are using the word "hash" the same way "hash
table" does, but if I'm right about the above, it doesn't matter,
because you must figure those things out first.
> That is approach I know. Then I can give relevant information
> for website search.
>
> I wonder if Emacs can remain in memory keeping to answer HTTP
> requests, so that I do not load it every time.
Yes it can, but I think worrying about that now is a bit like
worrying about how long to age the cheese when you haven't even
milked the cow.
Regards,
- Joel
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-12-26 23:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-25 9:33 Which Elisp data structure is fastest for searching? Jean Louis
2024-12-26 6:24 ` Joel Reicher
2024-12-26 8:53 ` Jean Louis
2024-12-26 9:23 ` tomas
2024-12-26 11:49 ` Jean Louis
2024-12-26 23:03 ` Joel Reicher
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.