* Which Elisp data structure is fastest for searching? @ 2024-12-25 9:33 Jean Louis 2024-12-26 6:24 ` Joel Reicher 0 siblings, 1 reply; 8+ messages in thread From: Jean Louis @ 2024-12-25 9:33 UTC (permalink / raw) To: Help GNU Emacs It says in Emacs Lisp manual: * Hash Tables:: Very fast lookup-tables. (info "(elisp) Hash Tables") 8 Hash Tables ************* A hash table is a very fast kind of lookup table, somewhat like an alist (*note Association Lists::) in that it maps keys to corresponding values. The above is not so conclusive and I have not done measurements. It says "somewhat like an alist", but is the alist faster in searching through elements or the hash table? I wish to make website search, something I always had before on websites, though at the time I was using Perl and shell tools. Now I am using Emacs Lisp for website applications. Each piece of information in the data structure such as plist, alist, hash, will have: - title - URL - description - summary plain text which is shorter than the actual text I will search all of the pieces of information. With alist that may be easier. With hash, I would need to structure the `value' part again into four different things, not so? What would be better decision? Jean Louis ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-25 9:33 Which Elisp data structure is fastest for searching? Jean Louis @ 2024-12-26 6:24 ` Joel Reicher 2024-12-26 8:53 ` Jean Louis 0 siblings, 1 reply; 8+ messages in thread From: Joel Reicher @ 2024-12-26 6:24 UTC (permalink / raw) To: Jean Louis; +Cc: Help GNU Emacs Jean Louis <bugs@gnu.support> writes: [...] > A hash table is a very fast kind of lookup table, somewhat like > an alist (*note Association Lists::) in that it maps keys to > corresponding values. > > The above is not so conclusive and I have not done measurements. > > It says "somewhat like an alist", but is the alist faster in > searching through elements or the hash table? It depends on the kind of search you are doing. As an extreme example, imagine your search was not based on the hash table key; you might have to go through every entry in the table to find what you're looking for. This is why different data structures exist. There is no "best"; only "best for..." Regards, - Joel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-26 6:24 ` Joel Reicher @ 2024-12-26 8:53 ` Jean Louis 2024-12-26 9:23 ` tomas 2024-12-26 23:03 ` Joel Reicher 0 siblings, 2 replies; 8+ messages in thread From: Jean Louis @ 2024-12-26 8:53 UTC (permalink / raw) To: Joel Reicher; +Cc: Help GNU Emacs * Joel Reicher <joel.reicher@gmail.com> [2024-12-26 09:25]: > > A hash table is a very fast kind of lookup table, somewhat like an alist > > (*note Association Lists::) in that it maps keys to corresponding > > values. > > > > The above is not so conclusive and I have not done measurements. > > > > It says "somewhat like an alist", but is the alist faster in searching > > through elements or the hash table? > > It depends on the kind of search you are doing. As an extreme example, > imagine your search was not based on the hash table key; you might have to > go through every entry in the table to find what you're looking for. > > This is why different data structures exist. There is no "best"; only "best > for..." As is written in manual that hash table is very fast, I believe yes, though my search may not be. I will use following approach: - there will be list or hash with values that are to be searched, those values will have their ID, that information will be prepared for easier searching, like special symbols removed, only words remaining, maybe even small words could be removed - there will be different hash with accurate information about the ID, such as title, URL, description That is approach I know. Then I can give relevant information for website search. I wonder if Emacs can remain in memory keeping to answer HTTP requests, so that I do not load it every time. -- Jean Louis ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-26 8:53 ` Jean Louis @ 2024-12-26 9:23 ` tomas 2024-12-26 11:49 ` Jean Louis 2024-12-26 23:03 ` Joel Reicher 1 sibling, 1 reply; 8+ messages in thread From: tomas @ 2024-12-26 9:23 UTC (permalink / raw) To: Joel Reicher, Help GNU Emacs [-- Attachment #1: Type: text/plain, Size: 2187 bytes --] On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote: [Hashes vs. ...] > > This is why different data structures exist. There is no "best"; only "best > > for..." > > As is written in manual that hash table is very fast, I believe yes, > though my search may not be. Never believe such a thing out-of-context. As others have already pointed out, it depends *a lot* on how you are searching. If your search always involves *exact* matches, hash tables might be what you need. If these are more complex matches (think, as a working example: case-insensitive match), "naked" hashes are pretty useless, since your code might well end up walking all of the entries to apply the "custom comparison function" anyway. Sometimes (again, think our "case-insensitive match" example, a hash still might be useful: you may map the keys appropriately before hashing (again, in our example: lower case) and search for the equally mapped search item. Sometimes this reduces precision and you may get "false positives" -- but then you can refine your sequential search over a much reduced set (this is, BTW, the same thing hash tables do, but I disgress). That's, roughly, how databases do their thing. Another context point is size. For small things, hashes haven't an advantage over alists. Here's a small code snippet. On my machine, and in the current moon phase, hashes overtake alists at roughly 10 entries: (let* ((size 10) ; alist, hash size (runs 100000) ; benchmark runs (alist '()) (hash (make-hash-table :test 'eq :size 1549)) (keys (make-vector size nil)) (count size)) ;; setup (while (> count 0) (setq count (1- count)) (let ((sym (intern (format "SYM%04d%03d" (random 10000) count)))) (aset keys count sym) (setq alist (cons (cons sym "foo") alist)) (puthash sym "foo" hash))) ;; run (cons (benchmark runs '(assq (aref keys (random size)) alist)) (benchmark runs '(gethash (aref keys (random size)) hash)))) Tools, jobs and that. If you enter a carpenter's shop, you never ever will find just one tool hanging of the wall. Cheers -- t [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-26 9:23 ` tomas @ 2024-12-26 11:49 ` Jean Louis 0 siblings, 0 replies; 8+ messages in thread From: Jean Louis @ 2024-12-26 11:49 UTC (permalink / raw) To: tomas; +Cc: Joel Reicher, Help GNU Emacs [-- Attachment #1: Type: text/plain, Size: 3916 bytes --] * tomas@tuxteam.de <tomas@tuxteam.de> [2024-12-26 12:25]: > On Thu, Dec 26, 2024 at 11:53:38AM +0300, Jean Louis wrote: > > [Hashes vs. ...] > > > > This is why different data structures exist. There is no "best"; only "best > > > for..." > > > > As is written in manual that hash table is very fast, I believe yes, > > though my search may not be. > > Never believe such a thing out-of-context. Sure, that is why I am asking possibly those who made better experience with it. (measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" (cdr item)) item)) my-alist)) ➜ "0.004352" (measure-time (seq-filter (lambda (item) (when (string-match "MY QUERY" item) item)) my-list)) ➜ "0.004252" (measure-time (delq nil (mapcar (lambda (key) (when (string-match "MY QUERY" (gethash key my-hash)) key)) (hash-table-keys my-hash)))) ➜ "0.004506" I have measured list, alist, hash, with bit different functions, for my 1300 page information pieces, I will keep measuring. Though small inconceivable differences matter not. I think hash method is so much easier, I can keep it all in the single hash, and only query those keys which are relevant: Following I would not query: (puthash (intern (format "id-%d-name" id)) name wrs-search-hash) (puthash (intern (format "id-%d-link" id)) link wrs-search-hash) (puthash (intern (format "id-%d-description" id)) description wrs-search-hash) (puthash (intern (format "id-%d-keywords" id)) keywords wrs-search-hash))) I would query only those which are numeric, for example. Then I would use `id-123-name' to get the name for numeric key 123. > As others have already pointed out, it depends *a lot* on how you are > searching. If your search always involves *exact* matches, hash tables > might be what you need. I will need to use AND, OR, NOT and there will be little different functions to search within the has depending of the combinations allowed. > If these are more complex matches (think, as > a working example: case-insensitive match), "naked" hashes are pretty > useless, since your code might well end up walking all of the entries > to apply the "custom comparison function" anyway. I will only use values to search name, url, description, summary > Sometimes (again, think our "case-insensitive match" example, a hash > still might be useful: you may map the keys appropriately before > hashing (again, in our example: lower case) and search for the equally > mapped search item. Sometimes this reduces precision and you may get > "false positives" -- but then you can refine your sequential search > over a much reduced set (this is, BTW, the same thing hash tables do, > but I disgress). Okay, thanks. > Another context point is size. For small things, hashes haven't an > advantage over alists. I think 1300 x 4 elements is small thing. > Here's a small code snippet. On my machine, and in the current > moon phase, hashes overtake alists at roughly 10 entries: > (let* ((size 10) ; alist, hash size (runs 100000) ; benchmark runs (alist '()) (hash (make-hash-table :test 'eq :size 1549)) (keys (make-vector size nil)) (count size)) ;; setup (while (> count 0) (setq count (1- count)) (let ((sym (intern (format "SYM%04d%03d" (random 10000) count)))) (aset keys count sym) (setq alist (cons (cons sym "foo") alist)) (puthash sym "foo" hash))) ;; run (cons (benchmark runs '(assq (aref keys (random size)) alist)) (benchmark runs '(gethash (aref keys (random size)) hash)))) ➜ ("Elapsed time: 0.030115s" . "Elapsed time: 0.028604s") I see no much difference, important is there is no human noticeable 😐 too long ⏱️ lag, small ⚖️ lags are always there 👍. Okay thanks, I am advancing. -- Jean Louis [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-26 8:53 ` Jean Louis 2024-12-26 9:23 ` tomas @ 2024-12-26 23:03 ` Joel Reicher 2024-12-27 11:16 ` Jean Louis 1 sibling, 1 reply; 8+ messages in thread From: Joel Reicher @ 2024-12-26 23:03 UTC (permalink / raw) To: Help GNU Emacs Jean Louis <bugs@gnu.support> writes: [...] > As is written in manual that hash table is very fast, I believe > yes, though my search may not be. > > I will use following approach: > > - there will be list or hash with values that are to be > searched, > those values will have their ID, that information will be > prepared for easier searching, like special symbols removed, > only words remaining, maybe even small words could be removed This makes me think you don't yet know what your equality predicate is, and whether your search space is totally or partially ordered, or unordered. I don't think you will be able to choose a data structure until you have these things figured out. > - there will be different hash with accurate information about > the ID, > such as title, URL, description I am not sure you are using the word "hash" the same way "hash table" does, but if I'm right about the above, it doesn't matter, because you must figure those things out first. > That is approach I know. Then I can give relevant information > for website search. > > I wonder if Emacs can remain in memory keeping to answer HTTP > requests, so that I do not load it every time. Yes it can, but I think worrying about that now is a bit like worrying about how long to age the cheese when you haven't even milked the cow. Regards, - Joel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-26 23:03 ` Joel Reicher @ 2024-12-27 11:16 ` Jean Louis 2024-12-28 3:32 ` Joel Reicher 0 siblings, 1 reply; 8+ messages in thread From: Jean Louis @ 2024-12-27 11:16 UTC (permalink / raw) To: Joel Reicher; +Cc: Help GNU Emacs * Joel Reicher <joel.reicher@gmail.com> [2024-12-27 02:06]: > Jean Louis <bugs@gnu.support> writes: > > [...] > > > As is written in manual that hash table is very fast, I believe yes, > > though my search may not be. > > > > I will use following approach: > > > > - there will be list or hash with values that are to be searched, > > those values will have their ID, that information will be prepared for > > easier searching, like special symbols removed, only words remaining, > > maybe even small words could be removed > > This makes me think you don't yet know what your equality predicate is, and > whether your search space is totally or partially ordered, or > unordered. No, I have no idea to what equality predicate refers. You see, for searching it is better to remove new lines, and multiple spaces, emoji, symbols, any of characters that are not letters and numbers. So the search over such strings is easier to accomplish. Each string will have its ID. By using ID I can then find name, URL, description within the hash table. And actually, all of the information can be stored in the single hash. id-string -- contains "93206|We strive to bring customers" So searching would be inside of id-string keys. id-string contains id. then once found, I can extract other keys: id-name, it-title, id-description to show it as list of links. > I don't think you will be able to choose a data structure until you have > these things figured out. Yes, but I am on the verge. I have already list, alist and hash and I think hash is still easiest to do for reason of how to choose those keys. > > - there will be different hash with accurate information about the ID, > > such as title, URL, description > > I am not sure you are using the word "hash" the same way "hash table" does, > but if I'm right about the above, it doesn't matter, because you must figure > those things out first. Like this: | Key | Value | |----------------+-------------------------------------| | id-string | "123|name link description summary" | | id-name | "name" | | id-link | "url" | | id-description | "description" | That means all the search will be through values, and only keys named "-string". Once found, ID is taken from string, and then remaining values displayed. > > That is approach I know. Then I can give relevant information for > > website search. > > > > I wonder if Emacs can remain in memory keeping to answer HTTP requests, > > so that I do not load it every time. > > Yes it can, but I think worrying about that now is a bit like worrying about > how long to age the cheese when you haven't even milked the cow. Hahahahhaha -- Jean Louis ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Which Elisp data structure is fastest for searching? 2024-12-27 11:16 ` Jean Louis @ 2024-12-28 3:32 ` Joel Reicher 0 siblings, 0 replies; 8+ messages in thread From: Joel Reicher @ 2024-12-28 3:32 UTC (permalink / raw) To: Help GNU Emacs Jean Louis <bugs@gnu.support> writes: > * Joel Reicher <joel.reicher@gmail.com> [2024-12-27 02:06]: [...] >> This makes me think you don't yet know what your equality >> predicate is, and whether your search space is totally or >> partially ordered, or unordered. > > No, I have no idea to what equality predicate refers. I'm happy to continue this if you like, but I think we should take it off this list as it's not really related Emacs or elisp. Regards, - Joel ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-12-28 3:32 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-25 9:33 Which Elisp data structure is fastest for searching? Jean Louis 2024-12-26 6:24 ` Joel Reicher 2024-12-26 8:53 ` Jean Louis 2024-12-26 9:23 ` tomas 2024-12-26 11:49 ` Jean Louis 2024-12-26 23:03 ` Joel Reicher 2024-12-27 11:16 ` Jean Louis 2024-12-28 3:32 ` Joel Reicher
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).