unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* van Emde Boas hash.
@ 2009-11-19 12:04 A. Soare
  2009-11-19 15:23 ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: A. Soare @ 2009-11-19 12:04 UTC (permalink / raw)
  To: Emacs   Dev  [emacs-devel]



I am going to implement van Emde Boas. Do you consider that this hash method could find some useful applications in emacs?

If so, I will try to implement it with Emacs Lisp data structures...




____________________________________________________

Gagnez un séjour au Maroc en découvrant les sketchs désopilants des ReVoila sur http://www.lesrevoila.fr/







^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-19 12:04 A. Soare
@ 2009-11-19 15:23 ` Stefan Monnier
  2009-11-19 16:44   ` Deniz Dogan
  2009-11-27  9:14   ` A Soare
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-11-19 15:23 UTC (permalink / raw)
  To: alinsoar; +Cc: Emacs Dev [emacs-devel]

> I am going to implement van Emde Boas. Do you consider that this hash
> method could find some useful applications in Emacs?

I have no idea what it is,


        Stefan




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-19 15:23 ` Stefan Monnier
@ 2009-11-19 16:44   ` Deniz Dogan
  2009-11-27  9:14   ` A Soare
  1 sibling, 0 replies; 15+ messages in thread
From: Deniz Dogan @ 2009-11-19 16:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: alinsoar, Emacs Dev [emacs-devel]

2009/11/19 Stefan Monnier <monnier@iro.umontreal.ca>:
>> I am going to implement van Emde Boas. Do you consider that this hash
>> method could find some useful applications in Emacs?
>
> I have no idea what it is,
>
>
>        Stefan
>
>
>

http://en.wikipedia.org/wiki/Van_Emde_Boas_tree
Summary: Associative array at O(log m) for "all operations", where "m"
is the number of bits used for the integer keys.

-- 
Deniz Dogan




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
@ 2009-11-20  6:41 A. Soare
  2009-11-23  4:06 ` tomas
  0 siblings, 1 reply; 15+ messages in thread
From: A. Soare @ 2009-11-20  6:41 UTC (permalink / raw)
  To: Deniz Dogan; +Cc: Stefan Monnier, Emacs Dev [emacs-devel]


Van Emde Boas arrays are hashed arrays (of numbers for example), which have the property that all operations are realized into a constant number of steps.

For example, for an array of length 2^32, the complexity is O(0), and the number of steps is less than log_2 (32) = "maximum 5 steps" for every operation.

The operations are:

* insert a new number
* delete a number
* seek for a number
* find the successor of a number
* find the predecessor of a number



____________________________________________________

Gagnez un séjour au Maroc en découvrant les sketchs désopilants des ReVoila sur http://www.lesrevoila.fr/







^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
@ 2009-11-22 12:35 A. Soare
  2009-11-23  2:25 ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: A. Soare @ 2009-11-22 12:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs   Dev  [emacs-devel]


What algorithm uses emacs for hashing the obarrays of symbols ?





____________________________________________________

Gagnez un séjour au Maroc en découvrant les sketchs désopilants des ReVoila sur http://www.lesrevoila.fr/







^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-22 12:35 A. Soare
@ 2009-11-23  2:25 ` Stefan Monnier
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-11-23  2:25 UTC (permalink / raw)
  To: alinsoar; +Cc: Emacs Dev [emacs-devel]

> What algorithm uses emacs for hashing the obarrays of symbols ?

Read the source: C-h f intern RET then click on top right to get to the
source code, ...

It's a hash table with simple haqshing into buckets that contain linked
lists of symbols.


        Stefan





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-20  6:41 A. Soare
@ 2009-11-23  4:06 ` tomas
  2009-11-23 14:39   ` A Soare
  0 siblings, 1 reply; 15+ messages in thread
From: tomas @ 2009-11-23  4:06 UTC (permalink / raw)
  To: emacs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Nov 20, 2009 at 07:41:15AM +0100, A. Soare wrote:
> 
> Van Emde Boas arrays are hashed arrays [...]
> 
> For example, for an array of length 2^32, the complexity is O(0), and
> the number of steps is less than log_2 (32) = "maximum 5 steps" for every operation.
> 
> The operations are:
> 
> * insert a new number
> * delete a number
> * seek for a number
> * find the successor of a number
> * find the predecessor of a number

If  I understood the Wikipedia page correctly (thnks, Deniz), you
can find "next" and "previous" to a number not in the hash? Then
they look like a good candidate to represent markers and boundaries of
overlays, no?

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFLCgpZBcgs9XrR2kYRAvh6AJ9TxH4v/4jFSN90gK/X/QHgJQuf/wCfUYYU
mZjOWuTH3spYQJWvZ3HtWzg=
=7Ccl
-----END PGP SIGNATURE-----




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
@ 2009-11-23 14:04 A. Soare
  2009-11-23 16:43 ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: A. Soare @ 2009-11-23 14:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs   Dev  [emacs-devel]


> > What algorithm uses emacs for hashing the obarrays of symbols ?
> 
> Read the source: C-h f intern RET then click on top right to get to the
> source code, ...
> 
> It's a hash table with simple haqshing into buckets that contain linked
> lists of symbols.

As far as I understand, the algorthm is so:

oblookup ( obarray, sym )

 ;; hash is an integer, the index of a bucket that may contain sym
 hash = hash_string (sym)
 ;; bucket is a vector. obarray is a vector of many buckets
 bucket = obarray [hash]
 ;; search sym in its bucket using a naive search
 FOR every symbol S in bucket, check whether S has the same name
 as sym. If so, return S. If no symbol matches, returns the hash.


Even if it is very unlikely that we can find many symbols into a bucket, the algorithm does not require constant time for every search.

Hashing using van Emde Boas requires in the worst case a time of log_2(N), where N is the length of the largest symbol name in obarray.




Alin.



____________________________________________________

Derniers jours pour remporter le séjour au Maroc sur http://www.lesrevoila.fr/ 







^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-23  4:06 ` tomas
@ 2009-11-23 14:39   ` A Soare
  0 siblings, 0 replies; 15+ messages in thread
From: A Soare @ 2009-11-23 14:39 UTC (permalink / raw)
  To: emacs-devel



> If  I understood the Wikipedia page correctly (thnks, Deniz), you
> can find "next" and "previous" to a number not in the hash?

When looking for the next of a number N, it will return the next/prev number
from the array greater than N. Does not matter whether N belongs or not to the
array.

> Then
> they look like a good candidate to represent markers and boundaries of
> overlays, no?

I do not understand. How are the markers and boundaries represented ?

Alin.


PS:



I post again this message, because I discovered http://post.gmane.org/, which is
very nice for reading the news groups.

Emacs can be used convenient to read/post to emacs devel? Can somebody help me
to confugire gnus or something else for?








^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-23 14:04 van Emde Boas hash A. Soare
@ 2009-11-23 16:43 ` Stefan Monnier
  2009-11-23 16:57   ` David Kastrup
  2009-11-27  9:04   ` A Soare
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-11-23 16:43 UTC (permalink / raw)
  To: alinsoar; +Cc: Emacs Dev [emacs-devel]

> As far as I understand, the algorthm is so:

> oblookup ( obarray, sym )

>  ;; hash is an integer, the index of a bucket that may contain sym
>  hash = hash_string (sym)
>  ;; bucket is a vector. obarray is a vector of many buckets
>  bucket = obarray [hash]
>  ;; search sym in its bucket using a naive search
>  FOR every symbol S in bucket, check whether S has the same name
>  as sym. If so, return S. If no symbol matches, returns the hash.


> Even if it is very unlikely that we can find many symbols into
> a bucket, the algorithm does not require constant time for
> every search.

Yes, it's a very simple hashing algorithm.  I'm sure we can come up with
something more efficient.  Then again, I haven't seen any indication
that this is a performance problem.


        Stefan




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-23 16:43 ` Stefan Monnier
@ 2009-11-23 16:57   ` David Kastrup
  2009-11-27  9:04   ` A Soare
  1 sibling, 0 replies; 15+ messages in thread
From: David Kastrup @ 2009-11-23 16:57 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> As far as I understand, the algorthm is so:
>
>> oblookup ( obarray, sym )
>
>>  ;; hash is an integer, the index of a bucket that may contain sym
>>  hash = hash_string (sym)
>>  ;; bucket is a vector. obarray is a vector of many buckets
>>  bucket = obarray [hash]
>>  ;; search sym in its bucket using a naive search
>>  FOR every symbol S in bucket, check whether S has the same name
>>  as sym. If so, return S. If no symbol matches, returns the hash.
>
>
>> Even if it is very unlikely that we can find many symbols into
>> a bucket, the algorithm does not require constant time for
>> every search.
>
> Yes, it's a very simple hashing algorithm.  I'm sure we can come up
> with something more efficient.  Then again, I haven't seen any
> indication that this is a performance problem.

I should think that it makes up a sizeable portion of byte-recompiling
and autoloading, both of which are not often benchmarked.  But it isn't
something which you commonly do on data sets in a loop.  Scanning files
is done just once usually.

-- 
David Kastrup





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-23 16:43 ` Stefan Monnier
  2009-11-23 16:57   ` David Kastrup
@ 2009-11-27  9:04   ` A Soare
  1 sibling, 0 replies; 15+ messages in thread
From: A Soare @ 2009-11-27  9:04 UTC (permalink / raw)
  To: emacs-devel

> Yes, it's a very simple hashing algorithm.  I'm sure we can come up with
> something more efficient.  Then again, I haven't seen any indication
> that this is a performance problem.

I think that the lists can be hashed using van Emde Boas trees, and this would
give a really great improvement to emacs.




Alin





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-19 15:23 ` Stefan Monnier
  2009-11-19 16:44   ` Deniz Dogan
@ 2009-11-27  9:14   ` A Soare
  2009-11-27 16:56     ` Stefan Monnier
  1 sibling, 1 reply; 15+ messages in thread
From: A Soare @ 2009-11-27  9:14 UTC (permalink / raw)
  To: emacs-devel

> I have no idea what it is,
> 
>         Stefan
> 
> 

I see in elisp manual so:

   * Lookup in a hash table is extremely fast for large tables--in
     fact, the time required is essentially _independent_ of how many
     elements are stored in the table.  For smaller tables (a few tens
     of elements) alists may still be faster because hash tables have a
     more-or-less constant overhead.

What algorithm uses the hash of elisp ? It is faster than van Emde?



I did look at the code, and I cannot understand the algorithm from
`make-hash-table'.













^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
  2009-11-27  9:14   ` A Soare
@ 2009-11-27 16:56     ` Stefan Monnier
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2009-11-27 16:56 UTC (permalink / raw)
  To: A Soare; +Cc: emacs-devel

> I did look at the code, and I cannot understand the algorithm from
> `make-hash-table'.

It's one of the standard hashing schemes, where the hash-table is
resized as it grows, which should hopefully keep the access time
more-or-less constant.


        Stefan




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: van Emde Boas hash.
@ 2009-11-27 18:01 A. Soare
  0 siblings, 0 replies; 15+ messages in thread
From: A. Soare @ 2009-11-27 18:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs   Dev  [emacs-devel]


> > I did look at the code, and I cannot understand the algorithm from
> > `make-hash-table'.
> 
> It's one of the standard hashing schemes, where the hash-table is
> resized as it grows, which should hopefully keep the access time
> more-or-less constant.
> 
> 

Seems like red black trees.

This is not better than van Emde B.




Alin




____________________________________________________

Derniers jours pour remporter le séjour au Maroc sur http://www.lesrevoila.fr/ 







^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-11-27 18:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-23 14:04 van Emde Boas hash A. Soare
2009-11-23 16:43 ` Stefan Monnier
2009-11-23 16:57   ` David Kastrup
2009-11-27  9:04   ` A Soare
  -- strict thread matches above, loose matches on Subject: below --
2009-11-27 18:01 A. Soare
2009-11-22 12:35 A. Soare
2009-11-23  2:25 ` Stefan Monnier
2009-11-20  6:41 A. Soare
2009-11-23  4:06 ` tomas
2009-11-23 14:39   ` A Soare
2009-11-19 12:04 A. Soare
2009-11-19 15:23 ` Stefan Monnier
2009-11-19 16:44   ` Deniz Dogan
2009-11-27  9:14   ` A Soare
2009-11-27 16:56     ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).