bool-vector implementation in the Emacs core

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* bool-vector implementation in the Emacs core
@ 2004-01-21 16:52 Ted Zlatanov
  2004-01-21 17:06 ` Paul Jarc
  2004-01-23  0:16 ` Kim F. Storm
  0 siblings, 2 replies; 13+ messages in thread
From: Ted Zlatanov @ 2004-01-21 16:52 UTC (permalink / raw)


It looks like the current Emacs bool-vector implementation uses true
uncompressed bit vectors (bit strings).  For applications such as
Gnus, where bit ranges can be large, this is unnecessary.  Can there
be either an alternate implementation (bool-inversion-list) or an
alternate internal storage of bool-vectors in the Emacs C core?

I'm thinking, specifically, of either the Gnus ranges or inversion
lists.  Gnus ranges are like this:

(1 (6 . 8) 1000 (1500 . 1600) 2000)

Inversion lists store the number where the toggle occurs, so the
above example would be:

(1 2 6 8 1000 1001 1500 1600 2000 2001) [ or the equivalent vector ]

as an inversion list.  Inversion lists are faster for searching and
most other set operations (I heard about them in the context of
Unicode programming), while ranges are more intuitive to a human
reader.  I prefer inversion lists, personally - if something should
go in the core, it better be fast.  It won't be as fast as bit
strings, of course, but the difference in memory consumption should
be significant.

We're talking about implementations of ranges in the Gnus developer
list right now, and our implementations of ranges is written in
Lisp.  I can implement inversion lists in Lisp as well, but ranges
are such an essential Gnus piece that it seems like seeking core
support for them would be a good idea.

When dealing with vectors, we can preallocate a fixed number of slots
at the end of the vector to allow for growth or shrinkage of the
inversion list.  The empty slots can be contain the repetition of
the last value, or 0 as an out-of-band value.  I think a list-based
implementation is slightly better, though.

I did a search on this topic, but could not find anything.  I hope my
proposal is of interest to people other than Gnus developers (Unicode
coders, for instance).

Thanks
Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-21 16:52 bool-vector implementation in the Emacs core Ted Zlatanov
@ 2004-01-21 17:06 ` Paul Jarc
  2004-01-21 18:35   ` Ted Zlatanov
  2004-01-23  0:16 ` Kim F. Storm
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Jarc @ 2004-01-21 17:06 UTC (permalink / raw)
  Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> wrote:
> (1 (6 . 8) 1000 (1500 . 1600) 2000)
>
> Inversion lists store the number where the toggle occurs, so the
> above example would be:
>
> (1 2 6 8 1000 1001 1500 1600 2000 2001) [ or the equivalent vector ]

ITYM:
(1 2 6 9 1000 1001 1500 1601 2000 2001)


paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-21 17:06 ` Paul Jarc
@ 2004-01-21 18:35   ` Ted Zlatanov
  0 siblings, 0 replies; 13+ messages in thread
From: Ted Zlatanov @ 2004-01-21 18:35 UTC (permalink / raw)


On Wed, 21 Jan 2004, prj@po.cwru.edu wrote:

Ted Zlatanov <tzz@lifelogs.com> wrote:
>> (1 (6 . 8) 1000 (1500 . 1600) 2000)
>>
>> Inversion lists store the number where the toggle occurs, so the
>> above example would be:
>>
>> (1 2 6 8 1000 1001 1500 1600 2000 2001) [ or the equivalent
>> vector ]
> 
> ITYM:
> (1 2 6 9 1000 1001 1500 1601 2000 2001)

Right, thank you!

Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-21 16:52 bool-vector implementation in the Emacs core Ted Zlatanov
  2004-01-21 17:06 ` Paul Jarc
@ 2004-01-23  0:16 ` Kim F. Storm
  2004-01-23  0:37   ` Kenichi Handa
  2004-01-23 20:18   ` Ted Zlatanov
  1 sibling, 2 replies; 13+ messages in thread
From: Kim F. Storm @ 2004-01-23  0:16 UTC (permalink / raw)
  Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> 
> We're talking about implementations of ranges in the Gnus developer
> list right now, and our implementations of ranges is written in
> Lisp.  I can implement inversion lists in Lisp as well, but ranges
> are such an essential Gnus piece that it seems like seeking core
> support for them would be a good idea.

What kind of C-level support are you looking for?

If you implement inversion lists with lists, I don't really think
lookups or inserts will be significantly faster in C than in
byte-compiled Elisp.

I do think that it would be a good idea to add the bool-vector support
to the emacs lisp core files though, e.g. in subr.el.

What kind of API do you envision?

> 
> When dealing with vectors, we can preallocate a fixed number of slots
> at the end of the vector to allow for growth or shrinkage of the
> inversion list.  The empty slots can be contain the repetition of
> the last value, or 0 as an out-of-band value.  I think a list-based
> implementation is slightly better, though.

IMO, vectors are the wrong tool for this kind of job -- lists are both
faster and more flexible to use for inversion lists.

> I did a search on this topic, but could not find anything.  I hope my
> proposal is of interest to people other than Gnus developers (Unicode
> coders, for instance).

It's a good question -- will unicode benefit from this at the C-level? 

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23  0:16 ` Kim F. Storm
@ 2004-01-23  0:37   ` Kenichi Handa
  2004-01-23  2:19     ` Stefan Monnier
  2004-01-23 20:26     ` Ted Zlatanov
  2004-01-23 20:18   ` Ted Zlatanov
  1 sibling, 2 replies; 13+ messages in thread
From: Kenichi Handa @ 2004-01-23  0:37 UTC (permalink / raw)
  Cc: tzz, emacs-devel

In article <m3k73jtt8b.fsf@kfs-l.imdomain.dk>, no-spam@cua.dk (Kim F. Storm) writes:
>>  I did a search on this topic, but could not find anything.  I hope my
>>  proposal is of interest to people other than Gnus developers (Unicode
>>  coders, for instance).

> It's a good question -- will unicode benefit from this at the C-level? 

I'm not sure.  Emacs-unicode uses char-table for many
things.  It is a pseudo array of 0..#x3FFFFF.  So, if
indices fit in this range, you can use it not only for a
character, but for any integer.

Ex. Set `t' to these ranges: (1 (6 . 8) 1000 (1500 . 1600) 2000)

(setq tbl (make-char-table nil))
(aset tbl 1 t)
(set-char-table-range tbl '(6 . 8) t)
(aset tbl 1000 t)
(set-char-table-range tbl '(1500 . 1600) t)
(aset tbl 2000 t)

It may consume more memory than ranges or inversion list,
(aref tbl N) is quite faster than them.

It is also possible to implement a code that doesn't limit
the range to 0..#x3fffff if it is really required.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23  0:37   ` Kenichi Handa
@ 2004-01-23  2:19     ` Stefan Monnier
  2004-01-23 20:26     ` Ted Zlatanov
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Monnier @ 2004-01-23  2:19 UTC (permalink / raw)
  Cc: tzz, emacs-devel, no-spam

> I'm not sure.  Emacs-unicode uses char-table for many
> things.  It is a pseudo array of 0..#x3FFFFF.  So, if

Note that in Emacs non-unicode (e.g. Emacs-CVS and older), char-tables are
not arrays (although they look like one): the domain is non-contiguous and
there are a few entries that when set impact others as well (they are called
generic-chars).

        Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23  0:16 ` Kim F. Storm
  2004-01-23  0:37   ` Kenichi Handa
@ 2004-01-23 20:18   ` Ted Zlatanov
  2004-01-24  2:11     ` Kim F. Storm
  1 sibling, 1 reply; 13+ messages in thread
From: Ted Zlatanov @ 2004-01-23 20:18 UTC (permalink / raw)

On 23 Jan 2004, no-spam@cua.dk wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> 
>> We're talking about implementations of ranges in the Gnus developer
>> list right now, and our implementations of ranges is written in
>> Lisp.  I can implement inversion lists in Lisp as well, but ranges
>> are such an essential Gnus piece that it seems like seeking core
>> support for them would be a good idea.
> 
> What kind of C-level support are you looking for?

Something like a bool-vector, but internally implemented with an
inversion list.  A double-linked C list of integers is perfect, so you
can insert and delete quickly.  I can base it on the ELisp lists, I
was just hoping to make the bool-vector operations insanely fast :)

Now, there are two approaches - either provide an alternate internal
implementation of the bool-vector when you create it, or provide a
whole new data type.  I'd rather be able to make a bool-vector with
an inversion list internally.

> If you implement inversion lists with lists, I don't really think
> lookups or inserts will be significantly faster in C than in
> byte-compiled Elisp.

You may be right, I'm not familiar with the differences.  It seems to
me that a double-linked list in C that can only hold integers will be
significantly faster than the equivalent interpreted code using the
general-purpose ELisp lists.

> I do think that it would be a good idea to add the bool-vector
> support to the emacs lisp core files though, e.g. in subr.el.
> 
> What kind of API do you envision?

Same as bool-vector: aref, aset.  Also union, intersection, inversion,
etc. set operations would be nice.  With inversion lists those
operations are very simple.  Inversion, for instance - you just
precede the list with a 0, or delete the 0 if it's already there.

Thanks
Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23  0:37   ` Kenichi Handa
  2004-01-23  2:19     ` Stefan Monnier
@ 2004-01-23 20:26     ` Ted Zlatanov
  2004-01-24 21:40       ` Richard Stallman
  1 sibling, 1 reply; 13+ messages in thread
From: Ted Zlatanov @ 2004-01-23 20:26 UTC (permalink / raw)

On Fri, 23 Jan 2004, handa@m17n.org wrote:

> I'm not sure.  Emacs-unicode uses char-table for many
> things.  It is a pseudo array of 0..#x3FFFFF.  So, if
> indices fit in this range, you can use it not only for a
> character, but for any integer.

That's good to know, thanks.

> [a char-table] may consume more memory than ranges or inversion
> list, (aref tbl N) is quite faster than them.

Lookups are faster, sure (ignoring the effect of swapping because
you're using too much memory - that effect is very significant
sometimes).  Inversion lists are significantly faster in almost every
other set operation, though: initializing to the same value, union,
intersection, inversion.  They are slower near the worst case, where
every element is a toggle, so the length of the list is the same as
the length of the bit string - but in the more common case of sparse
ranges that we find in Gnus and Unicode programming, I would say
inversion lists will perform very well.

My suggestion was to implement a bool-vector that can use a inversion
list if suitable at the programmer's request, and maybe even convert
it to an alternate representation when necessary.  I would rather see
that than a new data type in Emacs Lisp (that was my alternate
suggestion, if the internal inversion list for a bool-vector is not
usable)..

Thanks
Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23 20:18   ` Ted Zlatanov
@ 2004-01-24  2:11     ` Kim F. Storm
  0 siblings, 0 replies; 13+ messages in thread
From: Kim F. Storm @ 2004-01-24  2:11 UTC (permalink / raw)
  Cc: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> On 23 Jan 2004, no-spam@cua.dk wrote:
> 
> > Ted Zlatanov <tzz@lifelogs.com> writes:
> > 
> >> 
> >> We're talking about implementations of ranges in the Gnus developer
> >> list right now, and our implementations of ranges is written in
> >> Lisp.  I can implement inversion lists in Lisp as well, but ranges
> >> are such an essential Gnus piece that it seems like seeking core
> >> support for them would be a good idea.
> > 
> > What kind of C-level support are you looking for?
> 
> Something like a bool-vector, but internally implemented with an
> inversion list.  A double-linked C list of integers is perfect, so you
> can insert and delete quickly.  I can base it on the ELisp lists, I
> was just hoping to make the bool-vector operations insanely fast :)

If you always lookup from the start of the list, a double-linked list
isn't needed; just keep a pointer to the previous element and use
setcdr to insert or delete elements in the list and setcar to change
a specific element of the list.

> 
> Now, there are two approaches - either provide an alternate internal
> implementation of the bool-vector when you create it, or provide a
> whole new data type.  I'd rather be able to make a bool-vector with
> an inversion list internally.

Here is a starting point written in Elisp; nothing fancy yet, but I
think it shows that it can be done fairly efficiently using lists.


;;; Implement bool-vectors as inversion lists.
;;; (c) 2004 Kim F. Storm <storm@cua.dk>.  All rights reserved.

(defun make-bool-vector ()
  "Create an empty bool vector."
  '(-1))

(defun bool-vector-locate (bv elt)
  "Locate cons cell in bool vector BV which precedes element ELT.
For internal use only.  Return value is (PRED . ON)."
  (let (on (b (cdr bv)))
    (while (and (consp b) (> elt (car b)))
      (setq bv b
	    b (cdr b)
	    on (not on)))
    (cons bv on)))

(defun bool-vector-add (bv elt)
  "Add to bool vector BV the element ELT."
  (let* ((b-on (bool-vector-locate bv elt))
	 (b (car b-on)) (on (cdr b-on))
	 (c (cdr b)))
    (cond
     ((null c)
      (setcdr b (list elt (1+ elt))))
     ((< (1+ elt) (car c))
      (if (not on)
	  (setcdr b (cons elt (cons (1+ elt) (cdr b))))))
     ((< elt (car c))
      (if (not on)
	  (setcar c elt)))
     ((= elt (car c))
      (if on
	  (if (and (consp (cdr c)) (= (1+ elt) (car (cdr c))))
	      (setcdr b (cdr (cdr c)))
	    (setcar c (1+ elt)))))))
  bv)

(defun bool-vector-test (bv elt)
  "Test if bool vector BV contains element ELT."
  (let* ((b-on (bool-vector-locate bv elt))
	 (b (car b-on)) (on (cdr b-on))
	 (c (cdr b)))
    (and (consp c) 
	 (if on (< elt (car c)) (= elt (car c))))))

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-23 20:26     ` Ted Zlatanov
@ 2004-01-24 21:40       ` Richard Stallman
  2004-01-25  2:59         ` Ted Zlatanov
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2004-01-24 21:40 UTC (permalink / raw)
  Cc: emacs-devel

    My suggestion was to implement a bool-vector that can use a inversion
    list if suitable at the programmer's request, and maybe even convert
    it to an alternate representation when necessary.

Why not just write code to use such a list explicitly?
Why do you need it to be hidden inside of a bool-vector?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-24 21:40       ` Richard Stallman
@ 2004-01-25  2:59         ` Ted Zlatanov
  2004-01-26 19:23           ` Richard Stallman
  0 siblings, 1 reply; 13+ messages in thread
From: Ted Zlatanov @ 2004-01-25  2:59 UTC (permalink / raw)

On Sat, 24 Jan 2004, rms@gnu.org wrote:

> Why not just write code to use such a list explicitly?

I can write it myself in ELisp (the code is very simple), but I had
two reasons for bringing it up on emacs-devel:

- I think other Emacs developers could benefit from the same code.
  Inversion lists are very fast in most cases, and for sparse ranges
  are significantly faster and use less memory than the current
  implementation of bool-vector (which is basically a bitstring).

- I thought that a straight C implementation might be even better
  than using ELisp primitives; it's best to ask that question of the
  Emacs developers since I'm not familiar with the Emacs internals.

> Why do you need it to be hidden inside of a bool-vector?

Because the aref/aset API of bool-vector is nice, and existing code
that uses bool-vector can use inversion lists without major changes.

Also, bool-vector is definitely the right abstraction for what
inversion lists do.  There's no need to invent a new data type, with a
new API, for basically a bool-vector with an alternate internal
representation.

You can specify hashtable size when you create it.  In the same way
you could specify the bool-vector preference to be 'dense or 'sparse
for the range.  Then Emacs would give you an inversion list for sparse
ranges or a bitstring for dense ranges.  That's what I was really
hoping for.

Thank you
Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-25  2:59         ` Ted Zlatanov
@ 2004-01-26 19:23           ` Richard Stallman
  2004-01-27  1:32             ` Kim F. Storm
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2004-01-26 19:23 UTC (permalink / raw)
  Cc: emacs-devel

I am not sure this is useful in general and worth adding to the C code.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: bool-vector implementation in the Emacs core
  2004-01-26 19:23           ` Richard Stallman
@ 2004-01-27  1:32             ` Kim F. Storm
  0 siblings, 0 replies; 13+ messages in thread
From: Kim F. Storm @ 2004-01-27  1:32 UTC (permalink / raw)
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I am not sure this is useful in general and worth adding to the C code.

Well, we already have support for (dense) bool-vectors of fixed length
in the core, so his proposal to extend that to support sparse (unlimited) bool
vectors sounds sensible to me.

However I think that--contrary to the dense bool vectors--the
inversion lists can just as well be implemented in Lisp, so I agree
that it is not worth adding to the C code just to extend the aref/aset
API to cover sparse bool-vectors.

Actually, looking at the code of aref and aset, I think it may very well
make sense to keep them separate.

BTW, the code I posted a few days ago had a bug in make-bool-vector.
Here is a fixed version, renamed not to collide with the built-in
make-bool-vector:

(defun make-sparse-bool-vector ()
  "Create an empty bool vector."
  (cons -1 nil))

-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-01-27  1:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-21 16:52 bool-vector implementation in the Emacs core Ted Zlatanov
2004-01-21 17:06 ` Paul Jarc
2004-01-21 18:35   ` Ted Zlatanov
2004-01-23  0:16 ` Kim F. Storm
2004-01-23  0:37   ` Kenichi Handa
2004-01-23  2:19     ` Stefan Monnier
2004-01-23 20:26     ` Ted Zlatanov
2004-01-24 21:40       ` Richard Stallman
2004-01-25  2:59         ` Ted Zlatanov
2004-01-26 19:23           ` Richard Stallman
2004-01-27  1:32             ` Kim F. Storm
2004-01-23 20:18   ` Ted Zlatanov
2004-01-24  2:11     ` Kim F. Storm

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).