Suggestions to remove one alist's members from another

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Suggestions to remove one alist's members from another
@ 2010-04-09 12:27 Wilson Snyder
  2010-04-09 15:12 ` David Kastrup
  0 siblings, 1 reply; 9+ messages in thread
From: Wilson Snyder @ 2010-04-09 12:27 UTC (permalink / raw)
  To: emacs-devel

Hello,

One of my packages has a function like this:

(defun not-in-alist (in-alist not-alist)
  "Return alist of elements in IN-ALIST that aren't also in NOT-ALIST.
Also remove any duplicates in IN-ALIST."
  (let (out-alist)
    (while in-alist
      (if (not (or (assoc (car (car in-alist)) not-alist)
		   (assoc (car (car in-alist)) out-alist)))
	  (setq out-alist (cons (car in-alist) out-alist)))
      (setq in-alist (cdr in-alist)))
    (nreverse out-alist)))

This code was expedient, but obviously not fast for large lists.
I want to replace it with something closer to O(1), even if it
has larger overhead.

Does anyone have an existing function or example I should use to
do this?  It seems relatively primitive.

If not, my thought is this: when the list was over some size
I'd determine experimentally, it would instead build a hash
table from in-list and hit the not-alist against that.  When
complete it would unfortunately require a second pass
through the in-alist to return it maintaining the original
order.

Thoughts?

Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Suggestions to remove one alist's members from another
@ 2010-04-09 13:02 Wilson Snyder
  2010-04-09 14:11 ` Davis Herring
  0 siblings, 1 reply; 9+ messages in thread
From: Wilson Snyder @ 2010-04-09 13:02 UTC (permalink / raw)
  To: emacs-devel

Hello,

One of my packages has a function like this:

(defun not-in-alist (in-alist not-alist)
  "Return alist of elements in IN-ALIST that aren't also in NOT-ALIST.
Also remove any duplicates in IN-ALIST."
  (let (out-alist)
    (while in-alist
      (if (not (or (assoc (car (car in-alist)) not-alist)
		   (assoc (car (car in-alist)) out-alist)))
	  (setq out-alist (cons (car in-alist) out-alist)))
      (setq in-alist (cdr in-alist)))
    (nreverse out-alist)))

This code was expedient, but obviously not fast for large lists.
I want to replace it with something closer to O(1), even if it
has larger overhead.

Does anyone have an existing function or example I should use to
do this?  It seems relatively primitive.

If not, my thought is this: when the list was over some size
I'd determine experimentally, it would instead build a hash
table from in-list and hit the not-alist against that.  When
complete it would unfortunately require a second pass
through the in-alist to return it maintaining the original
order.

Thoughts?

Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 13:02 Wilson Snyder
@ 2010-04-09 14:11 ` Davis Herring
  2010-04-09 14:16   ` Wilson Snyder
  0 siblings, 1 reply; 9+ messages in thread
From: Davis Herring @ 2010-04-09 14:11 UTC (permalink / raw)
  To: Wilson Snyder; +Cc: emacs-devel

> If not, my thought is this: when the list was over some size
> I'd determine experimentally, it would instead build a hash
> table from in-list and hit the not-alist against that.  When
> complete it would unfortunately require a second pass
> through the in-alist to return it maintaining the original
> order.

Why not build a hash table (of the cars of the elements) from not-alist
instead?  Then you can just walk in-alist, skipping elements that are in
the hash (that is, whose cars are in it) and adding the rest to out-alist
and (their cars to) the hash.

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 14:11 ` Davis Herring
@ 2010-04-09 14:16   ` Wilson Snyder
  2010-04-09 14:45     ` Drew Adams
                       ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Wilson Snyder @ 2010-04-09 14:16 UTC (permalink / raw)
  To: herring; +Cc: emacs-devel


>> If not, my thought is this: when the list was over some size
>> I'd determine experimentally, it would instead build a hash
>> table from in-list and hit the not-alist against that.  When
>> complete it would unfortunately require a second pass
>> through the in-alist to return it maintaining the original
>> order.
>
>Why not build a hash table (of the cars of the elements) from not-alist
>instead?  Then you can just walk in-alist, skipping elements that are in
>the hash (that is, whose cars are in it) and adding the rest to out-alist
>and (their cars to) the hash.

Thanks, that's a good improvement. I also would add each
in-list element after each test for hash membership, as I
want to eliminate duplicates.

It still seems like this should already exist somewhere...

-Wilson




^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Suggestions to remove one alist's members from another
  2010-04-09 14:16   ` Wilson Snyder
@ 2010-04-09 14:45     ` Drew Adams
  2010-04-09 14:48     ` Davis Herring
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Drew Adams @ 2010-04-09 14:45 UTC (permalink / raw)
  To: 'Wilson Snyder', herring; +Cc: emacs-devel

> >> If not, my thought is this: when the list was over some size
> >> I'd determine experimentally, it would instead build a hash
> >> table from in-list and hit the not-alist against that.  When
> >> complete it would unfortunately require a second pass
> >> through the in-alist to return it maintaining the original
> >> order.
> >
> >Why not build a hash table (of the cars of the elements) 
> >from not-alist instead?  Then you can just walk in-alist,
> >skipping elements that are in the hash (that is, whose
> >cars are in it) and adding the rest to out-alist
> >and (their cars to) the hash.
> 
> Thanks, that's a good improvement. I also would add each
> in-list element after each test for hash membership, as I
> want to eliminate duplicates.
> 
> It still seems like this should already exist somewhere...

Well, there are the `[n]set-difference' sequence functions (in cl-seq.el), but
they're not O(1). You might want to take a look at them anyway.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 14:16   ` Wilson Snyder
  2010-04-09 14:45     ` Drew Adams
@ 2010-04-09 14:48     ` Davis Herring
  2010-04-09 15:05     ` David Kastrup
  2010-04-09 17:31     ` Stephen J. Turnbull
  3 siblings, 0 replies; 9+ messages in thread
From: Davis Herring @ 2010-04-09 14:48 UTC (permalink / raw)
  To: Wilson Snyder; +Cc: emacs-devel

> Thanks, that's a good improvement. I also would add each
> in-list element after each test for hash membership, as I
> want to eliminate duplicates.

>>and adding the rest to out-alist and (their cars to) the hash.

Yeah, I would too.  :)

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 14:16   ` Wilson Snyder
  2010-04-09 14:45     ` Drew Adams
  2010-04-09 14:48     ` Davis Herring
@ 2010-04-09 15:05     ` David Kastrup
  2010-04-09 17:31     ` Stephen J. Turnbull
  3 siblings, 0 replies; 9+ messages in thread
From: David Kastrup @ 2010-04-09 15:05 UTC (permalink / raw)
  To: emacs-devel

wsnyder@wsnyder.org (Wilson Snyder) writes:

>>> If not, my thought is this: when the list was over some size
>>> I'd determine experimentally, it would instead build a hash
>>> table from in-list and hit the not-alist against that.  When
>>> complete it would unfortunately require a second pass
>>> through the in-alist to return it maintaining the original
>>> order.
>>
>>Why not build a hash table (of the cars of the elements) from not-alist
>>instead?  Then you can just walk in-alist, skipping elements that are in
>>the hash (that is, whose cars are in it) and adding the rest to out-alist
>>and (their cars to) the hash.
>
> Thanks, that's a good improvement. I also would add each
> in-list element after each test for hash membership, as I
> want to eliminate duplicates.
>
> It still seems like this should already exist somewhere...

I am somewhat annoyed to find that some old work of mine done on reftex
seems to have been lost in the course of some upgrades.

Here are some examples how to do stuff like that (written by myself,
in spite of the name):

(defun TeX-delete-dups-by-car (alist &optional keep-list)
  "Return a list of all elements in ALIST, but each car only once.
Elements of KEEP-LIST are not removed even if duplicate."
  ;; Copy of `reftex-uniquify-by-car' (written by David Kastrup).
  (setq keep-list (sort (copy-sequence keep-list) #'string<))
  (setq alist (sort (copy-sequence alist)
		    (lambda (a b)
		      (string< (car a) (car b)))))
  (let ((new alist) elt)
    (while new
      (setq elt (caar new))
      (while (and keep-list (string< (car keep-list) elt))
	(setq keep-list (cdr keep-list)))
      (unless (and keep-list (string= elt (car keep-list)))
	(while (string= elt (car (cadr new)))
	  (setcdr new (cddr new))))
      (setq new (cdr new))))
  alist)

(defun TeX-delete-duplicate-strings (list)
  "Return a list of all strings in LIST, but each only once."
  (setq list (TeX-sort-strings list))
  (let ((new list) elt)
    (while new
      (setq elt (car new))
      (while (string= elt (cadr new))
	(setcdr new (cddr new)))
      (setq new (cdr new))))
  list)

Now that changes the order of elements.  I had versions which did not do
that either, but apparently lost them.

The key is to have the algorithms work on sorted lists.

-- 
David Kastrup





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 12:27 Suggestions to remove one alist's members from another Wilson Snyder
@ 2010-04-09 15:12 ` David Kastrup
  0 siblings, 0 replies; 9+ messages in thread
From: David Kastrup @ 2010-04-09 15:12 UTC (permalink / raw)
  To: emacs-devel

wsnyder@wsnyder.org (Wilson Snyder) writes:

> One of my packages has a function like this:
>
> (defun not-in-alist (in-alist not-alist)
>   "Return alist of elements in IN-ALIST that aren't also in NOT-ALIST.
> Also remove any duplicates in IN-ALIST."
>   (let (out-alist)
>     (while in-alist
>       (if (not (or (assoc (car (car in-alist)) not-alist)
> 		   (assoc (car (car in-alist)) out-alist)))
> 	  (setq out-alist (cons (car in-alist) out-alist)))
>       (setq in-alist (cdr in-alist)))
>     (nreverse out-alist)))
>
> This code was expedient, but obviously not fast for large lists.
> I want to replace it with something closer to O(1), even if it
> has larger overhead.
>
> Does anyone have an existing function or example I should use to
> do this?  It seems relatively primitive.
>
> If not, my thought is this: when the list was over some size
> I'd determine experimentally, it would instead build a hash
> table from in-list and hit the not-alist against that.  When
> complete it would unfortunately require a second pass
> through the in-alist to return it maintaining the original
> order.
>
> Thoughts?

I would not use hash tables for one-shot tasks like that.  Instead, use
sorting.  Hash tables are good for _maintaining_ unique data, adding and
removing stuff.  You could consider putting your data in hash lists in
the first place and keeping it there.

But if the data set does not change, a hash table is overkill as opposed
to merge-like algorithms on sorted lists.

-- 
David Kastrup





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Suggestions to remove one alist's members from another
  2010-04-09 14:16   ` Wilson Snyder
                       ` (2 preceding siblings ...)
  2010-04-09 15:05     ` David Kastrup
@ 2010-04-09 17:31     ` Stephen J. Turnbull
  3 siblings, 0 replies; 9+ messages in thread
From: Stephen J. Turnbull @ 2010-04-09 17:31 UTC (permalink / raw)
  To: Wilson Snyder; +Cc: emacs-devel

Wilson Snyder writes:

 > >Why not build a hash table (of the cars of the elements) from not-alist
 > >instead?  Then you can just walk in-alist, skipping elements that are in
 > >the hash (that is, whose cars are in it) and adding the rest to out-alist
 > >and (their cars to) the hash.
 > 
 > Thanks, that's a good improvement. I also would add each
 > in-list element after each test for hash membership, as I
 > want to eliminate duplicates.

Note that Davis's algorithm already does this, IIUC.  Elements that
are just skipped don't need to be added because they are in the hash
already, while elements that are added to the out-alist get added to
the hash, too.

 > It still seems like this should already exist somewhere...

The Common Lisp emulation has such operations, but I think they're not
stable.




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-04-09 17:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-09 12:27 Suggestions to remove one alist's members from another Wilson Snyder
2010-04-09 15:12 ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2010-04-09 13:02 Wilson Snyder
2010-04-09 14:11 ` Davis Herring
2010-04-09 14:16   ` Wilson Snyder
2010-04-09 14:45     ` Drew Adams
2010-04-09 14:48     ` Davis Herring
2010-04-09 15:05     ` David Kastrup
2010-04-09 17:31     ` Stephen J. Turnbull

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).