unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Uniq list in Guile
@ 2013-10-26  7:12 Dmitry Bogatov
  2013-10-26  8:03 ` Panicz Maciej Godek
  2013-10-26 10:16 ` Ian Price
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Bogatov @ 2013-10-26  7:12 UTC (permalink / raw)
  To: guile-user


[-- Attachment #1.1: Type: text/plain, Size: 551 bytes --]


Hello!

Recently I faced neseserity to uniq list in Guile. Surprised, that I
did not found ready solution, I decided to prepare it.

By analogy with sorting functions in `sort.c` I prepared
`uniq.c`(attached, but unfinished patch, lacking proper documentation).
But after I became curious, how slower would be Scheme version with
(set-cdr!). To my test (random list of small integers) it is about twice
as slow.

Is Guile Mainline interested in these routines, and if yes, does in your
opinion doubling execution speed worth resorting to C hacking?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: Implement `uniq` and `uniq!` in C. --]
[-- Type: text/x-diff, Size: 8123 bytes --]

From 0e5331e20446ba3318fd6ab0aabbcc40a4b4a5ab Mon Sep 17 00:00:00 2001
From: Dmitry Bogatov <KAction@gnu.org>
Date: Tue, 22 Oct 2013 15:56:43 +0400
Subject: [PATCH] Uniq Scheme lists in C

Signed-off-by: Dmitry Bogatov <KAction@gnu.org>
---
 doc/ref/api-utility.texi | 29 ++++++++++++++-
 libguile/Makefile.am     |  4 +++
 libguile/init.c          |  2 ++
 libguile/uniq.c          | 93 ++++++++++++++++++++++++++++++++++++++++++++++++
 libguile/uniq.h          | 39 ++++++++++++++++++++
 5 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 libguile/uniq.c
 create mode 100644 libguile/uniq.h

diff --git a/doc/ref/api-utility.texi b/doc/ref/api-utility.texi
index 76c50b2..22ffb27 100644
--- a/doc/ref/api-utility.texi
+++ b/doc/ref/api-utility.texi
@@ -18,6 +18,7 @@ applications, they are collected in a @dfn{utility} chapter.
 * Object Properties::           A modern interface to object properties.
 * Sorting::                     Sort utility procedures.
 * Copying::                     Copying deep structures.
+* Uniqueing::                   Uniqueing lists.
 * General Conversion::          Converting objects to strings.
 * Hooks::                       User-customizable event lists.
 @end menu
@@ -256,7 +257,7 @@ to @var{value}.
 @end deffn
 
 
-@node Sorting
+@node Sorting, Copying, Object Properties, Utility Functions
 @subsection Sorting
 
 @c FIXME::martin: Review me!
@@ -372,6 +373,32 @@ the range of the vector which gets sorted.  The return value
 is not specified.
 @end deffn
 
+@node Uniqueing
+
+@cindex uniqueing
+@cindex uniqueing lists
+
+Another common operation is getting unique elements of
+sequence. Following procedures perform action, analogous to
+@code{uniq}(1) program --- delete adjanced equal elements. If you need
+remove all duplicates, @var{sort} or @var{sort!} it before. Since this
+operation change length of sequence, vectors are not supported.
+
+@c snarfed from uniq.c:40
+@deffn {Scheme Procedure} uniq! items eq
+Uniquify the list @var{items}, by deleting equal adjanced
+elements. @var{eq} is used for comparing the sequence
+elements.  The sorting is destructive, that means that
+the input sequence is modified to produce the sorted result.
+
+@end deffn
+
+@c snarfed from uniq.c:72
+@deffn {Scheme Procedure} uniq items eq
+Uniquify the list @var{items}, by deleting equal adjanced
+elements.  @var{eq} is used for comparing the sequence
+elements.
+@end deffn
 
 @node Copying
 @subsection Copying Deep Structures
diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index ce437e4..2928843 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -221,6 +221,7 @@ libguile_@GUILE_EFFECTIVE_VERSION@_la_SOURCES =				\
 	version.c				\
 	vm.c					\
 	vports.c				\
+	uniq.c					\
 	weak-set.c				\
 	weak-table.c				\
 	weak-vector.c
@@ -319,6 +320,7 @@ DOT_X_FILES =					\
 	vectors.x				\
 	version.x				\
 	vports.x				\
+	uniq.x					\
 	weak-set.x				\
 	weak-table.x				\
 	weak-vector.x
@@ -422,6 +424,7 @@ DOT_DOC_FILES = 				\
 	vectors.doc				\
 	version.doc				\
 	vports.doc				\
+	uniq.doc				\
 	weak-set.doc				\
 	weak-table.doc				\
 	weak-vector.doc
@@ -642,6 +645,7 @@ modinclude_HEADERS =				\
 	vm-expand.h				\
 	vm.h					\
 	vports.h				\
+	uniq.h					\
 	weak-set.h				\
 	weak-table.h				\
 	weak-vector.h
diff --git a/libguile/init.c b/libguile/init.c
index 6787483..51698c4 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -111,6 +111,7 @@
 #include "libguile/smob.h"
 #include "libguile/socket.h"
 #include "libguile/sort.h"
+#include "libguile/uniq.h"
 #include "libguile/srcprop.h"
 #include "libguile/stackchk.h"
 #include "libguile/stacks.h"
@@ -465,6 +466,7 @@ scm_i_init_guile (void *base)
   scm_init_socket ();
 #endif
   scm_init_sort ();
+  scm_init_uniq ();
   scm_init_srcprop ();     /* requires smob_prehistory */
   scm_init_stackchk ();
 
diff --git a/libguile/uniq.c b/libguile/uniq.c
new file mode 100644
index 0000000..4cc2b86
--- /dev/null
+++ b/libguile/uniq.c
@@ -0,0 +1,93 @@
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+\f
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "libguile/_scm.h"
+#include "libguile/eval.h"
+#include "libguile/arrays.h"
+#include "libguile/array-map.h"
+#include "libguile/feature.h"
+#include "libguile/vectors.h"
+#include "libguile/async.h"
+#include "libguile/dynwind.h"
+
+#include "libguile/uniq.h"
+\f
+
+SCM_DEFINE (scm_uniq_x, "uniq!", 2, 0, 0,
+            (SCM items, SCM eq),
+	    "Uniquify the list @var{items}, by deleting equal adjanced\n"
+	    "elements. @var{eq} is used for comparing the sequence\n"
+	    "elements.  The sorting is destructive, that means that\n"
+	    "the input sequence is modified to produce the sorted result.\n")
+#define FUNC_NAME s_scm_uniq_x
+{
+  int len;
+  SCM it = items;
+
+  if (SCM_NULL_OR_NIL_P (items))
+    return items;
+
+  SCM_VALIDATE_LIST_COPYLEN(1, items, len);
+
+  while (!SCM_NULL_OR_NIL_P(SCM_CDR(it)))
+    {
+      SCM cur =  SCM_CAR(it);
+      SCM next = SCM_CADR(it);
+      if (scm_is_true (scm_call_2 (eq, cur, next)))
+	{
+	  SCM_SETCDR(it, SCM_CDDR(it));
+	}
+      else
+	{
+	  it = SCM_CDR(it);
+	}
+    }
+  return items;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_uniq, "uniq", 2, 0, 0,
+            (SCM items, SCM eq),
+	    "Uniquify the list @var{items}, by deleting equal adjanced\n"
+	    "elements.  @var{eq} is used for comparing the sequence \n"
+	    "elements.")
+#define FUNC_NAME s_scm_uniq
+{
+  if (SCM_NULL_OR_NIL_P (items))
+    return items;
+
+  return scm_uniq_x(scm_list_copy(items), eq);
+}
+#undef FUNC_NAME
+
+
+void
+scm_init_uniq (void)
+{
+#include "libguile/uniq.x"
+}
+
+/*
+  Local Variables:
+  c-file-style: "gnu"
+  End:
+*/
diff --git a/libguile/uniq.h b/libguile/uniq.h
new file mode 100644
index 0000000..6926d01
--- /dev/null
+++ b/libguile/uniq.h
@@ -0,0 +1,39 @@
+/* classes: h_files */
+
+#ifndef SCM_UNIQ_H
+#define SCM_UNIQ_H
+
+/* Copyright (C) 2013 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+\f
+
+#include "libguile/__scm.h"
+
+\f
+SCM_API SCM scm_uniq_x(SCM items, SCM eq);
+SCM_API SCM scm_uniq(SCM items, SCM eq);
+SCM_INTERNAL void scm_init_uniq(void);
+
+#endif  /* SCM_UNIQ_H */
+
+/*
+  Local Variables:
+  c-file-style: "gnu"
+  End:
+*/
-- 
Recipients list generated via git-blame. Tell me, if you object.


[-- Attachment #1.3: Code for simple benchmarking --]
[-- Type: text/plain, Size: 555 bytes --]

(define* (random-list #:key (length 1000) (limit 8))
    (let loop ((seed '()) (index 0))
        (if (= length index)
            seed
            (loop (cons (random limit) seed) (1+ index)))))

(define nl (random-list #:length 10000))
(define sl (random-list #:length 10000))

(define (scm-uniq! lst eq)
    (let loop ((it lst))
        (when (pair? (cdr it))
              (if (eq (car it) (cadr it))
                  (begin
                      (set-cdr! it (cddr it))
                      (loop it))
                  (loop (cdr it)))))
    lst)

[-- Attachment #1.4: Type: text/plain, Size: 242 bytes --]


--
Best regards, Dmitry Bogatov <KAction@gnu.org>,
Free Software supporter and netiquette guardian.
	git clone git://kaction.name/rc-files.git --depth 1
	GPG: 54B7F00D
Html mail and proprietary format attachments are forwarded to /dev/null.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Uniq list in Guile
  2013-10-26  7:12 Uniq list in Guile Dmitry Bogatov
@ 2013-10-26  8:03 ` Panicz Maciej Godek
  2013-10-26  9:06   ` Thien-Thi Nguyen
  2013-10-26 10:16 ` Ian Price
  1 sibling, 1 reply; 6+ messages in thread
From: Panicz Maciej Godek @ 2013-10-26  8:03 UTC (permalink / raw)
  To: Dmitry Bogatov; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

There's also a "delete-duplicates" function available that comes with
SRFI-1.
It has a different behaviour, because it uniques the whole list, not just
the groups of adjacent elements, and is a pure function, but usually can be
used to achieve the desired effect.
The manual claims that it has an O(n^2) complexity. There's a simple way to
decrease the complexity to linear time (+ sorting time, if one wishes to
preserve the order) using a hash table, like that:

(define (unique elements)
  (let ((result (make-hash-table)))
    (let loop ((i 0)(lst elements))
      (if (pair? lst)
          (begin
            (hash-set! result (car lst) i)
            (loop (1+ i) (cdr lst)))
          (map car (sort-list (hash-map->list cons result)
                              (lambda (a b) (< (cdr a) (cdr b)))))))))

Perhaps someone else's opinion would be useful here, but I don't
think that UNIX-style "uniq" procedure should be common to Scheme
programming, and in particular -- its mutable variant, which makes it
completely uncomposable (the UNIX variant is immutable, so it can be
used with pipes). Note that uniq has to be used with sort in order to
express the above concept (i.e. removing duplicate entries).

Regards,
M.

[-- Attachment #2: Type: text/html, Size: 1572 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Uniq list in Guile
  2013-10-26  8:03 ` Panicz Maciej Godek
@ 2013-10-26  9:06   ` Thien-Thi Nguyen
  2013-10-26 12:05     ` Dmitry Bogatov
  0 siblings, 1 reply; 6+ messages in thread
From: Thien-Thi Nguyen @ 2013-10-26  9:06 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 560 bytes --]

() Panicz Maciej Godek <godek.maciek@gmail.com>
() Sat, 26 Oct 2013 10:03:44 +0200

   There's also a "delete-duplicates" function available that comes with
   SRFI-1.

See also ‘(ice-9 common-list) uniq’, which hardcodes ‘eq?’ comparison
(via ‘memq’).  I, too, think it's better to not add this as a primitive.

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Uniq list in Guile
  2013-10-26  7:12 Uniq list in Guile Dmitry Bogatov
  2013-10-26  8:03 ` Panicz Maciej Godek
@ 2013-10-26 10:16 ` Ian Price
  2013-10-26 23:14   ` Ludovic Courtès
  1 sibling, 1 reply; 6+ messages in thread
From: Ian Price @ 2013-10-26 10:16 UTC (permalink / raw)
  To: Dmitry Bogatov; +Cc: guile-user

Dmitry Bogatov <KAction@gnu.org> writes:

> Hello!
>
> Recently I faced neseserity to uniq list in Guile. Surprised, that I
> did not found ready solution, I decided to prepare it.
It's not really that common. I wrote my own "uniq" (which is a terrible
name unix has inflicted on us) a long while back, and I think I've used
it exactly once. By contrast, I use shuffle (also not in the standard
library) fairly often, and I got that one shot down.

> By analogy with sorting functions in `sort.c` I prepared
> `uniq.c`(attached, but unfinished patch, lacking proper documentation).
> But after I became curious, how slower would be Scheme version with
> (set-cdr!). To my test (random list of small integers) it is about twice
> as slow.
>
> Is Guile Mainline interested in these routines, and if yes, does in your
> opinion doubling execution speed worth resorting to C hacking?
Without an actual benchmark, no.

-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Uniq list in Guile
  2013-10-26  9:06   ` Thien-Thi Nguyen
@ 2013-10-26 12:05     ` Dmitry Bogatov
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Bogatov @ 2013-10-26 12:05 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]


Thien-Thi Nguyen <ttn@gnu.org> writes:

>    There's also a "delete-duplicates" function available that comes with
>    SRFI-1.
>
> See also ‘(ice-9 common-list) uniq’, which hardcodes ‘eq?’ comparison
> (via ‘memq’).  I, too, think it's better to not add this as a
> primitive.
Thanks for replies. These functions will be in use.


--
Best regards, Dmitry Bogatov <KAction@gnu.org>,
Free Software supporter and netiquette guardian.
	git clone git://kaction.name/rc-files.git --depth 1
	GPG: 54B7F00D
Html mail and proprietary format attachments are forwarded to /dev/null.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Uniq list in Guile
  2013-10-26 10:16 ` Ian Price
@ 2013-10-26 23:14   ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2013-10-26 23:14 UTC (permalink / raw)
  To: guile-user

Ian Price <ianprice90@googlemail.com> skribis:

>> Hello!
>>
>> Recently I faced neseserity to uniq list in Guile. Surprised, that I
>> did not found ready solution, I decided to prepare it.
> It's not really that common.

+1 (I’ve never needed it, FWIW.)

> I wrote my own "uniq" (which is a terrible name unix has inflicted on
> us) a long while back, and I think I've used it exactly once. By
> contrast, I use shuffle (also not in the standard library) fairly
> often, and I got that one shot down.
>
>> By analogy with sorting functions in `sort.c` I prepared
>> `uniq.c`(attached, but unfinished patch, lacking proper documentation).
>> But after I became curious, how slower would be Scheme version with
>> (set-cdr!). To my test (random list of small integers) it is about twice
>> as slow.
>>
>> Is Guile Mainline interested in these routines, and if yes, does in your
>> opinion doubling execution speed worth resorting to C hacking?
> Without an actual benchmark, no.

Agreed.

Ludo’.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-10-26 23:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-26  7:12 Uniq list in Guile Dmitry Bogatov
2013-10-26  8:03 ` Panicz Maciej Godek
2013-10-26  9:06   ` Thien-Thi Nguyen
2013-10-26 12:05     ` Dmitry Bogatov
2013-10-26 10:16 ` Ian Price
2013-10-26 23:14   ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).