unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#22689: 25.1.50; implement logcount
@ 2016-02-15 23:18 Mark Oteiza
  2017-09-29 17:41 ` bug#22689: [PATCH] Add logcount Mark Oteiza
  2017-09-30 22:48 ` Paul Eggert
  0 siblings, 2 replies; 16+ messages in thread
From: Mark Oteiza @ 2016-02-15 23:18 UTC (permalink / raw)
  To: 22689


Wishlist:

Logcount, AKA popcount, Hamming weight, sideways sum.  Basically,
counting the number of 1s in a number's binary representation. The
concept is similar to that in `bool-vector-count-population', except the
argument would be a number, presumably a non-negative integer.

There are a number of ways to go about it, and beyond that I'm not sure
how to write it into data.c:

- Some compilers have a __builtin_popcount (gcc since 2004 and llvm
  since 2005)
- gmp has a popcount function
- lots of algorithms implementing it

Many are given here:

https://en.wikipedia.org/wiki/Hamming_weight

another resource:

http://rosettacode.org/wiki/Population_count

Guile's implementation uses table lookup:

http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/numbers.c#n5234

When I needed it, I just naïvely ported an algorithm to elisp (in
particular ignoring the case of negative numbers):

  (let ((m1 #x555555555555555)
        (m2 #x333333333333333)
        (m4 #x0f0f0f0f0f0f0f0f)
        (h01 #x0101010101010101))
    (setq x (- x (logand (lsh x -1) m1)))
    (setq x (+ (logand x m2) (logand (lsh x -2) m2)))
    (setq x (logand (+ x (lsh x -4)) m4))
    (lsh (* x h01) -56))

The table lookup isn't so different

  (defvar logtab [0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4])
  (let ((count 0))
    (while (not (zerop x))
      (setq count (+ count (aref logtab (logand 15 x))))
      (setq x (lsh x -4)))
    count)

I couldn't find any implementation of this in calc either, but such a
function/operation would fit in calc-bin.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2016-02-15 23:18 bug#22689: 25.1.50; implement logcount Mark Oteiza
@ 2017-09-29 17:41 ` Mark Oteiza
  2017-09-30 11:42   ` Eli Zaretskii
  2017-09-30 22:48 ` Paul Eggert
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Oteiza @ 2017-09-29 17:41 UTC (permalink / raw)
  To: 22689

Hi,

I made an attempt implementing this:


diff --git a/src/data.c b/src/data.c
index e070be6c20..1332173dcd 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3069,6 +3069,57 @@ usage: (logxor &rest INTS-OR-MARKERS)  */)
   return arith_driver (Alogxor, nargs, args);
 }
 
+#if defined (__POPCNT__) && defined (__GNUC__) && (__GNUC__> 4 || (__GNUC__== 4 && __GNUC_MINOR__> 1))
+#define HAVE_BUILTIN_POPCOUNTLL
+#endif
+
+static uint32_t
+bitcount32 (uint32_t b)
+{
+  b -= (b >> 1) & 0x55555555;
+  b = (b & 0x33333333) + ((b >> 2) & 0x33333333);
+  b = (b + (b >> 4)) & 0x0f0f0f0f;
+  return (b * 0x01010101) >> 24;
+}  
+
+static uint64_t
+bitcount64 (uint64_t b)
+{
+  b -= (b >> 1) & 0x5555555555555555;
+  b = (b & 0x3333333333333333) + ((b >> 2) & 0x3333333333333333);
+  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0f;
+  return (b * 0x0101010101010101) >> 56;
+}
+
+DEFUN ("logcount", Flogcount, Slogcount, 1, 1, 0,
+       doc: /* Return population count of VALUE. */)
+  (register Lisp_Object value)
+{
+  Lisp_Object res;
+
+  CHECK_NUMBER (value);
+
+#ifdef HAVE_BUILTIN_POPCOUNTLL
+  XSETINT (res, __builtin_popcount (XUINT (value)));
+#else  /* HAVE_BUILTIN_POPCOUNTLL */
+  if (XINT (value) <= UINT_MAX)
+    XSETINT (res, bitcount32 (XUINT (value)));
+  else if (XINT (value) <= ULONG_MAX)
+    XSETINT (res, bitcount64 (XUINT (value)));
+  else
+    {
+      int count;
+      EMACS_UINT v = XUINT (value);
+      for (count = 0; v; count++)
+        {
+          v &= v - 1;
+        }
+      XSETINT (res, v);
+    }
+#endif /* HAVE_BUILTIN_POPCOUNTLL */
+  return res;
+}
+
 static Lisp_Object
 ash_lsh_impl (Lisp_Object value, Lisp_Object count, bool lsh)
 {
@@ -3856,6 +3907,7 @@ syms_of_data (void)
   defsubr (&Slogand);
   defsubr (&Slogior);
   defsubr (&Slogxor);
+  defsubr (&Slogcount);
   defsubr (&Slsh);
   defsubr (&Sash);
   defsubr (&Sadd1);





^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-29 17:41 ` bug#22689: [PATCH] Add logcount Mark Oteiza
@ 2017-09-30 11:42   ` Eli Zaretskii
  2017-09-30 13:16     ` Mark Oteiza
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2017-09-30 11:42 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

> Date: Fri, 29 Sep 2017 13:41:34 -0400
> From: Mark Oteiza <mvoteiza@udel.edu>
> 
> I made an attempt implementing this:

Thanks.  I'm not an expert on these ops, but otherwise the code looks
OK to me, modulo the minor comments below.

This will eventually need a NEWS entry and some docs in the ELisp
manual.

> +#if defined (__POPCNT__) && defined (__GNUC__) && (__GNUC__> 4 || (__GNUC__== 4 && __GNUC_MINOR__> 1))
> +#define HAVE_BUILTIN_POPCOUNTLL
> +#endif

Where does __POPCNT__ definition come from?  I don't see it in my
GCC's "gcc -dM" output.

As for the rest of the condition, I think you should use GNUC_PREREQ,
like this:

 #if GNUC_PREREQ (4, 1, 0)


> +static uint64_t
> +bitcount64 (uint64_t b)
> +{
> +  b -= (b >> 1) & 0x5555555555555555;
> +  b = (b & 0x3333333333333333) + ((b >> 2) & 0x3333333333333333);
> +  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0f;
> +  return (b * 0x0101010101010101) >> 56;

Shouldn't these constants have a ULL suffix, to make sure they are not
truncated to a 32-bit int?

> +#else  /* HAVE_BUILTIN_POPCOUNTLL */
> +  if (XINT (value) <= UINT_MAX)
> +    XSETINT (res, bitcount32 (XUINT (value)));
> +  else if (XINT (value) <= ULONG_MAX)
> +    XSETINT (res, bitcount64 (XUINT (value)));

The comparisons against Uxxx_MAX seem to assume that VALUE is
unsigned, but that's not guaranteed, right?





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 11:42   ` Eli Zaretskii
@ 2017-09-30 13:16     ` Mark Oteiza
  2017-09-30 13:59       ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 13:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22689

On 30/09/17 at 02:42pm, Eli Zaretskii wrote:
> > +#if defined (__POPCNT__) && defined (__GNUC__) && (__GNUC__> 4 || (__GNUC__== 4 && __GNUC_MINOR__> 1))
> > +#define HAVE_BUILTIN_POPCOUNTLL
> > +#endif
> 
> Where does __POPCNT__ definition come from?  I don't see it in my
> GCC's "gcc -dM" output.
> 
> As for the rest of the condition, I think you should use GNUC_PREREQ,
> like this:
> 
>  #if GNUC_PREREQ (4, 1, 0)

I guess it comes from nowhere, that condition and the two helper
functions come from here
https://rosettacode.org/wiki/Population_count#C

> > +static uint64_t
> > +bitcount64 (uint64_t b)
> > +{
> > +  b -= (b >> 1) & 0x5555555555555555;
> > +  b = (b & 0x3333333333333333) + ((b >> 2) & 0x3333333333333333);
> > +  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0f;
> > +  return (b * 0x0101010101010101) >> 56;
> 
> Shouldn't these constants have a ULL suffix, to make sure they are not
> truncated to a 32-bit int?

Probably, I don't know--I'm out of my comfort zone here.

> > +#else  /* HAVE_BUILTIN_POPCOUNTLL */
> > +  if (XINT (value) <= UINT_MAX)
> > +    XSETINT (res, bitcount32 (XUINT (value)));
> > +  else if (XINT (value) <= ULONG_MAX)
> > +    XSETINT (res, bitcount64 (XUINT (value)));
> 
> The comparisons against Uxxx_MAX seem to assume that VALUE is
> unsigned, but that's not guaranteed, right?

True.  Should I instead be doing

  XINT (value) <= xxx_MAX &&
  XINT (value) >= xxx_MIN

or might there be a better check?  For negative values popcount
typically counts ones of the two's complement





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 13:16     ` Mark Oteiza
@ 2017-09-30 13:59       ` Eli Zaretskii
  2017-09-30 14:55         ` Mark Oteiza
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2017-09-30 13:59 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

> Date: Sat, 30 Sep 2017 09:16:36 -0400
> From: Mark Oteiza <mvoteiza@udel.edu>
> Cc: 22689@debbugs.gnu.org
> 
> On 30/09/17 at 02:42pm, Eli Zaretskii wrote:
> > > +#if defined (__POPCNT__) && defined (__GNUC__) && (__GNUC__> 4 || (__GNUC__== 4 && __GNUC_MINOR__> 1))
> > > +#define HAVE_BUILTIN_POPCOUNTLL
> > > +#endif
> > 
> > Where does __POPCNT__ definition come from?  I don't see it in my
> > GCC's "gcc -dM" output.
> > 
> > As for the rest of the condition, I think you should use GNUC_PREREQ,
> > like this:
> > 
> >  #if GNUC_PREREQ (4, 1, 0)
> 
> I guess it comes from nowhere, that condition and the two helper
> functions come from here
> https://rosettacode.org/wiki/Population_count#C

I only see that symbol in the GCC sources, which I don't think are
relevant here.

If you drop the __POPCNT__ part, does the code still work for you?

> > > +static uint64_t
> > > +bitcount64 (uint64_t b)
> > > +{
> > > +  b -= (b >> 1) & 0x5555555555555555;
> > > +  b = (b & 0x3333333333333333) + ((b >> 2) & 0x3333333333333333);
> > > +  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0f;
> > > +  return (b * 0x0101010101010101) >> 56;
> > 
> > Shouldn't these constants have a ULL suffix, to make sure they are not
> > truncated to a 32-bit int?
> 
> Probably, I don't know--I'm out of my comfort zone here.

I'd use the suffixes.

> > > +#else  /* HAVE_BUILTIN_POPCOUNTLL */
> > > +  if (XINT (value) <= UINT_MAX)
> > > +    XSETINT (res, bitcount32 (XUINT (value)));
> > > +  else if (XINT (value) <= ULONG_MAX)
> > > +    XSETINT (res, bitcount64 (XUINT (value)));
> > 
> > The comparisons against Uxxx_MAX seem to assume that VALUE is
> > unsigned, but that's not guaranteed, right?
> 
> True.  Should I instead be doing
> 
>   XINT (value) <= xxx_MAX &&
>   XINT (value) >= xxx_MIN
> 
> or might there be a better check?  For negative values popcount
> typically counts ones of the two's complement

I'd just assign to an unsigned temporary, IIUC the semantics.

Btw, on Windows, a long is a 32-bit type, so I think we need
to check against ULONG_LONG_MAX as well.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 13:59       ` Eli Zaretskii
@ 2017-09-30 14:55         ` Mark Oteiza
  2017-09-30 15:38           ` Eli Zaretskii
  2017-09-30 16:50           ` Benjamin Riefenstahl
  0 siblings, 2 replies; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 14:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22689

On 30/09/17 at 04:59pm, Eli Zaretskii wrote:
> I only see that symbol in the GCC sources, which I don't think are
> relevant here.
> 
> If you drop the __POPCNT__ part, does the code still work for you?

Yes.

> > > > +#else  /* HAVE_BUILTIN_POPCOUNTLL */
> > > > +  if (XINT (value) <= UINT_MAX)
> > > > +    XSETINT (res, bitcount32 (XUINT (value)));
> > > > +  else if (XINT (value) <= ULONG_MAX)
> > > > +    XSETINT (res, bitcount64 (XUINT (value)));
> > > 
> > > The comparisons against Uxxx_MAX seem to assume that VALUE is
> > > unsigned, but that's not guaranteed, right?
> > 
> > True.  Should I instead be doing
> > 
> >   XINT (value) <= xxx_MAX &&
> >   XINT (value) >= xxx_MIN
> > 
> > or might there be a better check?  For negative values popcount
> > typically counts ones of the two's complement
> 
> I'd just assign to an unsigned temporary, IIUC the semantics.
> 
> Btw, on Windows, a long is a 32-bit type, so I think we need
> to check against ULONG_LONG_MAX as well.

I tried implementing your comments (and added a test).  I don't know
what to do in the #else for ULONG_LONG_MAX.

diff --git a/src/data.c b/src/data.c
index e070be6c20..8b3866151d 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3069,6 +3069,66 @@ usage: (logxor &rest INTS-OR-MARKERS)  */)
   return arith_driver (Alogxor, nargs, args);
 }
 
+#if GNUC_PREREQ (4, 1, 0)
+#define HAVE_BUILTIN_POPCOUNTLL
+#endif
+
+#ifndef HAVE_BUILTIN_POPCOUNTLL
+static uint32_t
+logcount32 (uint32_t b)
+{
+  b -= (b >> 1) & 0x55555555;
+  b = (b & 0x33333333) + ((b >> 2) & 0x33333333);
+  b = (b + (b >> 4)) & 0x0f0f0f0f;
+  return (b * 0x01010101) >> 24;
+}  
+
+static uint64_t
+logcount64 (uint64_t b)
+{
+  b -= (b >> 1) & 0x5555555555555555ULL;
+  b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
+  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+  return (b * 0x0101010101010101ULL) >> 56;
+}
+#endif /* HAVE_BUILTIN_POPCOUNTLL */
+
+DEFUN ("logcount", Flogcount, Slogcount, 1, 1, 0,
+       doc: /* Return population count of VALUE.
+If VALUE is negative, the count is of its two's complement representation.  */)
+  (register Lisp_Object value)
+{
+  Lisp_Object res;
+  EMACS_UINT v;
+
+  CHECK_NUMBER (value);
+
+  v = XUINT (value);
+#ifdef HAVE_BUILTIN_POPCOUNTLL
+  if (v <= UINT_MAX)
+    XSETINT (res, __builtin_popcount (v));
+  else if (v <= ULONG_MAX)
+    XSETINT (res, __builtin_popcountl (v));
+  else if (v <= ULONG_LONG_MAX)
+    XSETINT (res, __builtin_popcountll (v));
+#else  /* HAVE_BUILTIN_POPCOUNTLL */
+  if (v <= UINT_MAX)
+    XSETINT (res, logcount32 (v));
+  else if (v <= ULONG_MAX)
+    XSETINT (res, logcount64 (v));
+#endif /* HAVE_BUILTIN_POPCOUNTLL */
+  else
+    {
+      unsigned int count;
+      for (count = 0; v; count++)
+        {
+          v &= v - 1;
+        }
+      XSETINT (res, v);
+    }
+  return res;
+}
+
 static Lisp_Object
 ash_lsh_impl (Lisp_Object value, Lisp_Object count, bool lsh)
 {
@@ -3856,6 +3916,7 @@ syms_of_data (void)
   defsubr (&Slogand);
   defsubr (&Slogior);
   defsubr (&Slogxor);
+  defsubr (&Slogcount);
   defsubr (&Slsh);
   defsubr (&Sash);
   defsubr (&Sadd1);
diff --git a/test/src/data-tests.el b/test/src/data-tests.el
index 8de8c145d4..d1154cc5c4 100644
--- a/test/src/data-tests.el
+++ b/test/src/data-tests.el
@@ -107,6 +107,19 @@
   (should (isnan (min 1.0 0.0e+NaN)))
   (should (isnan (min 1.0 0.0e+NaN 1.1))))
 
+(defun data-tests-popcnt (byte)
+  "Calculate the Hamming weight of BYTE."
+  (setq byte (- byte (logand (lsh byte -1) #x55555555)))
+  (setq byte (+ (logand byte #x33333333) (logand (lsh byte -2) #x33333333)))
+  (lsh (* (logand (+ byte (lsh byte -4)) #x0f0f0f0f) #x01010101) -24))
+
+(ert-deftest data-tests-logcount ()
+  (should (cl-loop for n in (number-sequence 0 255)
+                   always (= (logcount n) (data-tests-popcnt n))))
+  ;; https://oeis.org/A000120
+  (should (= 11 (logcount 9727)))
+  (should (= 8 (logcount 9999))))
+
 ;; Bool vector tests.  Compactly represent bool vectors as hex
 ;; strings.
 





^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 14:55         ` Mark Oteiza
@ 2017-09-30 15:38           ` Eli Zaretskii
  2017-09-30 16:03             ` Mark Oteiza
  2017-09-30 16:50           ` Benjamin Riefenstahl
  1 sibling, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2017-09-30 15:38 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

> Date: Sat, 30 Sep 2017 10:55:25 -0400
> From: Mark Oteiza <mvoteiza@udel.edu>
> Cc: 22689@debbugs.gnu.org
> 
> > Btw, on Windows, a long is a 32-bit type, so I think we need
> > to check against ULONG_LONG_MAX as well.
> 
> I tried implementing your comments (and added a test).  I don't know
> what to do in the #else for ULONG_LONG_MAX.

Same as you do for ULONG_MAX, I think.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 15:38           ` Eli Zaretskii
@ 2017-09-30 16:03             ` Mark Oteiza
  2017-09-30 16:17               ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 16:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22689

On 30/09/17 at 06:38pm, Eli Zaretskii wrote:
> > Date: Sat, 30 Sep 2017 10:55:25 -0400
> > From: Mark Oteiza <mvoteiza@udel.edu>
> > Cc: 22689@debbugs.gnu.org
> > 
> > > Btw, on Windows, a long is a 32-bit type, so I think we need
> > > to check against ULONG_LONG_MAX as well.
> > 
> > I tried implementing your comments (and added a test).  I don't know
> > what to do in the #else for ULONG_LONG_MAX.
> 
> Same as you do for ULONG_MAX, I think.

so the following?

  else if (v <= ULONG_MAX || v <= ULONG_LONG_MAX)
      XSETINT (res, logcount64 (v));






^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 16:03             ` Mark Oteiza
@ 2017-09-30 16:17               ` Eli Zaretskii
  2017-09-30 17:07                 ` Mark Oteiza
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2017-09-30 16:17 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

> Date: Sat, 30 Sep 2017 12:03:31 -0400
> From: Mark Oteiza <mvoteiza@udel.edu>
> Cc: 22689@debbugs.gnu.org
> 
> > > I tried implementing your comments (and added a test).  I don't know
> > > what to do in the #else for ULONG_LONG_MAX.
> > 
> > Same as you do for ULONG_MAX, I think.
> 
> so the following?
> 
>   else if (v <= ULONG_MAX || v <= ULONG_LONG_MAX)
>       XSETINT (res, logcount64 (v));

Yes.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 14:55         ` Mark Oteiza
  2017-09-30 15:38           ` Eli Zaretskii
@ 2017-09-30 16:50           ` Benjamin Riefenstahl
  2017-09-30 16:59             ` Mark Oteiza
  1 sibling, 1 reply; 16+ messages in thread
From: Benjamin Riefenstahl @ 2017-09-30 16:50 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

Hi Mark,

Just a drive-by note:

Mark Oteiza writes:
> +    {
> +      unsigned int count;
> +      for (count = 0; v; count++)
> +        {
> +          v &= v - 1;
> +        }
> +      XSETINT (res, v);
> +    }

Isn't this a fancy way of just saying "XSETINT (res, 0)"?  The variable
"count" is incremented but never used, so without it the loop
degenerates to

      while (v)
        {
          v &= v - 1;
        }

I other words, this just loops until "v == 0".

Maybe that assignment should be "XSETINT (res, count)"?  That would
actually use the variable "count".

benny





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 16:50           ` Benjamin Riefenstahl
@ 2017-09-30 16:59             ` Mark Oteiza
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 16:59 UTC (permalink / raw)
  To: Benjamin Riefenstahl; +Cc: 22689

On 30/09/17 at 06:50pm, Benjamin Riefenstahl wrote:
> Hi Mark,
> 
> Just a drive-by note:
> 
> Mark Oteiza writes:
> > +    {
> > +      unsigned int count;
> > +      for (count = 0; v; count++)
> > +        {
> > +          v &= v - 1;
> > +        }
> > +      XSETINT (res, v);
> > +    }
> 
> Isn't this a fancy way of just saying "XSETINT (res, 0)"?  The variable
> "count" is incremented but never used, so without it the loop
> degenerates to
> 
>       while (v)
>         {
>           v &= v - 1;
>         }
> 
> I other words, this just loops until "v == 0".
> 
> Maybe that assignment should be "XSETINT (res, count)"?  That would
> actually use the variable "count".

I think you're right, thank you.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 16:17               ` Eli Zaretskii
@ 2017-09-30 17:07                 ` Mark Oteiza
  2017-09-30 17:53                   ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 17:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22689

On 30/09/17 at 07:17pm, Eli Zaretskii wrote:
> > Date: Sat, 30 Sep 2017 12:03:31 -0400
> > From: Mark Oteiza <mvoteiza@udel.edu>
> > Cc: 22689@debbugs.gnu.org
> > 
> > > > I tried implementing your comments (and added a test).  I don't know
> > > > what to do in the #else for ULONG_LONG_MAX.
> > > 
> > > Same as you do for ULONG_MAX, I think.
> > 
> > so the following?
> > 
> >   else if (v <= ULONG_MAX || v <= ULONG_LONG_MAX)
> >       XSETINT (res, logcount64 (v));
> 
> Yes.

Ok.  Patch including this, Benjamin's catch, and documentation.


 doc/lispref/numbers.texi | 10 ++++++++
 etc/NEWS                 |  3 +++
 src/data.c               | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 test/src/data-tests.el   | 13 +++++++++++
 4 files changed, 87 insertions(+)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index 3fdc94169b..29c6429eda 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -1107,6 +1107,16 @@ Bitwise Operations
 @end example
 @end defun
 
+@defun logcount integer
+This function returns the population count of @var{integer}: the
+number of ones in the binary representation of @var{integer}.
+
+@example
+(logcount 42)     ; 42 = #b101010
+     @result{} 3
+@end example
+@end defun
+
 @node Math Functions
 @section Standard Mathematical Functions
 @cindex transcendental functions
diff --git a/etc/NEWS b/etc/NEWS
index 238c7b7ea4..8fbc354fc0 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -31,6 +31,9 @@ When you add a new item, use the appropriate mark if you are sure it applies,
 \f
 * Changes in Emacs 27.1
 
++++
+** New function 'logcount' calculates an integer's Hamming weight.
+
 \f
 * Editing Changes in Emacs 27.1
 
diff --git a/src/data.c b/src/data.c
index e070be6c20..f470d6ffaa 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3069,6 +3069,66 @@ usage: (logxor &rest INTS-OR-MARKERS)  */)
   return arith_driver (Alogxor, nargs, args);
 }
 
+#if GNUC_PREREQ (4, 1, 0)
+#define HAVE_BUILTIN_POPCOUNTLL
+#endif
+
+#ifndef HAVE_BUILTIN_POPCOUNTLL
+static uint32_t
+logcount32 (uint32_t b)
+{
+  b -= (b >> 1) & 0x55555555;
+  b = (b & 0x33333333) + ((b >> 2) & 0x33333333);
+  b = (b + (b >> 4)) & 0x0f0f0f0f;
+  return (b * 0x01010101) >> 24;
+}  
+
+static uint64_t
+logcount64 (uint64_t b)
+{
+  b -= (b >> 1) & 0x5555555555555555ULL;
+  b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
+  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+  return (b * 0x0101010101010101ULL) >> 56;
+}
+#endif /* HAVE_BUILTIN_POPCOUNTLL */
+
+DEFUN ("logcount", Flogcount, Slogcount, 1, 1, 0,
+       doc: /* Return population count of VALUE.
+If VALUE is negative, the count is of its two's complement representation.  */)
+  (register Lisp_Object value)
+{
+  Lisp_Object res;
+  EMACS_UINT v;
+
+  CHECK_NUMBER (value);
+
+  v = XUINT (value);
+#ifdef HAVE_BUILTIN_POPCOUNTLL
+  if (v <= UINT_MAX)
+    XSETINT (res, __builtin_popcount (v));
+  else if (v <= ULONG_MAX)
+    XSETINT (res, __builtin_popcountl (v));
+  else if (v <= ULONG_LONG_MAX)
+    XSETINT (res, __builtin_popcountll (v));
+#else  /* HAVE_BUILTIN_POPCOUNTLL */
+  if (v <= UINT_MAX)
+    XSETINT (res, logcount32 (v));
+  else if (v <= ULONG_MAX || v <= ULONG_LONG_MAX)
+    XSETINT (res, logcount64 (v));
+#endif /* HAVE_BUILTIN_POPCOUNTLL */
+  else
+    {
+      unsigned int count;
+      for (count = 0; v; count++)
+        {
+          v &= v - 1;
+        }
+      XSETINT (res, count);
+    }
+  return res;
+}
+
 static Lisp_Object
 ash_lsh_impl (Lisp_Object value, Lisp_Object count, bool lsh)
 {
@@ -3856,6 +3916,7 @@ syms_of_data (void)
   defsubr (&Slogand);
   defsubr (&Slogior);
   defsubr (&Slogxor);
+  defsubr (&Slogcount);
   defsubr (&Slsh);
   defsubr (&Sash);
   defsubr (&Sadd1);
diff --git a/test/src/data-tests.el b/test/src/data-tests.el
index 8de8c145d4..d1154cc5c4 100644
--- a/test/src/data-tests.el
+++ b/test/src/data-tests.el
@@ -107,6 +107,19 @@
   (should (isnan (min 1.0 0.0e+NaN)))
   (should (isnan (min 1.0 0.0e+NaN 1.1))))
 
+(defun data-tests-popcnt (byte)
+  "Calculate the Hamming weight of BYTE."
+  (setq byte (- byte (logand (lsh byte -1) #x55555555)))
+  (setq byte (+ (logand byte #x33333333) (logand (lsh byte -2) #x33333333)))
+  (lsh (* (logand (+ byte (lsh byte -4)) #x0f0f0f0f) #x01010101) -24))
+
+(ert-deftest data-tests-logcount ()
+  (should (cl-loop for n in (number-sequence 0 255)
+                   always (= (logcount n) (data-tests-popcnt n))))
+  ;; https://oeis.org/A000120
+  (should (= 11 (logcount 9727)))
+  (should (= 8 (logcount 9999))))
+
 ;; Bool vector tests.  Compactly represent bool vectors as hex
 ;; strings.
 





^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 17:07                 ` Mark Oteiza
@ 2017-09-30 17:53                   ` Eli Zaretskii
  2017-09-30 18:18                     ` Mark Oteiza
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2017-09-30 17:53 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

> Date: Sat, 30 Sep 2017 13:07:58 -0400
> From: Mark Oteiza <mvoteiza@udel.edu>
> Cc: 22689@debbugs.gnu.org
> 
> Ok.  Patch including this, Benjamin's catch, and documentation.

Thanks.  A minor comment on the docs:

> +@defun logcount integer
> +This function returns the population count of @var{integer}: the
> +number of ones in the binary representation of @var{integer}.

I would mention here the "Hamming weight" term (in @dfn) that is
referenced in NEWS, and I would also add a few index entries:

 @cindex popcount
 @cindex Hamming weight
 @cindex counting set bits





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 17:53                   ` Eli Zaretskii
@ 2017-09-30 18:18                     ` Mark Oteiza
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 18:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 22689-done

On 30/09/17 at 08:53pm, Eli Zaretskii wrote:
> > Date: Sat, 30 Sep 2017 13:07:58 -0400
> > From: Mark Oteiza <mvoteiza@udel.edu>
> > Cc: 22689@debbugs.gnu.org
> > 
> > Ok.  Patch including this, Benjamin's catch, and documentation.
> 
> Thanks.  A minor comment on the docs:
> 
> > +@defun logcount integer
> > +This function returns the population count of @var{integer}: the
> > +number of ones in the binary representation of @var{integer}.
> 
> I would mention here the "Hamming weight" term (in @dfn) that is
> referenced in NEWS, and I would also add a few index entries:
> 
>  @cindex popcount
>  @cindex Hamming weight
>  @cindex counting set bits

Done, pushed as 185f3334.  Thank you.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2016-02-15 23:18 bug#22689: 25.1.50; implement logcount Mark Oteiza
  2017-09-29 17:41 ` bug#22689: [PATCH] Add logcount Mark Oteiza
@ 2017-09-30 22:48 ` Paul Eggert
  2017-09-30 23:22   ` Mark Oteiza
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Eggert @ 2017-09-30 22:48 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 22689

[-- Attachment #1: Type: text/plain, Size: 509 bytes --]

I looked at this patch after it landed in 'master'. Thanks for contributing it.

A couple of minor improvements. First, the implementation of logcount can use 
the count_one_bits functions already used by Emacs, rather than reinventing that 
wheel; these functions should be just as fast as __builtin_popcount etc. when 
using GCC. Second, logcount should count zero bits in negative numbers, for 
compatibility with Common Lisp. I installed the attached two patches to 
implement these improvements.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Simplify-logcount-implementation.patch --]
[-- Type: text/x-patch; name="0001-Simplify-logcount-implementation.patch", Size: 2608 bytes --]

From 6fe5f5db496f5962e1504991bf77e2b7d23a681f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 30 Sep 2017 13:12:33 -0700
Subject: [PATCH 1/2] Simplify logcount implementation

* src/data.c (HAVE_BUILTIN_POPCOUNTLL, logcount32, logcount64):
Remove.
(Flogcount): Simplify by using count_one_bits.
---
 src/data.c | 60 +++++++-----------------------------------------------------
 1 file changed, 7 insertions(+), 53 deletions(-)

diff --git a/src/data.c b/src/data.c
index b595e3fb1a..fd8cdd19aa 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3069,64 +3069,18 @@ usage: (logxor &rest INTS-OR-MARKERS)  */)
   return arith_driver (Alogxor, nargs, args);
 }
 
-#if GNUC_PREREQ (4, 1, 0)
-#define HAVE_BUILTIN_POPCOUNTLL
-#endif
-
-#ifndef HAVE_BUILTIN_POPCOUNTLL
-static uint32_t
-logcount32 (uint32_t b)
-{
-  b -= (b >> 1) & 0x55555555;
-  b = (b & 0x33333333) + ((b >> 2) & 0x33333333);
-  b = (b + (b >> 4)) & 0x0f0f0f0f;
-  return (b * 0x01010101) >> 24;
-}
-
-static uint64_t
-logcount64 (uint64_t b)
-{
-  b -= (b >> 1) & 0x5555555555555555ULL;
-  b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
-  b = (b + (b >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
-  return (b * 0x0101010101010101ULL) >> 56;
-}
-#endif /* HAVE_BUILTIN_POPCOUNTLL */
-
 DEFUN ("logcount", Flogcount, Slogcount, 1, 1, 0,
        doc: /* Return population count of VALUE.
 If VALUE is negative, the count is of its two's complement representation.  */)
-  (register Lisp_Object value)
+  (Lisp_Object value)
 {
-  Lisp_Object res;
-  EMACS_UINT v;
-
   CHECK_NUMBER (value);
-
-  v = XUINT (value);
-#ifdef HAVE_BUILTIN_POPCOUNTLL
-  if (v <= UINT_MAX)
-    XSETINT (res, __builtin_popcount (v));
-  else if (v <= ULONG_MAX)
-    XSETINT (res, __builtin_popcountl (v));
-  else if (v <= ULONG_LONG_MAX)
-    XSETINT (res, __builtin_popcountll (v));
-#else  /* HAVE_BUILTIN_POPCOUNTLL */
-  if (v <= UINT_MAX)
-    XSETINT (res, logcount32 (v));
-  else if (v <= ULONG_MAX || v <= ULONG_LONG_MAX)
-    XSETINT (res, logcount64 (v));
-#endif /* HAVE_BUILTIN_POPCOUNTLL */
-  else
-    {
-      unsigned int count;
-      for (count = 0; v; count++)
-        {
-          v &= v - 1;
-        }
-      XSETINT (res, count);
-    }
-  return res;
+  EMACS_UINT v = XUINT (value);
+  return make_number (EMACS_UINT_WIDTH <= UINT_WIDTH
+		      ? count_one_bits (v)
+		      : EMACS_UINT_WIDTH <= ULONG_WIDTH
+		      ? count_one_bits_l (v)
+		      : count_one_bits_ll (v));
 }
 
 static Lisp_Object
-- 
2.13.5


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Make-logcount-act-like-CL-on-negative-arg.patch --]
[-- Type: text/x-patch; name="0002-Make-logcount-act-like-CL-on-negative-arg.patch", Size: 3148 bytes --]

From 3b6d6b03383ac413fd0f25c4ae4fed1c780a3c29 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 30 Sep 2017 15:36:52 -0700
Subject: [PATCH 2/2] Make logcount act like CL on negative arg

Behaving like Common Lisp is less likely to lead to surprises,
as it yields the same answers on 32- vs 64-bit machines.
* doc/lispref/numbers.texi (Bitwise Operations):
Document behavior on negative integers.
* src/data.c (Flogcount):
Behave like Common Lisp for negative arguments.
* test/src/data-tests.el (data-tests-popcnt)
(data-tests-logcount): Test negative args too.
---
 doc/lispref/numbers.texi | 7 ++++++-
 src/data.c               | 6 ++++--
 test/src/data-tests.el   | 4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index 5058063af4..be74b0c611 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -1113,9 +1113,14 @@ Bitwise Operations
 @defun logcount integer
 This function returns the @dfn{Hamming weight} of @var{integer}: the
 number of ones in the binary representation of @var{integer}.
+If @var{integer} is negative, it returns the number of zero bits in
+its two's complement binary representation.  The result is always
+nonnegative.
 
 @example
-(logcount 42)     ; 42 = #b101010
+(logcount 43)     ; 43 = #b101011
+     @result{} 4
+(logcount -43)    ; -43 = #b111...1010101
      @result{} 3
 @end example
 @end defun
diff --git a/src/data.c b/src/data.c
index fd8cdd19aa..2e7f3e017b 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3071,11 +3071,13 @@ usage: (logxor &rest INTS-OR-MARKERS)  */)
 
 DEFUN ("logcount", Flogcount, Slogcount, 1, 1, 0,
        doc: /* Return population count of VALUE.
-If VALUE is negative, the count is of its two's complement representation.  */)
+This is the number of one bits in the two's complement representation
+of VALUE.  If VALUE is negative, return the number of zero bits in the
+representation.  */)
   (Lisp_Object value)
 {
   CHECK_NUMBER (value);
-  EMACS_UINT v = XUINT (value);
+  EMACS_INT v = XINT (value) < 0 ? -1 - XINT (value) : XINT (value);
   return make_number (EMACS_UINT_WIDTH <= UINT_WIDTH
 		      ? count_one_bits (v)
 		      : EMACS_UINT_WIDTH <= ULONG_WIDTH
diff --git a/test/src/data-tests.el b/test/src/data-tests.el
index d1154cc5c4..374d1689b9 100644
--- a/test/src/data-tests.el
+++ b/test/src/data-tests.el
@@ -109,12 +109,14 @@
 
 (defun data-tests-popcnt (byte)
   "Calculate the Hamming weight of BYTE."
+  (if (< byte 0)
+      (setq byte (lognot byte)))
   (setq byte (- byte (logand (lsh byte -1) #x55555555)))
   (setq byte (+ (logand byte #x33333333) (logand (lsh byte -2) #x33333333)))
   (lsh (* (logand (+ byte (lsh byte -4)) #x0f0f0f0f) #x01010101) -24))
 
 (ert-deftest data-tests-logcount ()
-  (should (cl-loop for n in (number-sequence 0 255)
+  (should (cl-loop for n in (number-sequence -255 255)
                    always (= (logcount n) (data-tests-popcnt n))))
   ;; https://oeis.org/A000120
   (should (= 11 (logcount 9727)))
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#22689: [PATCH] Add logcount
  2017-09-30 22:48 ` Paul Eggert
@ 2017-09-30 23:22   ` Mark Oteiza
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Oteiza @ 2017-09-30 23:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 22689

On 30/09/17 at 03:48pm, Paul Eggert wrote:
> I looked at this patch after it landed in 'master'. Thanks for contributing it.
> 
> A couple of minor improvements. First, the implementation of logcount can
> use the count_one_bits functions already used by Emacs, rather than
> reinventing that wheel; these functions should be just as fast as
> __builtin_popcount etc. when using GCC. Second, logcount should count zero
> bits in negative numbers, for compatibility with Common Lisp. I installed
> the attached two patches to implement these improvements.

Thank you for the improvements--I was not aware of count-one-bits.h but
am pleasantly surprised.





^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-09-30 23:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-15 23:18 bug#22689: 25.1.50; implement logcount Mark Oteiza
2017-09-29 17:41 ` bug#22689: [PATCH] Add logcount Mark Oteiza
2017-09-30 11:42   ` Eli Zaretskii
2017-09-30 13:16     ` Mark Oteiza
2017-09-30 13:59       ` Eli Zaretskii
2017-09-30 14:55         ` Mark Oteiza
2017-09-30 15:38           ` Eli Zaretskii
2017-09-30 16:03             ` Mark Oteiza
2017-09-30 16:17               ` Eli Zaretskii
2017-09-30 17:07                 ` Mark Oteiza
2017-09-30 17:53                   ` Eli Zaretskii
2017-09-30 18:18                     ` Mark Oteiza
2017-09-30 16:50           ` Benjamin Riefenstahl
2017-09-30 16:59             ` Mark Oteiza
2017-09-30 22:48 ` Paul Eggert
2017-09-30 23:22   ` Mark Oteiza

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).