integer overflow handling for most-negative-fixnum

unofficial mirror of emacs-devel@gnu.org 

* integer overflow handling for most-negative-fixnum
@ 2018-07-19  2:39 Andy Moreton
  2018-07-20 22:10 ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Moreton @ 2018-07-19  2:39 UTC (permalink / raw)
  To: emacs-devel


The patches for bug#30408 contained this testcase:

> +(ert-deftest read-large-integer ()
> +  (should-error (read (format "%d0" most-negative-fixnum))
> +                :type 'overflow-error)
> +  (should-error (read (format "%+d" (* -8.0 most-negative-fixnum)))
> +                :type 'overflow-error)
> +  (should-error (read (substring (format "%d" most-negative-fixnum) 1))
> +                :type 'overflow-error)

These tests are all reasonable, as these numbers are definitely out of
fixnum range.

> +  (should-error (read (format "#x%x" most-negative-fixnum))
> +                :type 'overflow-error)
> +  (should-error (read (format "#o%o" most-negative-fixnum))
> +                :type 'overflow-error)

However, these tests (and the overflow behaviour) seem completely wrong
to me. The reported error is for an out of range fixnum, when what was
tested was an explicitly valid integer value.

Why are non-base10 numbers treated as signed ? Why can they not be
treated as unsigned, so the hex and octal values can be read as valid
input ?

ELISP> most-negative-fixnum
-2305843009213693952 (#o200000000000000000000, #x2000000000000000)
ELISP> -2305843009213693952
-2305843009213693952 (#o200000000000000000000, #x2000000000000000)
ELISP> #o200000000000000000000
*** Read error ***  Arithmetic overflow error: "200000000000000000000 is out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"
ELISP> #x2000000000000000
*** Read error ***  Arithmetic overflow error: "2000000000000000 is out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"

Why does the error message ignore the base of the input value ? It
should give a more accurate description of the problematic input.


    AndyM




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: integer overflow handling for most-negative-fixnum
  2018-07-19  2:39 integer overflow handling for most-negative-fixnum Andy Moreton
@ 2018-07-20 22:10 ` Paul Eggert
  2018-07-21  5:22   ` Helmut Eller
                     ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Paul Eggert @ 2018-07-20 22:10 UTC (permalink / raw)
  To: Andy Moreton, emacs-devel

On 07/18/2018 07:39 PM, Andy Moreton wrote:
>> +  (should-error (read (format "#x%x" most-negative-fixnum))
>> +                :type 'overflow-error)
>> +  (should-error (read (format "#o%o" most-negative-fixnum))
>> +                :type 'overflow-error)
> However, these tests (and the overflow behaviour) seem completely wrong
> to me. The reported error is for an out of range fixnum, when what was
> tested was an explicitly valid integer value.

Here, the code is executing something like (read "#x2000000000000000"), 
which is indeed out of range for a fixnum, since fixnums are signed and 
go up only to #x1fffffffffffffff (on 64-bit hosts).
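
The range check described above can be sketched in C as follows. This is 
only an illustration, not the code in src/lread.c: the helper name 
parse_hex_fixnum and the two range macros are hypothetical, and a 62-bit 
fixnum (64-bit host) is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixnum range on a 64-bit host: 62-bit signed integers.  */
#define MOST_POSITIVE_FIXNUM ((int64_t) 0x1fffffffffffffff)
#define MOST_NEGATIVE_FIXNUM (-MOST_POSITIVE_FIXNUM - 1)

/* Hypothetical helper: parse an optional '-' followed by hex digits.
   Store the value in *RESULT and return true if it fits in a fixnum;
   return false for a malformed string or an out-of-range value.  */
static bool
parse_hex_fixnum (const char *s, int64_t *result)
{
  bool negative = (*s == '-');
  if (negative)
    s++;
  if (*s == '\0')
    return false;

  /* Largest allowed magnitude: 2^61 if negative, 2^61 - 1 otherwise.  */
  uint64_t limit = (uint64_t) MOST_POSITIVE_FIXNUM + (negative ? 1 : 0);
  uint64_t magnitude = 0;

  for (; *s != '\0'; s++)
    {
      int digit;
      if ('0' <= *s && *s <= '9')
        digit = *s - '0';
      else if ('a' <= *s && *s <= 'f')
        digit = *s - 'a' + 10;
      else if ('A' <= *s && *s <= 'F')
        digit = *s - 'A' + 10;
      else
        return false;

      /* Reject the digit if appending it would exceed the limit.  */
      if (magnitude > (limit - digit) / 16)
        return false;
      magnitude = magnitude * 16 + digit;
    }

  *result = negative ? -(int64_t) magnitude : (int64_t) magnitude;
  return true;
}
```

Under these assumptions "2000000000000000" is rejected because its 
magnitude is 2^61, one more than the largest positive fixnum, while 
"1fffffffffffffff" and "-2000000000000000" are both accepted.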

> Why are non-base10 numbers treated as signed ?

Emacs fixnums are signed; there is no 'unsigned' type in Emacs Lisp. 
Although we could of course add such a type, it'd be better to expend 
our limited development resources on adding bignums. The 'unsigned' type 
in C has been a glitch magnet.

It might perhaps be useful to add an Emacs Lisp syntax for negative 
hexadecimal numbers, e.g., -#x10 would be equivalent to -16.

> Why does the error message ignore the base of the input value ? It
> should give a more accurate description of the problematic input.

Yes, it should. I'll see if I can pry loose some time to look into that.


* Re: integer overflow handling for most-negative-fixnum
  2018-07-20 22:10 ` Paul Eggert
@ 2018-07-21  5:22   ` Helmut Eller
  2018-07-21  9:47   ` Andy Moreton
  2018-07-21  9:49   ` Paul Eggert
  2 siblings, 0 replies; 29+ messages in thread
From: Helmut Eller @ 2018-07-21  5:22 UTC (permalink / raw)
  To: emacs-devel

On Fri, Jul 20 2018, Paul Eggert wrote:

> It might perhaps be useful to add an Emacs Lisp syntax for negative
> hexadecimal numbers, e.g., -#x10 would be equivalent to -16.

That's already there: #x-10 is read as -16.

Helmut





* Re: integer overflow handling for most-negative-fixnum
  2018-07-20 22:10 ` Paul Eggert
  2018-07-21  5:22   ` Helmut Eller
@ 2018-07-21  9:47   ` Andy Moreton
  2018-07-21 10:14     ` Eli Zaretskii
                       ` (2 more replies)
  2018-07-21  9:49   ` Paul Eggert
  2 siblings, 3 replies; 29+ messages in thread
From: Andy Moreton @ 2018-07-21  9:47 UTC (permalink / raw)
  To: emacs-devel

On Fri 20 Jul 2018, Paul Eggert wrote:

> On 07/18/2018 07:39 PM, Andy Moreton wrote:
>> Why are non-base10 numbers treated as signed ?
>
> Emacs fixnums are signed; there is no 'unsigned' type in Emacs Lisp. Although
> we could of course add such a type, it'd be better to expend our limited
> development resources on adding bignums.

Bignums are not relevant to this discussion. Non base10 representations
of emacs fixnums are not treated consistently, and prevent round trip
handling of the full range of valid fixnum values.

> The 'unsigned' type in C has been a glitch magnet.

I disagree: it depends on what you are doing. Signed types cause fewer
problems for arithmetic, but unsigned types are essential for bit
twiddling of hardware register values. Problems usually arise from
mixing signed and unsigned types in the same expression.

> It might perhaps be useful to add an Emacs Lisp syntax for negative
> hexadecimal numbers, e.g., -#x10 would be equivalent to -16.

As noted later in this thread, #x-10 is valid syntax for this value.

ELISP> most-negative-fixnum
-2305843009213693952 (#o200000000000000000000, #x2000000000000000)
ELISP> #x2000000000000000
*** Read error ***  Arithmetic overflow error: "2000000000000000 (base 16) is out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"
ELISP> #x-2000000000000000
-2305843009213693952 (#o200000000000000000000, #x2000000000000000)

If the reader will not accept #x2000000000000000 as input, then the value
of most-negative-fixnum printed as a non-base-10 number (octal or hex)
should have a negative sign to ensure consistent handling.

Either the print routines or the reader have a bug.

    AndyM


* Re: integer overflow handling for most-negative-fixnum
  2018-07-20 22:10 ` Paul Eggert
  2018-07-21  5:22   ` Helmut Eller
  2018-07-21  9:47   ` Andy Moreton
@ 2018-07-21  9:49   ` Paul Eggert
  2 siblings, 0 replies; 29+ messages in thread
From: Paul Eggert @ 2018-07-21  9:49 UTC (permalink / raw)
  To: Andy Moreton, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 320 bytes --]

Paul Eggert wrote:
> 
>> Why does the error message ignore the base of the input value ? It
>> should give a more accurate description of the problematic input.
> 
> Yes, it should. I'll see if I can pry loose some time to look into that.

I installed the attached to report the base of the out-of-range fixnum.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Report-base-of-out-of-range-input-fixnums.patch --]
[-- Type: text/x-patch; name="0001-Report-base-of-out-of-range-input-fixnums.patch", Size: 1167 bytes --]

From 1780502da6b9ac8d3063dfd56f675318568283dc Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 21 Jul 2018 00:25:27 -0700
Subject: [PATCH] Report base of out-of-range input fixnums

* src/lread.c (string_to_number): Report the base of an
out-of-range fixnum.  Problem reported by Andy Moreton in:
https://lists.gnu.org/r/emacs-devel/2018-07/msg00696.html
---
 src/lread.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/lread.c b/src/lread.c
index 4eba863..50fc6ef 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3798,10 +3798,11 @@ string_to_number (char const *string, int base, int flags)
 
       if (! (state & DOT_CHAR) && ! (flags & S2N_OVERFLOW_TO_FLOAT))
 	{
-	  AUTO_STRING (fmt, ("%s is out of fixnum range; "
+	  AUTO_STRING (fmt, ("%s (base %d) is out of fixnum range; "
 			     "maybe set `read-integer-overflow-as-float'?"));
 	  AUTO_STRING_WITH_LEN (arg, string, cp - string);
-	  xsignal1 (Qoverflow_error, CALLN (Fformat_message, fmt, arg));
+	  xsignal1 (Qoverflow_error,
+		    CALLN (Fformat_message, fmt, arg, make_number (base)));
 	}
     }
 
-- 
2.7.4



* Re: integer overflow handling for most-negative-fixnum
  2018-07-21  9:47   ` Andy Moreton
@ 2018-07-21 10:14     ` Eli Zaretskii
  2018-07-21 13:06       ` Andy Moreton
  2018-07-21 17:15       ` Stefan Monnier
  2018-07-21 12:42     ` Helmut Eller
  2018-07-21 17:46     ` Paul Eggert
  2 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2018-07-21 10:14 UTC (permalink / raw)
  To: Andy Moreton; +Cc: emacs-devel

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 21 Jul 2018 10:47:25 +0100
> 
> ELISP> most-negative-fixnum
> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
> ELISP> #x2000000000000000
> *** Read error ***  Arithmetic overflow error: "2000000000000000 (base 16) is out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"
> ELISP> #x-2000000000000000
> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
> 
> If the reader will not accept #x2000000000000000 as input, then the value
> of most-negative-fixnum printed as a non-base-10 number (octal or hex)
> should have a negative sign to ensure consistent handling.

Please don't forget the important use case of showing the hex
representation of a negative number.  There are situations where I'd
like to have "M-: -10 RET" display #x3ffffffffffffff6 rather than
#x-0a.

So I'd prefer we fixed the reader, if that's possible without breaking
something important.




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21  9:47   ` Andy Moreton
  2018-07-21 10:14     ` Eli Zaretskii
@ 2018-07-21 12:42     ` Helmut Eller
  2018-07-21 17:46     ` Paul Eggert
  2 siblings, 0 replies; 29+ messages in thread
From: Helmut Eller @ 2018-07-21 12:42 UTC (permalink / raw)
  To: emacs-devel

> Either the print routines or the reader have a bug.

Read and print work just fine:
(= (read (prin1-to-string most-negative-fixnum))
    most-negative-fixnum) => t

The problem is (format "%x" most-negative-fixnum) => "2000000000000000".
If this should return "-2000000000000000" that would obviously be an
incompatible change.  And in many cases one really wants to "see" the
two's complement representation of negative fixnums.  Just like in gdb:
p/x -1 prints 0xffffffff.  Interestingly, gdb doesn't seem to have a mode
to print hex numbers with a negative sign; so I guess nobody ever wanted
that.

In contrast, it should be unproblematic to add an optional base argument
to number-to-string.  So (number-to-string most-negative-fixnum 16) could
return "-2000000000000000" without breaking anything.  Actually, I
occasionally wondered why string-to-number accepts a base argument but
number-to-string doesn't.
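
A sign-and-magnitude conversion of the kind suggested above could look
roughly like this in C.  It is a sketch only: signed_to_string is a
hypothetical name, and nothing here is taken from the Emacs sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write N in BASE (2..36) as an optional '-'
   followed by the digits of its magnitude.  BUF must have room for
   at least 66 bytes (sign, 64 binary digits, terminating NUL).  */
static void
signed_to_string (int64_t n, int base, char *buf)
{
  static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
  char tmp[64];
  size_t len = 0;

  /* Negate in unsigned arithmetic so INT64_MIN cannot overflow.  */
  uint64_t magnitude = n < 0 ? -(uint64_t) n : (uint64_t) n;

  /* Collect digits least significant first.  */
  do
    {
      tmp[len++] = digits[magnitude % base];
      magnitude /= base;
    }
  while (magnitude != 0);

  if (n < 0)
    *buf++ = '-';
  /* Copy the digits back in most-significant-first order.  */
  while (len > 0)
    *buf++ = tmp[--len];
  *buf = '\0';
}
```

With a 62-bit most-negative-fixnum (-2^61) and base 16 this produces
"-2000000000000000", which the reader accepts when written as
#x-2000000000000000.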

(Of course, the question what (format "%x" <bignum>) should do is still
open; the easiest would probably be to signal an error.)

Helmut


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 10:14     ` Eli Zaretskii
@ 2018-07-21 13:06       ` Andy Moreton
  2018-07-21 17:15       ` Stefan Monnier
  1 sibling, 0 replies; 29+ messages in thread
From: Andy Moreton @ 2018-07-21 13:06 UTC (permalink / raw)
  To: emacs-devel

On Sat 21 Jul 2018, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sat, 21 Jul 2018 10:47:25 +0100
>> 
>> ELISP> most-negative-fixnum
>> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
>> ELISP> #x2000000000000000
>> *** Read error *** Arithmetic overflow error: "2000000000000000 (base 16) is
>> out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"
>> ELISP> #x-2000000000000000
>> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
>> 
>> If the reader will not accept #x2000000000000000 as input, then the value
>> of most-negative-fixnum printed as a non-base-10 number (octal or hex)
>> should have a negative sign to ensure consistent handling.
>
> Please don't forget the important use case of showing the hex
> representation of a negative number.  There are situations where I'd
> like to have "M-: -10 RET" display #x3ffffffffffffff6 rather than
> #x-0a.

That use case is how I stumbled upon this in the first place :-)

I also much prefer seeing hex without a sign, i.e. showing the
underlying representation, not the value.

> So I'd prefer we fixed the reader, if that's possible without breaking
> something important.

I agree, and I hope that is possible.

    AndyM





* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 10:14     ` Eli Zaretskii
  2018-07-21 13:06       ` Andy Moreton
@ 2018-07-21 17:15       ` Stefan Monnier
  2018-07-21 17:48         ` Paul Eggert
  2018-07-21 18:10         ` Eli Zaretskii
  1 sibling, 2 replies; 29+ messages in thread
From: Stefan Monnier @ 2018-07-21 17:15 UTC (permalink / raw)
  To: emacs-devel

>> ELISP> most-negative-fixnum
>> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
>> ELISP> #x2000000000000000
>> *** Read error ***  Arithmetic overflow error: "2000000000000000 (base 16)
>> is out of fixnum range; maybe set ‘read-integer-overflow-as-float’?"
>> ELISP> #x-2000000000000000
>> -2305843009213693952 (#o200000000000000000000, #x2000000000000000)
>> 
>> If the reader will not accept #x2000000000000000 as input, then the value
>> of most-negative-fixnum printed as a non-base-10 number (octal or hex)
>> should have a negative sign to ensure consistent handling.

I don't see what this has to do with the reader's behavior on
#x2000000000000000.  The only problem I see above is how to *print*
a value like #x-2000000000000000, which is currently done incorrectly.

> Please don't forget the important use case of showing the hex
> representation of a negative number.  There are situations where I'd
> like to have "M-: -10 RET" display #x3ffffffffffffff6 rather than
> #x-0a.

IIRC this was discussed recently in the context of the introduction of
bignums.  I didn't follow the thread in its entirety, but to me both
behaviors are valid and desirable, so we need some way for the
programmer to choose which one to use when (i.e. print the negative
number as ... well ... a negative number even when printed in a non-10
base, or print it as a "bitfield" where the leading ones of the two's
complement representation are just more bits rather than a negative
sign)

> So I'd prefer we fixed the reader, if that's possible without breaking
> something important.

Not sure what "fix the reader" would do here.  Do you mean read
#x2000000000000000 as a negative number?  When/why/where would that be
a good idea?

        Stefan


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21  9:47   ` Andy Moreton
  2018-07-21 10:14     ` Eli Zaretskii
  2018-07-21 12:42     ` Helmut Eller
@ 2018-07-21 17:46     ` Paul Eggert
  2018-07-23 11:49       ` Andy Moreton
  2 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2018-07-21 17:46 UTC (permalink / raw)
  To: Andy Moreton, emacs-devel

Andy Moreton wrote:

> Bignums are not relevant to this discussion.

I'm afraid they are. Whatever solution we come up with for this problem should
be compatible with bignums. Our solution should not assume that integers are of
fixed width.

> Non base10 representations
> of emacs fixnums are not treated consistently, and prevent round trip
> handling of the full range of valid fixnum values.

As Helmut mentioned, read and print work just fine. The problem is that if you 
use some formats, you don't get a round trip. Of course this problem is endemic 
to formats; e.g., (read (format "%g" X)) does not yield X for all floating-point 
values X, due to rounding. Still, it would be helpful if the usual kind of 
formatting hex integers were round-trip more often. I'll propose something along 
these lines in my next email.


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 17:15       ` Stefan Monnier
@ 2018-07-21 17:48         ` Paul Eggert
  2018-07-21 18:12           ` Stefan Monnier
  2018-07-21 20:10           ` Helmut Eller
  2018-07-21 18:10         ` Eli Zaretskii
  1 sibling, 2 replies; 29+ messages in thread
From: Paul Eggert @ 2018-07-21 17:48 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
> The only problem I see above is how to *print*
> a value like #x-2000000000000000, which is currently done incorrectly.

Yes, this is the crux of the matter.

> to me both
> behaviors are valid and desirable, so we need some way for the
> programmer to choose which one to use when

How about this idea. First, we extend 'format' to support bitwidth modifiers for 
twos-complement representation. For example, (format "%/24x" -1) would format 
just the low-order 24 bits of the integer, and would return "ffffff" regardless 
of machine word size or whether bignums are used.
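
To make the proposed bitwidth modifier concrete, here is a rough C sketch of
what formatting "just the low-order N bits" could mean. The "%/24x" syntax is
only a proposal at this point, and format_low_bits_hex is a hypothetical name
chosen for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the low-order BITS bits (1..64) of N's
   two's complement representation as lowercase hex into BUF.  */
static void
format_low_bits_hex (int64_t n, unsigned bits, char *buf, size_t size)
{
  /* Shifting a 64-bit value by 64 is undefined, so handle it apart.  */
  uint64_t mask = bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
  snprintf (buf, size, "%" PRIx64, (uint64_t) n & mask);
}
```

Here a width of 24 applied to -1 gives "ffffff", and a width of 62 gives
"3fffffffffffffff", the string that "%x" returns today on a 64-bit host. The
result depends only on the requested width, not on the width of the machine
word.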

Second, we change (format "%x" -1) to return "-1" rather than a 
machine-dependent string like "3fffffffffffffff" as it does now. That would make 
Elisp be more machine-independent, would solve Andy's problem, and would solve 
other problems once we have bignums, and in hindsight it's what we should have 
done originally. However, it would be an incompatible change, so let's have the 
behavior depend on a compatibility variable, much as we already do for 
read-integer-overflow-as-float.


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 17:15       ` Stefan Monnier
  2018-07-21 17:48         ` Paul Eggert
@ 2018-07-21 18:10         ` Eli Zaretskii
  2018-07-21 18:17           ` Paul Eggert
  2018-07-21 18:42           ` Stefan Monnier
  1 sibling, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2018-07-21 18:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 21 Jul 2018 13:15:49 -0400
> 
> > So I'd prefer we fixed the reader, if that's possible without breaking
> > something important.
> 
> Not sure what "fix the reader" would do here.  Do you mean read
> #x2000000000000000 as a negative number?

Yes.  More importantly, I'd like it to read #x3fffffffffffffff as -1.

> When/why/where would that be a good idea?

When wouldn't it be a good idea?




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 17:48         ` Paul Eggert
@ 2018-07-21 18:12           ` Stefan Monnier
  2018-07-23 19:18             ` Paul Eggert
  2018-07-21 20:10           ` Helmut Eller
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Monnier @ 2018-07-21 18:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

> How about this idea. First, we extend 'format' to support bitwidth modifiers
> for twos-complement representation. For example, (format "%/24x" -1) would
> format just the low-order 24 bits of the integer, and would return "ffffff"
> regardless of machine word size or whether bignums are used.
>
> Second, we change (format "%x" -1) to return "-1" rather
> than a machine-dependent string like "3fffffffffffffff" as it does now.  That
> would make Elisp be more machine-independent, would solve Andy's problem,
> and would solve other problems once we have bignums, and in hindsight it's
> what we should have done originally. However, it would be an incompatible
> change, so let's have the behavior depend on a compatibility variable, much
> as we already do for read-integer-overflow-as-float.

Sounds great to me,


        Stefan




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:10         ` Eli Zaretskii
@ 2018-07-21 18:17           ` Paul Eggert
  2018-07-21 18:23             ` Eli Zaretskii
  2018-07-21 18:42           ` Stefan Monnier
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2018-07-21 18:17 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: emacs-devel

Eli Zaretskii wrote:
> I'd like it to read #x3fffffffffffffff as -1.

That won't work once we have bignums. Even now, it'd be problematic behavior on 
32-bit machines. We should insulate Lisp programmers from fixnum width as much 
as possible.




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:17           ` Paul Eggert
@ 2018-07-21 18:23             ` Eli Zaretskii
  0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2018-07-21 18:23 UTC (permalink / raw)
  To: Paul Eggert; +Cc: monnier, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 21 Jul 2018 11:17:24 -0700
> 
> Eli Zaretskii wrote:
> > I'd like it to read #x3fffffffffffffff as -1.
> 
> That won't work once we have bignums. Even now, it'd be problematic behavior on 
> 32-bit machines. We should insulate Lisp programmers from fixnum width as much 
> as possible.

Currently, this doesn't work on any machine.  Is that reasonable?




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:10         ` Eli Zaretskii
  2018-07-21 18:17           ` Paul Eggert
@ 2018-07-21 18:42           ` Stefan Monnier
  2018-07-21 18:51             ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Monnier @ 2018-07-21 18:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> Not sure what "fix the reader" would do here.  Do you mean read
>> #x2000000000000000 as a negative number?
> Yes.  More importantly, I'd like it to read #x3fffffffffffffff as -1.

Hmm... me don't like that at all.

>> When/why/where would that be a good idea?
> When wouldn't it be a good idea?

When the user wrote #x3fffffffffffffff to mean "the positive number
written as 3fffffffffffffff in hexadecimal" (which as far as I know is
what "#x3fffffffffffffff" means in Elisp).

You can only read it as "-1" based on an assumption of fixed-width
two's-complement representation but the bitwidth of Emacs numbers is
something that can change between Emacs versions and
compilation options.  [ Especially since the leading digit is "3"
rather than "f", which means it can only be treated as -1 under the
assumption that the author really knew *exactly* how many bits his
particular Emacs build uses for integers.  ]
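
The width dependence can be shown with a small C sketch.  The helper
reinterpret_signed is hypothetical, and the 30-bit width is used only as
an assumed example of a build with narrower integers.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low WIDTH bits (1..64) of BITS
   as a two's complement signed integer.  */
static int64_t
reinterpret_signed (uint64_t bits, unsigned width)
{
  if (width >= 64)
    return (int64_t) bits;  /* value-preserving when BITS <= INT64_MAX */

  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t sign = UINT64_C (1) << (width - 1);
  bits &= mask;

  /* If the sign bit is set, subtract 2^WIDTH to get the negative value.  */
  if (bits & sign)
    return (int64_t) bits - (int64_t) mask - 1;
  return (int64_t) bits;
}
```

The digits 3fffffffffffffff mean -1 only at a width of exactly 62 bits;
at 64 bits the same digits are the positive number 2^62 - 1.  Likewise
3fffffff is -1 at 30 bits but 1073741823 at 62 bits.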

        Stefan


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:42           ` Stefan Monnier
@ 2018-07-21 18:51             ` Eli Zaretskii
  2018-07-21 20:42               ` Stefan Monnier
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2018-07-21 18:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Sat, 21 Jul 2018 14:42:32 -0400
> 
> > Yes.  More importantly, I'd like it to read #x3fffffffffffffff as -1.
> 
> Hmm... me don't like that at all.
> 
> >> When/why/where would that be a good idea?
> > When wouldn't it be a good idea?
> 
> When the user wrote #x3fffffffffffffff to mean "the positive number
> written as 3fffffffffffffff in hexadecimal"

There's no such positive number, at least not as fixnum.

> You can only read it as "-1" based on an assumption of fixed-width
> two's-complement representation but the bitwidth of Emacs numbers is
> something that can change between Emacs versions and
> compilation options.

I'm talking about an Emacs with 64-bit EMACS_INT.




* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 17:48         ` Paul Eggert
  2018-07-21 18:12           ` Stefan Monnier
@ 2018-07-21 20:10           ` Helmut Eller
  2018-07-21 21:02             ` Helmut Eller
  2018-07-23 19:40             ` Paul Eggert
  1 sibling, 2 replies; 29+ messages in thread
From: Helmut Eller @ 2018-07-21 20:10 UTC (permalink / raw)
  To: emacs-devel

On Sat, Jul 21 2018, Paul Eggert wrote:

> How about this idea. First, we extend 'format' to support bitwidth
> modifiers for twos-complement representation. For example, (format
> "%/24x" -1) would format just the low-order 24 bits of the integer,
> and would return "ffffff" regardless of machine word size or whether
> bignums are used.

That sounds reasonable.

> Second, we change (format "%x" -1) to return "-1" rather than a
> machine-dependent string like "3fffffffffffffff" as it does now. That
> would make Elisp be more machine-independent, would solve Andy's
> problem, and would solve other problems once we have bignums, and in
> hindsight it's what we should have done originally.

Can't you simply use some other character for this, like %z (or %ℤ), and
leave %x alone?  BTW, %i also seems to be a valid format specifier, but
where is it documented?

> However, it would
> be an incompatible change, so let's have the behavior depend on a
> compatibility variable, much as we already do for
> read-integer-overflow-as-float.

This sounds more like that other disaster: text-quoting-style.

Helmut





* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:51             ` Eli Zaretskii
@ 2018-07-21 20:42               ` Stefan Monnier
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2018-07-21 20:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> When the user wrote #x3fffffffffffffff to mean "the positive number
>> written as 3fffffffffffffff in hexadecimal"
> There's no such positive number, at least not as fixnum.

That's no excuse to turn around and return a negative number instead.
If I try to read too large a decimal number, Emacs signals an
`overflow-error` instead of returning a negative number, so it makes
sense to do the same for hexadecimal numbers.

>> You can only read it as "-1" based on an assumption of fixed-width
>> two's-complement representation but the bitwidth of Emacs numbers is
>> something that can change between Emacs versions and
>> compilation options.
> I'm talking about an Emacs with 64-bit EMACS_INT.

Right, but the reader is designed to `read` files which are usually not
specific to a given build (historically, there have been some
build-dependence on the result of `read`, but we've generally
considered them as bugs or misfeatures).

        Stefan


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 20:10           ` Helmut Eller
@ 2018-07-21 21:02             ` Helmut Eller
  2018-07-23 19:40             ` Paul Eggert
  1 sibling, 0 replies; 29+ messages in thread
From: Helmut Eller @ 2018-07-21 21:02 UTC (permalink / raw)
  To: emacs-devel

>> Second, we change (format "%x" -1) to return "-1" rather than a
>> machine-dependent string like "3fffffffffffffff" as it does now. That
>> would make Elisp be more machine-independent, would solve Andy's
>> problem, and would solve other problems once we have bignums, and in
>> hindsight it's what we should have done originally.
>
> Can't you simply use some other character for this, like %z (or %ℤ), and
> leave %x alone?

%a also seems like a candidate: in C99 %a prints floating point numbers
in hexadecimal notation.  That might be useful so that Emacs can
write/read floats without rounding errors.  It seems fairly natural to
use the same format specifier for bignums too.

Helmut





* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 17:46     ` Paul Eggert
@ 2018-07-23 11:49       ` Andy Moreton
  2018-07-23 17:30         ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Moreton @ 2018-07-23 11:49 UTC (permalink / raw)
  To: emacs-devel

On Sat 21 Jul 2018, Paul Eggert wrote:

> Andy Moreton wrote:
>
>> Bignums are not relevant to this discussion.
>
> I'm afraid they are. Whatever solution we come up with for this problem should
> be compatible with bignums. Our solution should not assume that integers are
> of fixed width.
>
>> Non base10 representations
>> of emacs fixnums are not treated consistently, and prevent round trip
>> handling of the full range of valid fixnum values.
>
> As Helmut mentioned, read and print work just fine. The problem is that if you
> use some formats, you don't get a round trip. Of course this problem is
> endemic to formats; e.g., (read (format "%g" X)) does not yield X for all
> floating-point values X, due to rounding. Still, it would be helpful if the
> usual kind of formatting hex integers were round-trip more often. I'll propose
> something along these lines in my next email.

I see that you have pushed 57c4bc146b ("0x%x → %#x in elisp formats"),
which will cause breakage as format is not well behaved:

ELISP> (format "%#x" 1)
"0x1"
ELISP> (format "%#x" 0)
"0"                      ; Missing "0x" prefix (same misfeature as in C)

ELISP> (format "%#08x" 1)
"0x000001"               ; Wrong number of digits printed
ELISP> (format "%#08x" 0)
"00000000"

For both of the above reasons, this change is not a good idea.
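
The same behaviour can be reproduced with plain C printf formatting, which
is what the "same misfeature as in C" remark refers to.  The three wrapper
names below are hypothetical; only the standard snprintf is used.

```c
#include <stdio.h>

/* Hypothetical wrapper: "%#x" adds "0x" only to nonzero values.  */
static void
alt_hex (unsigned value, char *buf, size_t size)
{
  snprintf (buf, size, "%#x", value);
}

/* Hypothetical wrapper: with "%#08x" the field width of 8 includes
   the "0x" prefix, so at most 6 padded digits follow it.  */
static void
alt_hex_width8 (unsigned value, char *buf, size_t size)
{
  snprintf (buf, size, "%#08x", value);
}

/* Hypothetical wrapper: a literal "0x" followed by "%08x" always gives
   the prefix and at least 8 digits, including for zero.  */
static void
explicit_hex_width8 (unsigned value, char *buf, size_t size)
{
  snprintf (buf, size, "0x%08x", value);
}
```

In C, alt_hex gives "0x1" for 1 but "0" for 0, and alt_hex_width8 gives
"0x000001" for 1 and "00000000" for 0, matching the ELISP output above.
explicit_hex_width8 gives "0x00000001" and "0x00000000".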

    AndyM





* Re: integer overflow handling for most-negative-fixnum
  2018-07-23 11:49       ` Andy Moreton
@ 2018-07-23 17:30         ` Paul Eggert
  2018-07-23 21:11           ` Andy Moreton
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2018-07-23 17:30 UTC (permalink / raw)
  To: Andy Moreton, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

Andy Moreton wrote:
> I see that you have pushed 57c4bc146b ("0x%x → %#x in elisp formats"),
> which will cause breakage as format is not well behaved:
> 
> ELISP> (format "%#x" 1)
> "0x1"
> ELISP> (format "%#x" 0)
> "0"                      ; Missing "0x" prefix (same misfeature as in C)
> 
> ELISP> (format "%#08x" 1)
> "0x000001"               ; Wrong number of digits printed
> ELISP> (format "%#08x" 0)
> "00000000"
> 
> For both of the above reasons, this change is not a good idea.

Thanks for mentioning the issue, as I had forgotten that (format "%#x" 0) yields 
"0" not "0x0". However, I don't see how 57c4bc146b breaks anything. The 
generated strings are used as nonces or arbitrary labels and as far as I can see 
nobody cares whether 0 is printed as "0" or as "0x0". (As none of the changes 
involve anything like "%#08x" I don't see the relevance of your second example.)

As I understand it 57c4bc146b is merely a nicety; it's certainly not needed for 
correctness now, and it's not needed for correctness even if we change how 
negative integers are formatted with %x. If I'm wrong and you still see 
correctness problems please feel free to revert it (though I'd like to know what 
the problems are...).

I now notice that this wrinkle about 'format' isn't documented despite being 
longstanding behavior that mirrors the C standard. It should be documented, so I 
installed the attached patch into master to fix the oversight.

[-- Attachment #2: 0001-format-x-0-yields-0-not-0x0.patch --]
[-- Type: text/x-patch; name="0001-format-x-0-yields-0-not-0x0.patch", Size: 1836 bytes --]

From 90256285e107641b064d6ec51a9c5bb03c3eee6a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 23 Jul 2018 10:23:35 -0700
Subject: [PATCH] (format "%#x" 0) yields "0", not "0x0"

* doc/lispref/strings.texi (Formatting Strings):
* src/editfns.c (Fformat): Document this.
---
 doc/lispref/strings.texi | 2 +-
 src/editfns.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index f68199e..2fff3c7 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1025,7 +1025,7 @@ Formatting Strings
 
   The flag @samp{#} specifies an alternate form which depends on
 the format in use.  For @samp{%o}, it ensures that the result begins
-with a @samp{0}.  For @samp{%x} and @samp{%X}, it prefixes the result
+with a @samp{0}.  For @samp{%x} and @samp{%X}, it prefixes nonzero results
 with @samp{0x} or @samp{0X}.  For @samp{%e} and @samp{%f}, the
 @samp{#} flag means include a decimal point even if the precision is
 zero.  For @samp{%g}, it always includes a decimal point, and also
diff --git a/src/editfns.c b/src/editfns.c
index ccc0d27..09f836c 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4202,7 +4202,7 @@ The - and 0 flags affect the width specifier, as described below.
 
 The # flag means to use an alternate display form for %o, %x, %X, %e,
 %f, and %g sequences: for %o, it ensures that the result begins with
-\"0\"; for %x and %X, it prefixes the result with \"0x\" or \"0X\";
+\"0\"; for %x and %X, it prefixes nonzero results with \"0x\" or \"0X\";
 for %e and %f, it causes a decimal point to be included even if the
 precision is zero; for %g, it causes a decimal point to be
 included even if the precision is zero, and also forces trailing
-- 
2.7.4



* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 18:12           ` Stefan Monnier
@ 2018-07-23 19:18             ` Paul Eggert
  2018-07-23 19:57               ` Stefan Monnier
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Eggert @ 2018-07-23 19:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier wrote:
> Sounds great to me,

I implemented a form of that proposal, but it turns out that bitwidth modifiers 
are trickier than they look (not least: tricky to document). So I audited the GNU 
Emacs source code looking to see how much of the bitwidth-modifier functionality 
would be useful -- and I couldn't find any places where it would be. So for now 
I plan to implement that proposal without bitwidth modifiers. Please see Bug#32252.

I can add some form of bitwidth modifiers later if we find use cases that can 
help motivate what to do about the corner cases.


* Re: integer overflow handling for most-negative-fixnum
  2018-07-21 20:10           ` Helmut Eller
  2018-07-21 21:02             ` Helmut Eller
@ 2018-07-23 19:40             ` Paul Eggert
  1 sibling, 0 replies; 29+ messages in thread
From: Paul Eggert @ 2018-07-23 19:40 UTC (permalink / raw)
  To: Helmut Eller, emacs-devel

Helmut Eller wrote:

> Can't you simply use some other character for this, like %z (or %ℤ), and
> leave %x alone?

It'd require three new letters, since %o is also affected. I'd rather not chew 
up so many letters for such an obscure feature. And as it doesn't appear that 
this change will break much if anything, I'd rather not chew up any letters at 
all; let's just fix %x etc. so that they're not machine-dependent.

>> However, it would
>> be an incompatible change, so let's have the behavior depend on a
>> compatibility variable, much as we already do for
>> read-integer-overflow-as-float.
> 
> This sounds more like that other disaster: text-quoting-style.

:-) Yes, I remember that well. However, this is a much smaller deal. With 
text-quoting-style I had to change many uses and the magnitude of the task was 
known (most of it was in the patches I proposed). Here, I've audited the Emacs 
source code and have not found any need to change anything, except optionally 
for appearance to make a few strings look nicer, and in a couple of tests that 
stress Emacs with unlikely inputs, tests that I've updated in the patch I 
proposed. See Bug#32252.

> %a also seems like a candidate: in C99 %a prints floating point numbers
> in hexadecimal notation.

If we were to implement %a it should be reasonably consistent with C99 %a, and 
this is something quite different from %x and %X and %o. So we should use a 
different letter, if we're going to use any new letter, which I'd rather not.

> That might be useful so that Emacs can write/read floats without rounding errors.

We already have that: (format "%s" N) outputs any number N in a format that can 
be read back without rounding errors. Similar functionality is available via 
(number-to-string N), (prin1-to-string N), etc. With all these, the only loss of 
floating-point info is with NaNs, something I'd like to fix even though they're 
not numbers.
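
As a rough illustration of the round-trip claim above (a sketch, not taken
from the thread): the helper name my-float-round-trips-p is hypothetical and
not part of Emacs. It prints a float with %s, reads the text back, and
compares the result with the original. NaNs are left out because `=' never
treats a NaN as equal to anything.

```elisp
;; Hypothetical helper, not part of Emacs: check that a float survives
;; being printed with %s and read back.
(defun my-float-round-trips-p (x)
  "Return non-nil if float X is unchanged after (read (format \"%s\" X))."
  (= x (read (format "%s" x))))

;; Try a few ordinary (non-NaN) values; each is expected to round-trip.
(mapcar #'my-float-round-trips-p
        (list 0.1 (/ 1.0 3) float-pi 1e21))
```

The same check should work with number-to-string or prin1-to-string in place
of (format "%s" ...).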


* Re: integer overflow handling for most-negative-fixnum
  2018-07-23 19:18             ` Paul Eggert
@ 2018-07-23 19:57               ` Stefan Monnier
  2018-07-23 23:09                 ` Paul Eggert
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Monnier @ 2018-07-23 19:57 UTC (permalink / raw)
  To: emacs-devel

> I can add some form of bitwidth modifiers later if we find use cases that
> can help motivate what to do about the corner cases.

Another option for the "bitwidth case" is to do it outside of `format`.
I.e. instead of

    (format "%/32x" n)

you'd use

    (format "%x" (truncate-to-bitwidth n 32))

where `truncate-to-bitwidth` would turn a negative number into its
positive equivalent (mod 2^bitwidth).  That shouldn't be too hard to
implement in Elisp once we have bignums.


        Stefan





* Re: integer overflow handling for most-negative-fixnum
  2018-07-23 17:30         ` Paul Eggert
@ 2018-07-23 21:11           ` Andy Moreton
  2018-07-24 12:14             ` Andreas Schwab
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Moreton @ 2018-07-23 21:11 UTC (permalink / raw)
  To: emacs-devel

On Mon 23 Jul 2018, Paul Eggert wrote:

> Andy Moreton wrote:
>> I see that you have pushed 57c4bc146b ("0x%x → %#x in elisp formats"),
>> which will cause breakage as format is not well behaved:
>>
>> ELISP> (format "%#x" 1)
>> "0x1"
>> ELISP> (format "%#x" 0)
>> "0"                      ; Missing "0x" prefix (same misfeature as in C)
>>
>> ELISP> (format "%#08x" 1)
>> "0x000001"               ; Wrong number of digits printed
>> ELISP> (format "%#08x" 0)
>> "00000000"
>>
>> For both of the above reasons, this change is not a good idea.
>
> Thanks for mentioning the issue, as I had forgotten that (format "%#x" 0)
> yields "0" not "0x0". However, I don't see how 57c4bc146b breaks anything. The
> generated strings are used as nonces or arbitrary labels and as far as I can
> see nobody cares whether 0 is printed as "0" or as "0x0". (As none of the
> changes involve anything like "%#08x" I don't see the relevance of your second
> example.)

The breakage is that anything expected to line up nicely in columns is
broken when "0x%x" is replaced by "%#x" as any zero values are no longer
the same output width.
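
To make the width difference concrete, here is a small sketch (not from the
thread, and the sample values are arbitrary) comparing the two spellings on
a list that includes zero:

```elisp
;; A literal "0x" prefix is always emitted, even for zero.
(mapcar (lambda (n) (format "0x%x" n)) '(0 1 255))
;; => ("0x0" "0x1" "0xff")

;; The # flag adds the prefix only for nonzero values, so the zero
;; entry comes out two characters shorter than with the literal prefix.
(mapcar (lambda (n) (format "%#x" n)) '(0 1 255))
;; => ("0" "0x1" "0xff")
```

Whether this matters depends on whether the output is meant to line up;
for nonces or labels it makes no difference.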

Also, if the width is specified as 8 I expect to see 8 digits of output,
not 6.

> I now notice that this wrinkle about 'format' isn't documented despite being
> longstanding behavior that mirrors the C standard. It should be documented, so
> I installed the attached patch into master to fix the oversight.

Thanks for the doc fix. While I think this is a misfeature in C and in elisp,
it is decades too late to change the behaviour.

    AndyM





* Re: integer overflow handling for most-negative-fixnum
  2018-07-23 19:57               ` Stefan Monnier
@ 2018-07-23 23:09                 ` Paul Eggert
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Eggert @ 2018-07-23 23:09 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

On 07/23/2018 12:57 PM, Stefan Monnier wrote:
> Another option for the "bitwidth case" is to do it outside of `format`.
> I.e. instead of
>
>      (format "%/32x" n)
>
> you'd use
>
>      (format "%x" (truncate-to-bitwidth n 32))
>
> where `truncate-to-bitwidth` would turn a negative number into its
> positive equivalent (mod 2^bitwidth).  That shouldn't be too hard to
> implement in Elisp once we have bignums.

Yes, with bignums it can be implemented this way, if I understand you 
aright:

(defun truncate-to-bitwidth (num bits)
  "Return NUM reduced modulo 2^BITS, as a nonnegative integer."
  (logand num (1- (ash 1 bits))))
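
For illustration only, a sketch of how the masking approach combines with a
plain %x. The helper name my-hex-in-bitwidth is hypothetical and not part of
Emacs. It assumes either a 64-bit Emacs or one with bignums, so that the
mask for the chosen bit width is representable as an integer.

```elisp
;; Hypothetical helper, not part of Emacs: format NUM in hexadecimal
;; as if it were an unsigned integer that is BITS bits wide.
(defun my-hex-in-bitwidth (num bits)
  "Format NUM in hex after reducing it modulo 2^BITS."
  ;; (1- (ash 1 bits)) is a mask with the low BITS bits set.
  (format "%x" (logand num (1- (ash 1 bits)))))

(my-hex-in-bitwidth -1 32)   ; => "ffffffff"
(my-hex-in-bitwidth -2 8)    ; => "fe"
(my-hex-in-bitwidth 256 8)   ; => "0"
```

Because the masked value is never negative, the result does not depend on
how %x treats negative integers.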





* Re: integer overflow handling for most-negative-fixnum
  2018-07-23 21:11           ` Andy Moreton
@ 2018-07-24 12:14             ` Andreas Schwab
  2018-07-24 13:06               ` Andy Moreton
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas Schwab @ 2018-07-24 12:14 UTC (permalink / raw)
  To: Andy Moreton; +Cc: emacs-devel

On Jul 23 2018, Andy Moreton <andrewjmoreton@gmail.com> wrote:

> Also, if the width is specified as 8 I expect to see 8 digits of output,
> not 6.

The width specifies the field width, not the number of digits.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: integer overflow handling for most-negative-fixnum
  2018-07-24 12:14             ` Andreas Schwab
@ 2018-07-24 13:06               ` Andy Moreton
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Moreton @ 2018-07-24 13:06 UTC (permalink / raw)
  To: emacs-devel

On Tue 24 Jul 2018, Andreas Schwab wrote:

> On Jul 23 2018, Andy Moreton <andrewjmoreton@gmail.com> wrote:
>
>> Also, if the width is specified as 8 I expect to see 8 digits of output,
>> not 6.
>
> The width specifies the field width, not the number of digits.


Good point. (format "%#0.8x" 1) behaves as expected.
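
A short sketch of the distinction (not from the thread): the field width
counts every character of the result, including the "0x" prefix, while the
precision sets the minimum number of digits and leaves the prefix out of the
count.

```elisp
;; Width 8 with the 0 flag: zero-padded to 8 characters in total,
;; of which 2 are the prefix and 6 are hex digits.
(format "%#08x" 1)   ; => "0x000001"

;; Precision 8: at least 8 hex digits, with the prefix added on top.
(format "%#.8x" 1)   ; => "0x00000001"

;; Without the # flag the two give the same result.
(format "%08x" 255)  ; => "000000ff"
(format "%.8x" 255)  ; => "000000ff"
```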

    AndyM





end of thread, other threads:[~2018-07-24 13:06 UTC | newest]

Thread overview: 29+ messages
2018-07-19  2:39 integer overflow handling for most-negative-fixnum Andy Moreton
2018-07-20 22:10 ` Paul Eggert
2018-07-21  5:22   ` Helmut Eller
2018-07-21  9:47   ` Andy Moreton
2018-07-21 10:14     ` Eli Zaretskii
2018-07-21 13:06       ` Andy Moreton
2018-07-21 17:15       ` Stefan Monnier
2018-07-21 17:48         ` Paul Eggert
2018-07-21 18:12           ` Stefan Monnier
2018-07-23 19:18             ` Paul Eggert
2018-07-23 19:57               ` Stefan Monnier
2018-07-23 23:09                 ` Paul Eggert
2018-07-21 20:10           ` Helmut Eller
2018-07-21 21:02             ` Helmut Eller
2018-07-23 19:40             ` Paul Eggert
2018-07-21 18:10         ` Eli Zaretskii
2018-07-21 18:17           ` Paul Eggert
2018-07-21 18:23             ` Eli Zaretskii
2018-07-21 18:42           ` Stefan Monnier
2018-07-21 18:51             ` Eli Zaretskii
2018-07-21 20:42               ` Stefan Monnier
2018-07-21 12:42     ` Helmut Eller
2018-07-21 17:46     ` Paul Eggert
2018-07-23 11:49       ` Andy Moreton
2018-07-23 17:30         ` Paul Eggert
2018-07-23 21:11           ` Andy Moreton
2018-07-24 12:14             ` Andreas Schwab
2018-07-24 13:06               ` Andy Moreton
2018-07-21  9:49   ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
