documentation of integers, fixnums and bignums

unofficial mirror of emacs-devel@gnu.org 

* documentation of integers, fixnums and bignums
@ 2018-09-08 16:09 Paul Eggert
  2018-09-08 16:27 ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-09-08 16:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emacs Development

>  DEFUN ("encode-char", Fencode_char, Sencode_char, 2, 2, 0,
>         doc: /* Encode the character CH into a code-point of CHARSET.
> -Return nil if CHARSET doesn't include CH.  */)
> +Return the encoded code-point, a fixnum if its value is small enough,
> +otherwise a bignum.
> +Return nil if CHARSET doesn't support CH.  */)

As the intent is that Emacs should treat integers transparently, so that 
ordinary code needn't worry about the difference between bignums and fixnums, it 
would be better if documentation like this simply says something like "Return 
the encoded code-point, an integer", as this is more concise.

It's true that the current integer implementation is a bit different, in that eq 
and = now treat integers differently; but this is a global property that is best 
documented in the integer section of the Emacs manual. We shouldn't need to add 
a comment in each function returning an integer in effect saying "watch out! eq 
and = might act differently on these integers!" as the cost to users of this 
documentation complication will exceed its benefit in the long run.


* Re: documentation of integers, fixnums and bignums
  2018-09-08 16:09 Paul Eggert
@ 2018-09-08 16:27 ` Eli Zaretskii
  2018-09-08 18:15   ` Stefan Monnier
  2018-09-08 20:05   ` Paul Eggert
  0 siblings, 2 replies; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-08 16:27 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Emacs Development <Emacs-devel@gnu.org>
> Date: Sat, 8 Sep 2018 09:09:03 -0700
> 
> >  DEFUN ("encode-char", Fencode_char, Sencode_char, 2, 2, 0,
> >         doc: /* Encode the character CH into a code-point of CHARSET.
> > -Return nil if CHARSET doesn't include CH.  */)
> > +Return the encoded code-point, a fixnum if its value is small enough,
> > +otherwise a bignum.
> > +Return nil if CHARSET doesn't support CH.  */)
> 
> As the intent is that Emacs should treat integers transparently, so that 
> ordinary code needn't worry about the difference between bignums and fixnums, it 
> would be better if documentation like this simply says something like "Return 
> the encoded code-point, an integer", as this is more concise.

Sorry, I disagree.  I think it's important for the Lisp programmers to
know what kind of objects they could get as return values.  Maybe in
some distant future we will no longer care about the difference
between fixnums and bignums, but as of now, we still do.

I've left the ELisp manual documentation which says "number" without
these details, but I think at least the doc strings should spell out
these details, for now.

> It's true that the current integer implementation is a bit different, in that eq 
> and = now treat integers differently; but this is a global property that is best 
> documented in the integer section of the Emacs manual. We shouldn't need to add 
> a comment in each function returning an integer in effect saying "watch out! eq 
> and = might act differently on these integers!" as the cost to users of this 
> documentation complication will exceed its benefit in the long run.

I think we should call out the functions that can return bignums
because of this and other peculiarities of bignums.  Otherwise, people
will have a hard time writing robust programs that use these APIs.
E.g., it would take an expert in obscure character sets to know that
encode-char could potentially return a value that cannot be
represented as a fixnum.


* Re: documentation of integers, fixnums and bignums
  2018-09-08 16:27 ` Eli Zaretskii
@ 2018-09-08 18:15   ` Stefan Monnier
  2018-09-08 20:15     ` Paul Eggert
  2018-09-08 20:05   ` Paul Eggert
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2018-09-08 18:15 UTC (permalink / raw)
  To: emacs-devel

> encode-char could potentially return a value that cannot be
> represented as a fixnum.

Can this still happen?  When?


        Stefan





* Re: documentation of integers, fixnums and bignums
  2018-09-08 16:27 ` Eli Zaretskii
  2018-09-08 18:15   ` Stefan Monnier
@ 2018-09-08 20:05   ` Paul Eggert
  2018-09-08 21:07     ` Eli Zaretskii
  2018-09-08 21:58     ` Stefan Monnier
  1 sibling, 2 replies; 18+ messages in thread
From: Paul Eggert @ 2018-09-08 20:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emacs-devel

Eli Zaretskii wrote:
> I think it's important for the Lisp programmers to
> know what kind of objects they could get as return values.  Maybe in
> some distant future we will no longer care about the difference
> between fixnums and bignums, but as of now, we still do.

When Lisp programmers care about object types, they should care only whether the 
objects are integers. From a type point of view programmers shouldn't care 
whether an integer is small or large, any more than they should care whether a 
vector is small or large. Occasionally for pragmatic reasons it may make sense 
to point out that an integer might be large or not, just as it occasionally may 
make sense to point out that a vector might be large or not. But this should be 
the exception, not the typical case.

What you see as "some distant future" I see as happening before the next 
release, by the way. Perhaps that explains why you're more in favor of 
documenting the current not-yet-finished situation, whereas I'm more in favor of 
keeping the documentation simple and implementing it that way.


* Re: documentation of integers, fixnums and bignums
  2018-09-08 18:15   ` Stefan Monnier
@ 2018-09-08 20:15     ` Paul Eggert
  2018-09-08 22:03       ` Stefan Monnier
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2018-09-08 20:15 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
>> encode-char could potentially return a value that cannot be
>> represented as a fixnum.
> Can this still happen?  When?

When INDEX_TO_CODE_POINT returns a code point greater than most-positive-fixnum, 
which can happen (in theory, at least) on 32-bit platforms. Formerly, such a 
code point caused Emacs to return a negative fixnum or junk, depending on the 
code point. Now it causes Emacs to return an integer with the proper value.

I don't know of any charsets that actually do that. Possibly Emacs should simply 
report an error if it runs across one, as that would simplify the code point 
processing internals.


* Re: documentation of integers, fixnums and bignums
  2018-09-08 20:05   ` Paul Eggert
@ 2018-09-08 21:07     ` Eli Zaretskii
  2018-09-08 21:58     ` Stefan Monnier
  1 sibling, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-08 21:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs-devel

> Cc: Emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 8 Sep 2018 13:05:48 -0700
> 
> Eli Zaretskii wrote:
> > I think it's important for the Lisp programmers to
> > know what kind of objects they could get as return values.  Maybe in
> > some distant future we will no longer care about the difference
> > between fixnums and bignums, but as of now, we still do.
> 
> When Lisp programmers care about object types, they should care only whether the 
> objects are integers.

Ideally, yes.  But in practice people don't naturally expect to deal
with very large integers, and for now bignums and fixnums don't even
behave identically in Emacs Lisp.

> What you see as "some distant future" I see as happening before the next 
> release, by the way. Perhaps that explains why you're more in favor of 
> documenting the current not-yet-finished situation, whereas I'm more in favor of 
> keeping the documentation simple and implementing it that way.

Well, that changeset started with an attempt to fix woefully
misleading documentation left behind, which still claimed we produce
cons cells in some situations.  We must keep the master branch
reasonably well documented, because it is being used by a lot of
people.  We cannot leave it in WIP state for longer than a few hours.
When code changes, documentation should follow immediately.  Yes, that
means additional work, which might in the end prove more than
absolutely necessary, but I see no other way when development is done
incrementally on the master branch (as opposed to a feature branch).

Thanks.


* Re: documentation of integers, fixnums and bignums
  2018-09-08 20:05   ` Paul Eggert
  2018-09-08 21:07     ` Eli Zaretskii
@ 2018-09-08 21:58     ` Stefan Monnier
  1 sibling, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2018-09-08 21:58 UTC (permalink / raw)
  To: emacs-devel

> When Lisp programmers care about object types, they should care only whether
> the objects are integers.

I agree, but as long as `eq` behaves differently, the difference
is important.


        Stefan





* Re: documentation of integers, fixnums and bignums
  2018-09-08 20:15     ` Paul Eggert
@ 2018-09-08 22:03       ` Stefan Monnier
  2018-09-08 23:37         ` Paul Eggert
  2018-09-09  5:42         ` Eli Zaretskii
  0 siblings, 2 replies; 18+ messages in thread
From: Stefan Monnier @ 2018-09-08 22:03 UTC (permalink / raw)
  To: emacs-devel

>>> encode-char could potentially return a value that cannot be
>>> represented as a fixnum.
>> Can this still happen?  When?
> When INDEX_TO_CODE_POINT returns a code point greater than
> most-positive-fixnum, which can happen (in theory, at least) on 32-bit
> platforms.

Can it, really?

> Formerly, such a code point caused Emacs to return a negative
> fixnum or junk, depending on the code point.

I get the impression that this possibility might have existed back in
Emacs-20 but has disappeared since.

AFAIK any Unicode codepoint fits in 22 (or even 21?) bits, and while we
may use a few extra codepoints IIUC in some corner cases, it should all
fit comfortably within our 28 bits of FIXNATs.


        Stefan





* Re: documentation of integers, fixnums and bignums
  2018-09-08 22:03       ` Stefan Monnier
@ 2018-09-08 23:37         ` Paul Eggert
  2018-09-09  2:33           ` Stefan Monnier
  2018-09-09  5:40           ` Eli Zaretskii
  2018-09-09  5:42         ` Eli Zaretskii
  1 sibling, 2 replies; 18+ messages in thread
From: Paul Eggert @ 2018-09-08 23:37 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
>>>> encode-char could potentially return a value that cannot be
>>>> represented as a fixnum.
>>> Can this still happen?  When?
>> When INDEX_TO_CODE_POINT returns a code point greater than
>> most-positive-fixnum, which can happen (in theory, at least) on 32-bit
>> platforms.
> 
> Can it, really?

I don't know of any way it could happen. So what you're saying is that we should 
install something like the attached patch?

Also, how about glyph-ids returned by font-variation-glyphs? Can they exceed 
fixnum range? If not, font-variation-glyphs could see a similar speedup.

[-- Attachment #2: encode-char.diff --]
[-- Type: text/x-patch, Size: 822 bytes --]

diff --git a/src/charset.c b/src/charset.c
index e11a8366d5..a5a5a944dc 100644
--- a/src/charset.c
+++ b/src/charset.c
@@ -1870,8 +1870,7 @@ although this usage is obsolescent.  */)
 
 DEFUN ("encode-char", Fencode_char, Sencode_char, 2, 2, 0,
        doc: /* Encode the character CH into a code-point of CHARSET.
-Return the encoded code-point, a fixnum if its value is small enough,
-otherwise a bignum.
+Return the encoded code-point, a fixnum.
 Return nil if CHARSET doesn't support CH.  */)
   (Lisp_Object ch, Lisp_Object charset)
 {
@@ -1886,7 +1885,8 @@ Return nil if CHARSET doesn't support CH.  */)
   code = ENCODE_CHAR (charsetp, c);
   if (code == CHARSET_INVALID_CODE (charsetp))
     return Qnil;
-  return INT_TO_INTEGER (code);
+  eassert (!FIXNUM_OVERFLOW_P (code));
+  return make_fixnum (code);
 }
 
 


* RE: documentation of integers, fixnums and bignums
       [not found] ` <<83pnxorm37.fsf@gnu.org>
@ 2018-09-09  1:43   ` Drew Adams
  0 siblings, 0 replies; 18+ messages in thread
From: Drew Adams @ 2018-09-09  1:43 UTC (permalink / raw)
  To: Eli Zaretskii, Paul Eggert; +Cc: Emacs-devel

> Sorry, I disagree.  I think it's important for the Lisp programmers to
> know what kind of objects they could get as return values.  Maybe in
> some distant future we will no longer care about the difference
> between fixnums and bignums, but as of now, we still do.

+1. If there is a difference then users deserve to know about it
(without looking at the code). Whenever there is no difference
it can be enough to say "number".




* Re: documentation of integers, fixnums and bignums
  2018-09-08 23:37         ` Paul Eggert
@ 2018-09-09  2:33           ` Stefan Monnier
  2018-09-09  5:40           ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2018-09-09  2:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

>> Can it, really?
> I don't know of any way it could happen.  So what you're saying is that we
> should install something like the attached patch?

I'm not 100% sure, but I think so, yes.

> Also, how about glyph-ids returned by font-variation-glyphs? Can they exceed
> fixnum range? If not, font-variation-glyphs could see a similar speedup.

Sorry, not familiar enough with these to know for sure.


        Stefan




* Re: documentation of integers, fixnums and bignums
  2018-09-08 23:37         ` Paul Eggert
  2018-09-09  2:33           ` Stefan Monnier
@ 2018-09-09  5:40           ` Eli Zaretskii
  2018-09-10  0:09             ` Stefan Monnier
  1 sibling, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-09  5:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: monnier, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 8 Sep 2018 16:37:26 -0700
> 
> >> When INDEX_TO_CODE_POINT returns a code point greater than
> >> most-positive-fixnum, which can happen (in theory, at least) on 32-bit
> >> platforms.
> > 
> > Can it, really?
> 
> I don't know of any way it could happen. So what you're saying is that we should 
> install something like the attached patch?

The log message you made in commit 3c7649c says:

    Don't rely on undefined behavior with signed left shift overflow.
    Don't assume unsigned int fits into fixnum, or that fixnum fits
    into unsigned int.  Don't require max_code to be a valid fixnum;
    that's not true for gb10830 4-byte on a 32-bit host.

And indeed, etc/charsets/GB180304.map clearly shows codepoints like
0x81308130, which will overflow the 32-bit most-positive-fixnum.
(These codepoints are just a concatenation of the 4 bytes of the GB
18030 encoding, see https://en.wikipedia.org/wiki/GB_18030).

> Also, how about glyph-ids returned by font-variation-glyphs? Can they exceed 
> fixnum range? If not, font-variation-glyphs could see a similar speedup.

They are glyph IDs of some font.  Does anyone know what the limits
are for values of font glyph IDs?  I'm not an expert on fonts.




* Re: documentation of integers, fixnums and bignums
  2018-09-08 22:03       ` Stefan Monnier
  2018-09-08 23:37         ` Paul Eggert
@ 2018-09-09  5:42         ` Eli Zaretskii
  2018-09-10  0:12           ` Stefan Monnier
  1 sibling, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-09  5:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 08 Sep 2018 18:03:37 -0400
> 
> AFAIK any Unicode codepoint fits in 22 (or even 21?) bits, and while we
> may use a few extra codepoints IIUC in some corner cases, it should all
> fit comfortably within our 28 bits of FIXNATs.

Unicode codepoints have almost nothing to do with this, since nowadays
encode-char is mostly a no-op with Unicode character set.  Its main
use is with non-Unicode charsets, and there we cannot apply the
knowledge of the Unicode code-space, we cannot even assume the
code-space is populated densely as in Unicode.


* Re: documentation of integers, fixnums and bignums
  2018-09-09  5:40           ` Eli Zaretskii
@ 2018-09-10  0:09             ` Stefan Monnier
  2018-09-10  6:43               ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2018-09-10  0:09 UTC (permalink / raw)
  To: emacs-devel

> And indeed, etc/charsets/GB180304.map clearly shows codepoints like
> 0x81308130, which will overflow the 32-bit most-positive-fixnum.
> (These codepoints are just a concatenation of the 4 bytes of the GB
> 18030 encoding, see https://en.wikipedia.org/wiki/GB_18030).

Aha!  Thanks.  This deserves a comment in the code.

According to Wikipedia, GB 18030 has a max of ~1.5M codepoints (of the
4 bytes, the first and third are in 0x81..0xFE and the second and
fourth span 0x30..0x39), so we could fit them all into our fixnums just
fine, but we'd need to change the way we turn the 4-byte sequence into
a codepoint in order to do that (I doubt the benefit would be worth the
trouble).

        Stefan


* Re: documentation of integers, fixnums and bignums
  2018-09-09  5:42         ` Eli Zaretskii
@ 2018-09-10  0:12           ` Stefan Monnier
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Monnier @ 2018-09-10  0:12 UTC (permalink / raw)
  To: emacs-devel

>> AFAIK any Unicode codepoint fits in 22 (or even 21?) bits, and while we
>> may use a few extra codepoints IIUC in some corner cases, it should all
>> fit comfortably within our 28 bits of FIXNATs.
>
> Unicode codepoints have almost nothing to do with this, since nowadays

I used Unicode as a reference of the magnitude that can be expected.
While GB 18030 is larger than Unicode, it's not much larger (although
we encode it into codepoints that are sparsely populated, thus
needing all 32 bits).


        Stefan





* Re: documentation of integers, fixnums and bignums
  2018-09-10  0:09             ` Stefan Monnier
@ 2018-09-10  6:43               ` Eli Zaretskii
  2018-09-10 12:11                 ` Stefan Monnier
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-10  6:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 09 Sep 2018 20:09:13 -0400
> 
> > And indeed, etc/charsets/GB180304.map clearly shows codepoints like
> > 0x81308130, which will overflow the 32-bit most-positive-fixnum.
> > (These codepoints are just a concatenation of the 4 bytes of the GB
> > 18030 encoding, see https://en.wikipedia.org/wiki/GB_18030).
> 
> Aha!  Thanks.  This deserves a comment in the code.

Where would you suggest putting this comment?




* Re: documentation of integers, fixnums and bignums
  2018-09-10  6:43               ` Eli Zaretskii
@ 2018-09-10 12:11                 ` Stefan Monnier
  2018-09-10 13:53                   ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2018-09-10 12:11 UTC (permalink / raw)
  To: emacs-devel

> Where would you suggest putting this comment?

I put it in encode-char.


        Stefan





* Re: documentation of integers, fixnums and bignums
  2018-09-10 12:11                 ` Stefan Monnier
@ 2018-09-10 13:53                   ` Eli Zaretskii
  0 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2018-09-10 13:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 10 Sep 2018 08:11:50 -0400
> 
> > Where would you suggest putting this comment?
> 
> I put it in encode-char.

Thanks.


