size_t vs EMACS_INT

unofficial mirror of emacs-devel@gnu.org 

* size_t vs EMACS_INT
@ 2011-07-15  6:42 Eli Zaretskii
  2011-07-15  7:15 ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-07-15  6:42 UTC
  To: Paul Eggert; +Cc: emacs-devel

Paul,

I think that part of your change in revision 105217 on the trunk, viz.

  -static size_t bidi_cache_size = 0;
  +static EMACS_INT bidi_cache_size = 0;

is not a good idea.  I understand the motivation for using a signed
type, but EMACS_INT isn't just a signed type, it's 3 bits narrower
than size_t, at least on some platforms.  By contrast, the bidi cache
should be able to support the longest Lisp string/buffer, and for that
it needs to have MOST_POSITIVE_FIXNUM _elements_, not bytes.  So the
net effect of the above change is to limit the cache to 1/8th of the
maximum size it could have before the change.  So I think we will have
to use size_t here, and deal with whatever complications that causes
with GCC 4.6.x.


* Re: size_t vs EMACS_INT
  2011-07-15  6:42 size_t vs EMACS_INT Eli Zaretskii
@ 2011-07-15  7:15 ` Paul Eggert
  2011-07-15  8:10   ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2011-07-15  7:15 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 07/14/11 23:42, Eli Zaretskii wrote:

>   -static size_t bidi_cache_size = 0;
>   +static EMACS_INT bidi_cache_size = 0;
> 
> is not a good idea.  I understand the motivation for using a signed
> type, but EMACS_INT isn't just a signed type, it's 3 bits narrower
> than size_t, at least on some platforms.

But the code in question doesn't use tag bits.  So there's no issue
about EMACS_INT being 2 or 3 bits narrower than the machine integer.
On all practical Emacs porting targets, EMACS_INT is at least as wide
as size_t, and the above change does not impose any new limits.

> By contrast, the bidi cache should be able to support the longest
> Lisp string/buffer, and for that it needs to have
> MOST_POSITIVE_FIXNUM _elements_, not bytes

Not exactly.  Although the longest Lisp string/buffer indeed cannot
exceed MOST_POSITIVE_FIXNUM elements, there are two other constraints:
it cannot exceed SIZE_MAX elements, and it cannot exceed PTRDIFF_MAX
elements.  All three constraints are necessary to prevent various
disasters in the underlying C code.  And the Emacs allocators enforce
these constraints.  For more details, please see the definitions of
STRING_BYTES_BOUND in lisp.h, and of BUF_BYTES_MAX in buffer.h.

Therefore, EMACS_INT, ptrdiff_t, and size_t are all wide enough
to count a buffer size or string length.
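
To make the three-way limit concrete, here is a minimal sketch of how
such a bound could be computed.  It is not the actual
STRING_BYTES_BOUND or BUF_BYTES_MAX definition; the names
SKETCH_MOST_POSITIVE_FIXNUM and sketch_count_bound are hypothetical,
and the 61-bit fixnum width is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MOST_POSITIVE_FIXNUM: assume a fixnum has
   61 value bits including the sign (an assumed width, not the value
   lisp.h actually uses on any given host).  */
#define SKETCH_FIXNUM_BITS 61
#define SKETCH_MOST_POSITIVE_FIXNUM \
  ((uintmax_t) ((INTMAX_C (1) << (SKETCH_FIXNUM_BITS - 1)) - 1))

/* Return the largest element count that satisfies all three limits:
   the fixnum range, SIZE_MAX, and PTRDIFF_MAX.  */
uintmax_t
sketch_count_bound (void)
{
  uintmax_t bound = SKETCH_MOST_POSITIVE_FIXNUM;
  if (SIZE_MAX < bound)
    bound = SIZE_MAX;
  if ((uintmax_t) PTRDIFF_MAX < bound)
    bound = (uintmax_t) PTRDIFF_MAX;
  /* By construction the result can be stored in a ptrdiff_t.  */
  assert (bound <= (uintmax_t) PTRDIFF_MAX);
  return bound;
}
```

Whatever the host, the result fits in ptrdiff_t, so under these
assumptions a signed ptrdiff_t counter loses no range compared with
size_t.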

We prefer signed types, so we avoid size_t.

We should also prefer ptrdiff_t to EMACS_INT for such purposes.  These
two are normally the same type, except on 32-bit hosts configured
--with-wide-int where the former is 32 bits and the latter is 64 bits.
In that case, ptrdiff_t is preferable because it is more efficient.
That is why the subsequent patch proposed in
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9079#26> changes
bidi_cache_size from EMACS_INT to ptrdiff_t.

A goodly amount of existing Emacs code uses EMACS_INT where ptrdiff_t
would do.  I have several fixes for this, which I plan to submit at
some point.  But in the meantime, new code should prefer ptrdiff_t to
EMACS_INT where either type would do.

> So the net effect of the above change is to limit the cache to 1/8th
> of the maximum size it could have before the change.

I hope the above comments explain why the change does not place any
limits on the cache size that were not already there.

I'll follow up further at the bug report
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9079#26>.


* Re: size_t vs EMACS_INT
  2011-07-15  7:15 ` Paul Eggert
@ 2011-07-15  8:10   ` Eli Zaretskii
  2011-07-15 16:38     ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-07-15  8:10 UTC
  To: Paul Eggert; +Cc: emacs-devel

> Date: Fri, 15 Jul 2011 00:15:04 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
> But the code in question doesn't use tag bits.  So there's no issue
> about EMACS_INT being 2 or 3 bits narrower than the machine integer.

Today, no issue.  But some day the representation of EMACS_INT might
be changed, and when that happens, it would be good to have all the
variables that are related to that type be clearly marked in the
sources.  Otherwise, we will have to wade through all the places
trying to understand which variable is really EMACS_INT and which is
size_t or ptrdiff_t.  Having done that recently, when I worked on
editing buffers larger than 2GB on 64-bit hosts, I can tell you it's
no fun.

> We prefer signed types, so we avoid size_t.

Then let's use ssize_t.  I submit that EMACS_INT should not be used
for anything that is not directly related to buffer or string
positions.

> I hope the above comments explain why the change does not place any
> limits on the cache size that were not already there.

Practically speaking, with today's implementation of EMACS_INT, it
doesn't.  But that wasn't my point.  My point was that _conceptually_,
EMACS_INT is limited to MOST_POSITIVE_FIXNUM, which is way smaller
than SIZE_MAX on many platforms.  We should not IMO let assumptions
based on implementation details creep into the code, except where
strictly necessary.  This isn't a case of such a necessity, IMO.


* Re: size_t vs EMACS_INT
  2011-07-15  8:10   ` Eli Zaretskii
@ 2011-07-15 16:38     ` Paul Eggert
  2011-07-15 17:14       ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2011-07-15 16:38 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 07/15/11 01:10, Eli Zaretskii wrote:
> EMACS_INT should not be used
> for anything that is not directly related to buffer or string
> positions.

No, EMACS_INT is for anything that is related to the Emacs
internal representation of integers.  It's not just buffer
and string positions.  There are lots of other uses: arithmetic,
calendars, time stamps, random numbers, image dimensions, etc.

> My point was that _conceptually_,
> EMACS_INT is limited to MOST_POSITIVE_FIXNUM

No, conceptually, EMACS_INT is limited to TYPE_MAXIMUM (EMACS_INT).
A lot of Emacs code assumes that MOST_POSITIVE_FIXNUM is much
less than TYPE_MAXIMUM (EMACS_INT), and any change to that assumption
would require a significant rewrite of the Emacs internals.

And even if this assumption *were* to change, it would not affect
this particular case.  Here, the value in question cannot possibly
exceed PTRDIFF_MAX because it is counting something (a C array size)
that cannot exceed PTRDIFF_MAX without breaking things at the C
level.  The fact that the value also cannot exceed
MOST_POSITIVE_FIXNUM does not affect the fact that the value must
always fit into ptrdiff_t.

> let's use ssize_t

No, because POSIX allows ssize_t to be 32 bits on the same
platform where size_t is 64 bits.  The motivation for
ssize_t is "this is a signed integer type such that syscalls
like read() and write() will never return values larger than that".
This is why there is a SSIZE_MAX in POSIX, but no SSIZE_MIN;
in the POSIX context, there's no use for negative ssize_t values
other than -1.

On a host with 32-bit ssize_t and 64-bit size_t, one can read()
a terabyte buffer, but read() will never return more than 2 GiB at a
time.  That's not what we want here: here, the quantities
have nothing to do with I/O buffers, and they should not have
an arbitrary 32-bit limit on these 64-bit hosts.
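
As an illustration of the I/O role ssize_t plays, here is a minimal
sketch of a loop that fills a buffer larger than one read() call can
report.  The helper name sketch_read_full is hypothetical; only
read(), SSIZE_MAX, and EINTR come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to COUNT bytes from FD into BUF, issuing
   as many read() calls as needed.  Return the number of bytes read
   (less than COUNT only at end of file), or (size_t) -1 on error.  */
size_t
sketch_read_full (int fd, char *buf, size_t count)
{
  assert (buf != NULL || count == 0);
  size_t total = 0;
  while (total < count)
    {
      /* A single read() reports its result as an ssize_t, so never
         ask for more than SSIZE_MAX bytes in one call.  */
      size_t chunk = count - total;
      if (chunk > SSIZE_MAX)
        chunk = SSIZE_MAX;
      ssize_t n = read (fd, buf + total, chunk);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted by a signal; retry.  */
          return (size_t) -1;
        }
      if (n == 0)
        break;                  /* End of file.  */
      total += (size_t) n;
    }
  return total;
}
```

The running total is kept in a size_t precisely because it may exceed
what a single ssize_t result can express.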


* Re: size_t vs EMACS_INT
  2011-07-15 16:38     ` Paul Eggert
@ 2011-07-15 17:14       ` Eli Zaretskii
  2011-07-15 21:52         ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-07-15 17:14 UTC
  To: Paul Eggert; +Cc: emacs-devel

> Date: Fri, 15 Jul 2011 09:38:29 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
> On 07/15/11 01:10, Eli Zaretskii wrote:
> > EMACS_INT should not be used
> > for anything that is not directly related to buffer or string
> > positions.
> 
> No, EMACS_INT is for anything that is related to the Emacs
> internal representation of integers.  It's not just buffer
> and string positions.

Right, that too.

> > My point was that _conceptually_,
> > EMACS_INT is limited to MOST_POSITIVE_FIXNUM
> 
> No, conceptually, EMACS_INT is limited to TYPE_MAXIMUM (EMACS_INT).
> A lot of Emacs code assumes that MOST_POSITIVE_FIXNUM is much
> less than TYPE_MAXIMUM (EMACS_INT), and any change to that assumption
> would require a significant rewrite of the Emacs internals.

Can you show examples of these assumptions?

> > let's use ssize_t
> 
> No, because POSIX allows ssize_t to be 32 bits on the same
> platform where size_t is 64 bits.

But ptrdiff_t should do, right?  If so, let's use that.




* Re: size_t vs EMACS_INT
  2011-07-15 17:14       ` Eli Zaretskii
@ 2011-07-15 21:52         ` Paul Eggert
  2011-07-16  7:13           ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2011-07-15 21:52 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 07/15/11 10:14, Eli Zaretskii wrote:

> ptrdiff_t should do, right?  If so, let's use that.

Yes, thanks, that sounds like the best way to go.

>> > A lot of Emacs code assumes that MOST_POSITIVE_FIXNUM is much
>> > less than TYPE_MAXIMUM (EMACS_INT), and any change to that assumption
>> > would require a significant rewrite of the Emacs internals.
> Can you show examples of these assumptions?

The most central examples are the integer-extraction macros
in lisp.h, e.g., make_number, XINT.  Presumably these could be changed if
we take the big step of adopting some other implementation strategy
for Emacs integers, such that XINT (foo) could yield
TYPE_MAXIMUM (EMACS_INT).   But then we'd have to deal with examples
like the following, in Fforward_char:

    EMACS_INT new_point = PT + XINT (n);

This code is currently safe, since C code can always safely add
two Emacs fixnums, and the addition can't possibly overflow at the C level.
But if fixnums could equal TYPE_MAXIMUM (EMACS_INT),
this code would be unsafe and we would have to add a run-time
check for integer overflow.
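
The difference can be sketched as follows.  This is only an
illustration under assumed widths: sketch_int, SKETCH_FIXNUM_MAX, and
both functions are hypothetical names, and the 3 reserved bits are an
assumption rather than the lisp.h definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 64-bit integer type whose "fixnum" values
   keep 3 bits in reserve (assumed widths, not Emacs's definitions).  */
typedef int64_t sketch_int;
#define SKETCH_FIXNUM_MAX ((INT64_C (1) << 60) - 1)
#define SKETCH_FIXNUM_MIN (-SKETCH_FIXNUM_MAX - 1)

/* Add two values known to be in fixnum range.  No overflow check is
   needed: the sum's magnitude is at most 2**61, well within the
   range of sketch_int.  */
sketch_int
sketch_add_fixnums (sketch_int a, sketch_int b)
{
  assert (SKETCH_FIXNUM_MIN <= a && a <= SKETCH_FIXNUM_MAX);
  assert (SKETCH_FIXNUM_MIN <= b && b <= SKETCH_FIXNUM_MAX);
  return a + b;
}

/* Add two arbitrary values.  Store the sum in *RESULT and return true,
   or return false without storing if the sum would overflow.  The
   test is done before the addition, since signed overflow is
   undefined behavior in C.  */
bool
sketch_add_checked (sketch_int a, sketch_int b, sketch_int *result)
{
  if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
    return false;
  *result = a + b;
  return true;
}
```

The first function corresponds to the situation today; the second is
the kind of run-time check that would be needed if operands could
reach the extremes of the type.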

There are many more examples like this, not all of them as
obvious as the above.  Here's one, from Frem:

  XSETINT (val, XINT (x) % XINT (y));

If XINT (x) could equal TYPE_MINIMUM (EMACS_INT), then this
would dump core on an x86 when XINT (y) == -1, because
INT_MIN % -1 dumps core on the x86 (the C standard allows this,
alas).  However, since XINT (x) cannot possibly equal
TYPE_MINIMUM (EMACS_INT), Emacs is currently safe from
this problem, and we don't need to insert a run-time check
here.
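
For comparison, here is a minimal sketch of the run-time check that
would otherwise be required.  The function name sketch_rem is
hypothetical, and plain int is used instead of EMACS_INT to keep the
example self-contained.

```c
#include <assert.h>
#include <limits.h>

/* Return A % B for nonzero B without ever evaluating INT_MIN % -1.
   Every integer is a multiple of -1, so the remainder is 0 whenever
   B is -1.  */
int
sketch_rem (int a, int b)
{
  assert (b != 0);
  if (b == -1)
    return 0;
  return a % b;
}
```

With the guard in place, sketch_rem (INT_MIN, -1) yields 0 instead of
trapping.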


* Re: size_t vs EMACS_INT
  2011-07-15 21:52         ` Paul Eggert
@ 2011-07-16  7:13           ` Eli Zaretskii
  2011-07-16 11:02             ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2011-07-16  7:13 UTC
  To: Paul Eggert; +Cc: emacs-devel

> Date: Fri, 15 Jul 2011 14:52:56 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
>     EMACS_INT new_point = PT + XINT (n);
> 
> This code is currently safe, since C code can always safely add
> two Emacs fixnums, and the addition can't possibly overflow at the C level.
> But if fixnums could equal TYPE_MAXIMUM (EMACS_INT),
> this code would be unsafe and we would have to add a run-time
> check for integer overflow.

But this issue exists with any addition of two integer values of the
same type in a C program.  And yet gobs of C programs do that without
testing for overflow before each addition.  Why should Emacs be
different?

Also, the fact that the underlying C data type cannot overflow doesn't
save us from disasters, because calling make_number on the result
could still "kind of" overflow, when it bit-shifts the value.


* Re: size_t vs EMACS_INT
  2011-07-16  7:13           ` Eli Zaretskii
@ 2011-07-16 11:02             ` Paul Eggert
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Eggert @ 2011-07-16 11:02 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 07/16/11 00:13, Eli Zaretskii wrote:
> gobs of C programs do that without testing for overflow before each
> addition.  Why should Emacs be different?

A short answer is that Emacs is supposed to be reliable.

A longer answer is that it depends on the context.  If a C program's
values are known to never overflow in practice, or if we know no
compiler will ever use anything but wraparound semantics and the
program works fine with wraparound, or if the program's behavior
doesn't matter all that much, then it's fine if the program does not
test for overflow.  But if the values might overflow, and if the
compiler might not use wraparound semantics (or the program does not
work with wraparound semantics), and if the program is supposed to be
reliable even when given large values, then the program needs to test
for overflow.

For some more details about this issue, please see
<http://www.gnu.org/s/gnulib/manual/html_node/Integer-Properties.html>
and <http://www.sei.cmu.edu/library/abstracts/reports/10tn008.cfm>.
Also, we discussed this in Bug#8545; see, for example,
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8545#105>.

> Also, the fact that the underlying C data type cannot overflow doesn't
> save us from disasters, because calling make_number on the result
> could still "kind of" overflow, when it bit-shifts the value.

Yes, and I plan to fix that porting problem at some point too.  It
won't be that hard, as shifting comes up less often than addition.
For now, though, I would rather focus on standard integer
arithmetic (+ - * /), as problems there are more likely to cause
real-world failures such as the core dump I mentioned in Bug#9079.
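
The shifting problem can be illustrated with a minimal sketch of a
range check done before tagging.  This is not the lisp.h
implementation: the 3-bit tag, the all-zero tag value, and every
sketch_ name are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagging scheme: the low 3 bits of a 64-bit word hold
   the tag, which is assumed to be zero for integers.  */
#define SKETCH_TAG_BITS 3
#define SKETCH_TAG_FACTOR (INT64_C (1) << SKETCH_TAG_BITS)
#define SKETCH_FIXNUM_MAX (INT64_MAX >> SKETCH_TAG_BITS)
#define SKETCH_FIXNUM_MIN (-SKETCH_FIXNUM_MAX - 1)

/* Tag N.  Store the tagged word in *TAGGED and return true, or return
   false if N does not fit once the tag bits are reserved.
   Multiplying instead of left-shifting keeps the operation well
   defined for negative N.  */
bool
sketch_make_number (int64_t n, int64_t *tagged)
{
  if (n < SKETCH_FIXNUM_MIN || SKETCH_FIXNUM_MAX < n)
    return false;
  *tagged = n * SKETCH_TAG_FACTOR;
  return true;
}

/* Recover the integer from a word produced by sketch_make_number.  */
int64_t
sketch_xint (int64_t tagged)
{
  assert (tagged % SKETCH_TAG_FACTOR == 0);
  return tagged / SKETCH_TAG_FACTOR;
}
```

Without the range test, a value just above SKETCH_FIXNUM_MAX would
silently lose its high bits when tagged, which is the "kind of"
overflow described above.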

