From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Making --with-wide-int the default
Date: Fri, 16 Oct 2015 16:03:09 +0200
Message-ID: <87zizisyg2.fsf@fencepost.gnu.org>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<CAArVCkTi9sNLzGPuZ5n49X9pkMWjnfM+oUzQoj+Ko=KVW9kUtA@mail.gmail.com>
	<5610ED13.1010406@dancol.org>
	<CAArVCkS2F4Vc2tx6hDPAjF6oYr3PTZFqpjx7gUORHBvyD-kMxQ@mail.gmail.com>
	<56117F37.9060808@dancol.org>
	<CAArVCkQtXhdEF2nD3RF_q59G8PsLOG2T-qd+fdNAgMiYzawkkA@mail.gmail.com>
	<CA+5B0FMwh8t_BfQ+wrqjuw3LytB52-J=UBVtg-4n57BiebsPFA@mail.gmail.com>
	<CAArVCkRxNx0zNr-iRho-MZj0fMh7cegwwLgkxXbPU_ffmhHPXw@mail.gmail.com>
	<CA+5B0FN0v3kjWvkxEes2RivUGDyLbitZDOup7tZpErBp0Uc0vA@mail.gmail.com>
	<CAArVCkSLFTJcPeTH7S6sNLQL3y7p-wNEJ__9ZhCsRt73p2VNrw@mail.gmail.com>
	<CA+5B0FPBCo0BKc1VgRMxVyU+3Pw6L4cOJP8Dzf8bRC9v0h_R-A@mail.gmail.com>
	<83oag087gs.fsf@gnu.org>
	<CAAeL0SQYsdWY1=FBZXqHgASin7eH4g8CriSnk1Xh48KDLxdi-w@mail.gmail.com>
	<83oafz70im.fsf@gnu.org> <5620AF43.4050401@cs.ucla.edu>
	<8737xbusz1.fsf@fencepost.gnu.org> <83d1wf6v47.fsf@gnu.org>
	<87pp0ftbmg.fsf@fencepost.gnu.org> <831tcv6s6f.fsf@gnu.org>
	<87d1wft8g7.fsf@fencepost.gnu.org> <83vba754rq.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1445006029 14263 80.91.229.3 (16 Oct 2015 14:33:49 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 16 Oct 2015 14:33:49 +0000 (UTC)
Cc: lekktu@gmail.com, eggert@cs.ucla.edu, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Oct 16 16:33:48 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Zn64b-0008FV-95
	for ged-emacs-devel@m.gmane.org; Fri, 16 Oct 2015 16:33:33 +0200
Original-Received: from localhost ([::1]:54054 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Zn64a-0004UK-C0
	for ged-emacs-devel@m.gmane.org; Fri, 16 Oct 2015 10:33:32 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46044)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <dak@gnu.org>)
	id 1Zn64V-0004To-Eo
	for emacs-devel@gnu.org; Fri, 16 Oct 2015 10:33:28 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dak@gnu.org>) id 1Zn64T-0001LB-JU
	for emacs-devel@gnu.org; Fri, 16 Oct 2015 10:33:27 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:56642)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <dak@gnu.org>)
	id 1Zn64L-0001H8-F6; Fri, 16 Oct 2015 10:33:17 -0400
Original-Received: from localhost ([127.0.0.1]:42223 helo=lola)
	by fencepost.gnu.org with esmtp (Exim 4.82)
	(envelope-from <dak@gnu.org>)
	id 1Zn64J-00025S-2c; Fri, 16 Oct 2015 10:33:15 -0400
Original-Received: by lola (Postfix, from userid 1000)
	id D68EBEBEAB; Fri, 16 Oct 2015 16:03:09 +0200 (CEST)
In-Reply-To: <83vba754rq.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 16 Oct
	2015 16:20:25 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:191758
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/191758>

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: eggert@cs.ucla.edu,  lekktu@gmail.com,  emacs-devel@gnu.org
>> Date: Fri, 16 Oct 2015 12:27:04 +0200
>>
>> >> > How can GMP help extend the maximum size of buffers and strings beyond
>> >> > what a 32-bit EMACS_INT allows?
>> >>
>> >> By choosing an appropriate data type for representing buffer/string
>> >> sizes in C and converting back-and-forth from the Lisp type as needed.
>> >> Pretty much the same way we do it now.
>> >
>> > Sorry, I don't follow: what "appropriate data type"?  Would that be
>> > 'long long'?  If so, that's exactly what we do now, which you say is
>> > "going to 64-bit unilaterally".  What am I missing?
>>
>> A change in the size of the Lisp data type.
>
> We are talking about the _implementation_ of Lisp data types,
> i.e. about the underlying C data types.  What am I missing on that
> level?

You are assuming that the C data type used for working with
buffer/string position is the same as the type of a Lisp cell.

>> >> I think it would be a reasonable restriction to keep to 2GB size of
>> >> strings and buffers when working with a 32bit executable.  That's
>> >> what people expect on a 32-bit architecture.
>> >
>> > That's what we have now in 32-bit builds --with-wide-int.  So I'm not
>> > sure why you mention that as some kind of change related to this
>> > discussion.
>>
>> Isn't that also changing the size of a Lisp cell?  And of integer
>> arithmetic?
>
> Part of it, yes.  But since a Lisp cell can hold buffer or string
> position, what else can you expect?

That buffer or string positions not representable in the 29 bits or so
available for integers in a Lisp cell are instead represented using a
GMP number for which there is a reference in the Lisp cell.
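
To make that concrete, here is a minimal sketch of such a cell.  All
names (cell_t, make_position, position_value, the tag values) are
hypothetical and not taken from the Emacs sources, and a heap-allocated
int64_t merely stands in for a real GMP number so that the example
compiles without linking against GMP.  Positions are assumed to be
non-negative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t cell_t;   /* hypothetical Lisp cell: one machine word */

enum { TAG_BITS = 3, TAG_FIXNUM = 0, TAG_BOXED = 1 };
#define TAG_MASK   ((cell_t) ((1u << TAG_BITS) - 1))
#define FIXNUM_MAX (((int64_t) 1 << 28) - 1)   /* largest 29-bit signed value */

/* Store a position directly in the cell when it is small enough,
   otherwise store a tagged reference to a boxed number. */
static cell_t make_position(int64_t pos)
{
    assert(pos >= 0);
    if (pos <= FIXNUM_MAX)
        return ((cell_t) pos << TAG_BITS) | TAG_FIXNUM;

    /* Stand-in for allocating a GMP number; a real implementation
       would leave freeing it to the garbage collector. */
    int64_t *box = malloc(sizeof *box);
    if (!box)
        abort();
    *box = pos;
    assert(((cell_t) box & TAG_MASK) == 0);   /* low bits free for the tag */
    return (cell_t) box | TAG_BOXED;
}

static int is_fixnum(cell_t c)
{
    return (c & TAG_MASK) == TAG_FIXNUM;
}

/* Recover the position regardless of how the cell represents it. */
static int64_t position_value(cell_t c)
{
    if (is_fixnum(c))
        return (int64_t) (c >> TAG_BITS);
    return *(int64_t *) (c & ~TAG_MASK);
}
```

Code that only calls position_value() never needs to know which of the
two representations a given cell uses.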

>> >> When you are editing gigabyte files, at some point of time, the
>> >> Lisp representation of the respective offsets in the high part of
>> >> the buffer will become the responsibility of GMP, yes.  I'm not
>> >> worried about that.
>> >
>> > I don't understand.  C doesn't have dynamic types.
>>
>> But Lisp does.
>
> I was talking about C implementation of Lisp types.

You are apparently unable to deal with the concept of a Lisp type being
represented by fewer bits than the integral type used in C for things
like buffer/string positions outside of the 29-bit integer range.

How do you think Emacs managed 64-bit doubles "inside" of a 32-bit
integral type used for representing Lisp cells?

>> > If the variable that holds buffer positions needs to support 61-bit
>> > offsets, it will have to be a 64-bit integral data type from the
>> > get-go.
>>
>> I repeat: we are talking about a 32-bit binary where restricting buffer
>> and string size to 32bit offsets would be reasonable and expected.
>
> You cannot have full 32-bit offsets without enlarging the width of a
> Lisp integer.

You most certainly can by transparent degradation to GMP numbers.
That's exactly how GUILE does it.

>> Choose a C type of your choosing for dealing with buffer offsets,
>> create aliases for its conversions, and you are good to go.
>
> We _have_ chosen: it's a 64-bit 'long long'.  What do you suggest to
> choose instead?  Any suggested type should be representable as an
> Emacs integer.

An "Emacs integer" is a Lisp data type.  You are confusing this with the
C type we use for Lisp cells, and with the C types we might use for
buffer and/or string offsets.

I can't believe we are having this conversation.  Emacs integers so far
are 29 bits, Emacs floats are 64 bits, Emacs cell types are 32 bits and
I don't understand how one can imagine that they are somehow all the
same.

>> > And having 61-bit integers for integer arithmetics is also a
>> > valuable feature.
>>
>> 61-bit is some arbitrary junk number.  Transparently degrading from
>> 29-bits to gmp numbers means that there are no arbitrary limits (or
>> at least you are unlikely to hit them) and the performance for the
>> vast majority of cases will be 29-bit performance.
>
> I don't understand how you intend to "transparently degrade" in the
> implementation of '+', for example.

If the numbers have different signs and are single-cell integers, the
result fits.

Otherwise you add them and if the result (after conversion to 29 bits)
has a different sign than the starting numbers, you convert the result
instead to a GMP number.
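
A rough sketch of that check in C, under stated assumptions: fixnums
are modelled as int32_t values restricted to a 29-bit signed range, the
names (fixnum_add, wrap_to_fixnum) are made up for illustration, and
the actual promotion to a GMP number is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { FIXNUM_BITS = 29 };   /* assumed payload width of a small integer */
#define FIXNUM_MAX ((int32_t) ((1 << (FIXNUM_BITS - 1)) - 1))
#define FIXNUM_MIN ((int32_t) (-(1 << (FIXNUM_BITS - 1))))

/* Keep only the low FIXNUM_BITS bits and sign-extend them, which is
   what storing the value back into a cell would do. */
static int32_t wrap_to_fixnum(int32_t v)
{
    uint32_t low = (uint32_t) v & ((1u << FIXNUM_BITS) - 1);
    if (low & (1u << (FIXNUM_BITS - 1)))
        return (int32_t) low - (int32_t) (1u << FIXNUM_BITS);
    return (int32_t) low;
}

/* Add two fixnums.  Returns true and stores the sum in *out when it
   fits; returns false when the caller has to redo the addition with
   big numbers. */
static bool fixnum_add(int32_t a, int32_t b, int32_t *out)
{
    assert(a >= FIXNUM_MIN && a <= FIXNUM_MAX);
    assert(b >= FIXNUM_MIN && b <= FIXNUM_MAX);

    int32_t sum = a + b;   /* safe in int32_t: both operands are 29-bit */

    if ((a < 0) != (b < 0)) {   /* different signs: result always fits */
        *out = sum;
        return true;
    }

    int32_t wrapped = wrap_to_fixnum(sum);
    if ((wrapped < 0) != (a < 0))   /* sign flipped: overflow */
        return false;

    *out = wrapped;
    return true;
}
```

Only the result is tested; the arguments are known to be in range
because they arrived as single-cell integers in the first place.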

> Are you going to test each argument for whether it fits into 32-bit
> limits,

29-bit limits.  I don't need to test the arguments since they will be
single-cell integers only _when_ they fit.  I only need to test the
result.

> and if so, invoke 32-bit arithmetics, else 64-bit arithmetics?  If so,
> I'm quite sure the test will steal cycles that will make this slower
> than just always using 64-bit arithmetics.

Disregarding the memory requirements and bandwidth for always using
64-bit.  Why don't we use 256 bit for everything?  It makes even more
things work at the cost of eating even more memory and speed.

One has to choose a sensible cutoff point for the default number
representation, and whether one will admit more than that.  A sensible
cutoff point is what fits into a Lisp cell of natural processor size.
GMP allows admitting lots more in the rare cases where it is needed.

This is what GUILE does and most Lisp implementations with "arbitrary
precision" arithmetic do.

> If you mean something else, can you show some C code that would
> demonstrate your ideas?

Any application using GUILE and its C API.  Read the GUILE manual for
all the details.  This is not hypothetical, and as I already said, there
is as far as I know a compile time option for SXEmacs at least and
possibly also standard XEmacs to get this behavior for integers.

This is not a pipe dream as you appear to insinuate.

>> > So the EMACS_INT type will have to be able to support that.
>>
>> I don't see why when one can use GMP (which effectively uses the same
>> kind of arithmetic for 61-bit numbers as C does but does not stop there).
>
> Because calling a GMP function will most probably yield slower code
> than if we let the compiler emit a few instructions required for
> 64-bit arithmetics.

GMP needs to be called only when leaving the range of "small integers"
(which is all we even have right now).  64-bit arithmetic in your plan
would be required for every single operation.  Yes, when GMP kicks in,
it will be slower than operations using exactly 64-bit (not more, not
less).  But it's the exception rather than the rule.  So much the
exception that we could entirely make do without it so far.  It _will_
occur frequently when editing files larger than 1GB or so.  But only in
the _Lisp_ representations of those buffer positions.  Everything
implemented in C will use the integral data type we choose for that,
throwing an error when it gets exceeded.  And for a standard 32-bit
compilation, 32-bit integers seem like a reasonable cutoff point: you'll
not be able to load much more than 2G characters into a 32-bit address
space anyway.  Yes, not all of the possible offsets will be
representable in one Lisp cell.
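
As a sketch of the C side, assuming a 32-bit offset type: the names
bufpos_t and lisp_int_to_bufpos are hypothetical, and an int64_t
argument stands in for "a Lisp integer of whatever representation"
(small integer or GMP number).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t bufpos_t;   /* hypothetical C type for buffer/string offsets */
#define BUFPOS_MAX INT32_MAX

/* Convert a Lisp-level integer to the C offset type.  Returns false
   when the value cannot be a valid offset, in which case the caller
   would signal a Lisp error instead of truncating silently. */
static bool lisp_int_to_bufpos(int64_t n, bufpos_t *out)
{
    if (n < 0 || n > BUFPOS_MAX)
        return false;
    *out = (bufpos_t) n;
    return true;
}
```

The range check lives in this one conversion, so the rest of the C code
can work with plain bufpos_t arithmetic.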

-- 
David Kastrup