From: David Kastrup
To: Eli Zaretskii
Cc: lekktu@gmail.com, eggert@cs.ucla.edu, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Making --with-wide-int the default
Date: Fri, 16 Oct 2015 16:03:09 +0200
Message-ID: <87zizisyg2.fsf@fencepost.gnu.org>
In-Reply-To: <83vba754rq.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 16 Oct 2015 16:20:25 +0300")

Eli Zaretskii writes:

>> From: David Kastrup
>> Cc: eggert@cs.ucla.edu, lekktu@gmail.com, emacs-devel@gnu.org
>> Date: Fri, 16 Oct 2015 12:27:04 +0200
>>
>> >> > How can GMP help extend the maximum size of buffers and strings beyond
>> >> > what a 32-bit EMACS_INT allows?
>> >>
>> >> By choosing an appropriate data type for representing buffer/string
>> >> sizes in C and converting back-and-forth from the Lisp type as needed.
>> >> Pretty much the same way we do it now.
>> >
>> > Sorry, I don't follow: what "appropriate data type"?  Would that be
>> > 'long long'?  If so, that's exactly what we do now, which you say is
>> > "going to 64-bit unilaterally".  What am I missing?
>>
>> A change in the size of the Lisp data type.
>
> We are talking about the _implementation_ of Lisp data types,
> i.e. about the underlying C data types.  What am I missing on that
> level?

You are assuming that the C data type used for working with
buffer/string position is the same as the type of a Lisp cell.

>> >> I think it would be a reasonable restriction to keep to 2GB size of
>> >> strings and buffers when working with a 32-bit executable.  That's
>> >> what people expect on a 32-bit architecture.
>> >
>> > That's what we have now in 32-bit builds --with-wide-int.  So I'm not
>> > sure why you mention that as some kind of change related to this
>> > discussion.
>>
>> Isn't that also changing the size of a Lisp cell?  And of integer
>> arithmetic?
>
> Part of it, yes.  But since a Lisp cell can hold buffer or string
> position, what else can you expect?

That buffer or string positions not representable in the 29 bits or so
available for integers in a Lisp cell are instead represented using a
GMP number for which there is a reference in the Lisp cell.

>> >> When you are editing gigabyte files, at some point of time, the
>> >> Lisp representation of the respective offsets in the high part of
>> >> the buffer will become the responsibility of GMP, yes.  I'm not
>> >> worried about that.
>> >
>> > I don't understand.  C doesn't have dynamic types.
>>
>> But Lisp does.
>
> I was talking about C implementation of Lisp types.

You are apparently unable to deal with the concept of a Lisp type being
represented by fewer bits than the integral type used in C for things
like buffer/string positions outside of the 29-bit integer range.  How
do you think Emacs managed 64-bit doubles "inside" of a 32-bit integral
type used for representing Lisp cells?

>> > If the variable that holds buffer positions needs to support 61-bit
>> > offsets, it will have to be a 64-bit integral data type from the
>> > get-go.
>>
>> I repeat: we are talking about a 32-bit binary where restricting buffer
>> and string size to 32-bit offsets would be reasonable and expected.
>
> You cannot have full 32-bit offsets without enlarging the width of a
> Lisp integer.

You most certainly can by transparent degradation to GMP numbers.
That's exactly how GUILE does it.

>> Choose a C type of your choosing for dealing with buffer offsets,
>> create aliases for its conversions, and you are good to go.
>
> We _have_ chosen: it's a 64-bit 'long long'.  What do you suggest to
> choose instead?  Any suggested type should be representable as an
> Emacs integer.

An "Emacs integer" is a Lisp data type.  You are confusing this with the
C type we use for Lisp cells, and with the C types we might use for
buffer and/or string offsets.

I can't believe we are having this conversation.  Emacs integers so far
are 29 bits, Emacs floats are 64 bits, Emacs cell types are 32 bits and
I don't understand how one can imagine that they are somehow all the
same.

>> > And having 61-bit integers for integer arithmetics is also a
>> > valuable feature.
>>
>> 61-bit is some arbitrary junk number.  Transparently degrading from
>> 29 bits to GMP numbers means that there are no arbitrary limits (or
>> at least you are unlikely to hit them) and the performance for the
>> vast majority of cases will be 29-bit performance.
>
> I don't understand how you intend to "transparently degrade" in the
> implementation of '+', for example.

If the numbers have different signs and are single-cell integers, the
result fits.  Otherwise you add them and if the result (after conversion
to 29 bits) has a different sign than the starting numbers, you convert
the result instead to a GMP number.

> Are you going to test each argument for whether it fits into 32-bit
> limits,

29-bit limits.  I don't need to test the arguments since they will be
single-cell integers only _when_ they fit.  I only need to test the
result.

> and if so, invoke 32-bit arithmetics, else 64-bit arithmetics?  If so,
> I'm quite sure the test will steal cycles that will make this slower
> than just always using 64-bit arithmetics.

Disregarding the memory requirements and bandwidth for always using
64-bit.
Why don't we use 256 bits for everything?  It makes even more things work
at the cost of eating even more memory and speed.  One has to choose a
sensible cutoff point for the default number representation, and whether
one will admit more than that.  A sensible cutoff point is what fits
into a Lisp cell of natural processor size.  GMP allows admitting lots
more in the rare case where it is needed.  This is what GUILE does and
what most Lisp implementations with "arbitrary precision" arithmetic do.

> If you mean something else, can you show some C code that would
> demonstrate your ideas?

Any application using GUILE and its C API.  Read the GUILE manual for
all the details.  This is not hypothetical, and as I already said, there
is as far as I know a compile-time option for SXEmacs at least and
possibly also standard XEmacs to get this behavior for integers.

This is not a pipe dream as you appear to insinuate.

>> > So the EMACS_INT type will have to be able to support that.
>>
>> I don't see why when one can use GMP (which effectively uses the same
>> kind of arithmetic for 61-bit numbers as C does but does not stop there).
>
> Because calling a GMP function will most probably yield slower code
> than if we let the compiler emit a few instructions required for
> 64-bit arithmetics.

GMP needs to be called only when leaving the range of "small integers"
(which is all we even have right now).  64-bit arithmetic in your plan
would be required for every single operation.

Yes, when GMP kicks in, it will be slower than operations using exactly
64-bit (not more, not less).  But it's the exception rather than the
rule.  So much the exception that we could entirely make do without it
so far.

It _will_ occur frequently when editing files larger than 1GB or so.
But only in the _Lisp_ representations of those buffer positions.
Everything implemented in C will use the integral data type we choose
for that, throwing an error when it gets exceeded.
And for a standard 32-bit compilation, 32-bit integers seem like a
reasonable cutoff point: you'll not be able to load much more than 2G
characters into a 32-bit address space anyway.  Yes, not all of the
possible offsets will be representable in one Lisp cell.

-- 
David Kastrup
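
For readers who want the '+' rule from this message in concrete form,
here is a minimal C sketch.  It is an illustration under stated
assumptions, not code from Emacs, GUILE or XEmacs: the names lisp_int,
make_int and lisp_add are hypothetical, the 29-bit width is the figure
quoted in the message, and a plain int64_t stands in for what would be
a heap-allocated GMP number.  It also range-checks the result instead
of comparing signs as the message describes; the two checks serve the
same purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 29 value bits per small integer, as quoted in the thread. */
enum { FIXNUM_BITS = 29 };
#define FIXNUM_MAX ((int32_t) ((1 << (FIXNUM_BITS - 1)) - 1))
#define FIXNUM_MIN (-FIXNUM_MAX - 1)

/* Hypothetical Lisp integer: either a single-cell small integer or a
   "big" one.  The int64_t is only a stand-in for a GMP mpz_t. */
typedef struct
{
  bool is_big;
  int64_t value;
} lisp_int;

static bool
fits_fixnum (int64_t v)
{
  return FIXNUM_MIN <= v && v <= FIXNUM_MAX;
}

/* Normalize: use the small representation whenever the value fits. */
static lisp_int
make_int (int64_t v)
{
  lisp_int r = { !fits_fixnum (v), v };
  return r;
}

static lisp_int
lisp_add (lisp_int a, lisp_int b)
{
  if (!a.is_big && !b.is_big)
    {
      /* Small integers are in range by construction, so the arguments
         need no test on the fast path (the assert documents this). */
      assert (fits_fixnum (a.value) && fits_fixnum (b.value));
      /* Two 29-bit values always sum to something within 32 bits, so
         this addition cannot overflow. */
      int32_t sum = (int32_t) a.value + (int32_t) b.value;
      /* Only the result is range-checked; it is promoted if needed. */
      return make_int (sum);
    }
  /* Slow path: a real implementation would call mpz_add here. */
  return make_int (a.value + b.value);
}
```

With these definitions, lisp_add (make_int (1), make_int (2)) stays a
small integer, while lisp_add (make_int (FIXNUM_MAX), make_int (1)) is
promoted to the big representation.  Adding -1 to that result brings it
back into the small range.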