From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Mark H Weaver Newsgroups: gmane.lisp.guile.devel Subject: Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3 Date: Tue, 11 Jun 2019 20:03:26 -0400 Message-ID: <87a7en7jpy.fsf@netris.org> References: <87zhmvaw5p.fsf@netris.org> <87h89233pp.fsf@web.de> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="24879"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux) Cc: guile-devel@gnu.org To: Arne Babenhauserheide Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Wed Jun 12 02:06:03 2019 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1haqmA-0006JQ-Sa for guile-devel@m.gmane.org; Wed, 12 Jun 2019 02:06:02 +0200 Original-Received: from localhost ([::1]:55918 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1haqm9-000466-P5 for guile-devel@m.gmane.org; Tue, 11 Jun 2019 20:06:01 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:40176) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1haqls-00045o-Lr for guile-devel@gnu.org; Tue, 11 Jun 2019 20:05:45 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1haqlr-0005HX-Et for guile-devel@gnu.org; Tue, 11 Jun 2019 20:05:44 -0400 Original-Received: from world.peace.net ([64.112.178.59]:43056) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1haqlr-0005HF-Bh for guile-devel@gnu.org; Tue, 11 Jun 2019 20:05:43 -0400 Original-Received: from mhw by world.peace.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1haqlp-0001xT-Vo; Tue, 11 Jun 2019 20:05:42 -0400 In-Reply-To: <87h89233pp.fsf@web.de> (Arne Babenhauserheide's message of "Thu, 06 Jun 2019 21:37:22 +0200") X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 64.112.178.59 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.org gmane.lisp.guile.devel:19964 Archived-At: Hi Arne, Arne Babenhauserheide writes: > I don=E2=80=99t understand the implications of this, but I realized that = I would > love to have info about the sizes of different datastructures in Guile. Sure. > I recently started measuring how much memory is required for a record > compared to a list, so I could give people concrete guidelines which > datastructures to use for which algorithm, and for that such low-level > details would be great to have! (or just to know where to read them up = =E2=80=94 > or which keywords to look for) For core data structures implemented in C, the place to look is in the corresponding *.h and *.c files in libguile, for example struct.[ch], strings.[ch], numbers.[ch], etc. However, pairs receive special handling, and only part of their implementation is in pairs.[ch]. I will summarize the costs of some important structures below: In the following, a "word" is the size of a pointer (4 bytes on a 32-bit system, or 8 bytes on a 64-bit system). I will _not_ include the SCM word itself in the accounting of the size of the object represented by that SCM. That's because for heap-allocated objects, the SCM is simply a pointer to the object, and there can be many pointers to the same object. The cost of SCM words will instead be included in the cost of the aggregate objects that contain them. In this method of accounting, immediate objects cost nothing, because they are entirely stored within the SCM itself. Pairs cost 2 words. Lists cost 2 words per element. The empty list is immediate, and therefore costs nothing. In current 'master', records cost 1 word per field, plus another word, rounded up to the nearest multiple of 2 words. In 2.0 and 2.2, they cost 1 word per field, plus another _2_ words, rounded up Vectors cost 1 word per element, plus another word, rounded up to the nearest multiple of 2 words. Bytevectors cost 4 words plus the data itself, rounded up to the nearest multiple of 2 words. As a special case, empty bytevectors cost nothing. Non-immediate inexact real numbers cost 16 bytes. Non-immediate complex numbers cost 24 bytes on 32-bit systems, or 32 bytes on 64-bit systems. In the common case, strings currently cost 6 words plus the character data itself, rounded up to the nearest multiple of 2 words. The character data costs either 1 byte per character (if all code points are less than 256, i.e. the string is latin-1), or 4 bytes per character. (However, I may propose a new string representation for 3.0). As a special case, the empty string costs nothing. Finally, I should mention that the ability to share substructure in lists and pairs make them potentially more memory efficient than vectors, depending on the use case. For example, if you need a structure with 4 fields, and a common operation in your application will involve copying the structure except for one field which is changed, then you might be able save memory by using lists, even though a 4-element list requires more memory than a 4-element vector or struct. If you put the commonly-changed field in the CAR, then you can update that field by allocating only a single pair (2 words) and sharing the rest of the structure with the old version. This is not possible with vectors or structs, where you must allocate a fresh new object of 5 or 6 words to perform the same operation. Regards, Mark