From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Rob Browning Newsgroups: gmane.lisp.guile.devel Subject: gmp issues (long) Date: Mon, 24 Feb 2003 16:29:04 -0600 Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Message-ID: <87d6lhcnyn.fsf@raven.i.defaultvalue.org> NNTP-Posting-Host: main.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: main.gmane.org 1046126116 5030 80.91.224.249 (24 Feb 2003 22:35:16 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Mon, 24 Feb 2003 22:35:16 +0000 (UTC) Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 18nRBp-0001Io-00 for ; Mon, 24 Feb 2003 23:35:13 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18nRBF-0002Yp-00 for guile-devel@m.gmane.org; Mon, 24 Feb 2003 17:34:37 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13) id 18nRAt-0002VQ-00 for guile-devel@gnu.org; Mon, 24 Feb 2003 17:34:15 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13) id 18nRAC-00028P-00 for guile-devel@gnu.org; Mon, 24 Feb 2003 17:34:03 -0500 Original-Received: from dsl093-098-016.wdc1.dsl.speakeasy.net ([66.93.98.16] helo=defaultvalue.org) by monty-python.gnu.org with esmtp (Exim 4.10.13) id 18nR5u-0000ji-00 for guile-devel@gnu.org; Mon, 24 Feb 2003 17:29:06 -0500 Original-Received: from raven.i.defaultvalue.org (raven.i.defaultvalue.org [192.168.1.7]) by defaultvalue.org (Postfix) with ESMTP id 1731913B87 for ; Mon, 24 Feb 2003 16:29:05 -0600 (CST) Original-Received: by raven.i.defaultvalue.org (Postfix, from userid 1000) id CD712CFF4C; Mon, 24 Feb 2003 16:29:04 -0600 (CST) Original-To: guile-devel@gnu.org User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu) X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1b5 Precedence: list List-Id: Developers list for Guile, the GNU extensibility library List-Help: List-Post: List-Subscribe: , List-Archive: List-Unsubscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: main.gmane.org gmane.lisp.guile.devel:1963 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:1963 I have a version of guile working now using gmp for bignums, and now that I can test it, there are some issues worth discussing. Now that I've fixed some bugs confusing the issue (though there may well be others) it appears that at least in some fairly trivial testing (and I emphasize *trivial*), gmp doesn't appear faster than our old implementation. Given this code: (define (bench-random-int-op op) (let loop ((x-vals random-x-values) (y-vals random-y-values)) (if (pair? x-vals) (begin (op (car x-vals) (car y-vals)) (loop (cdr x-vals) (cdr y-vals)))))) and a set of 4000000 pre-generated random pairs of bignums, via (random 1000000000000000000), which all turned out to be bignums, I get the following results: $ ./run-benchmark guile-1.6 ./guile-1.7 ./guile-gmp foo.scm starting trials ("bench-random-op +" "guile-1.6" ((user . 201) (sys . 0) (gc . 90))) ("bench-random-op -" "guile-1.6" ((user . 179) (sys . 1) (gc . 57))) ("bench-random-op *" "guile-1.6" ((user . 219) (sys . 0) (gc . 99))) ("bench-random-op /" "guile-1.6" ((user . 262) (sys . 0) (gc . 114))) ("bench-random-op gcd" "guile-1.6" ((user . 121) (sys . 2) (gc . 44))) ("bench-random-op lcm" "guile-1.6" ((user . 275) (sys . 0) (gc . 116))) starting trials ("bench-random-op +" "./guile-1.7" ((user . 342) (sys . 1) (gc . 60))) ("bench-random-op -" "./guile-1.7" ((user . 315) (sys . 0) (gc . 59))) ("bench-random-op *" "./guile-1.7" ((user . 352) (sys . 0) (gc . 56))) ("bench-random-op /" "./guile-1.7" ((user . 380) (sys . 0) (gc . 55))) ("bench-random-op gcd" "./guile-1.7" ((user . 266) (sys . 1) (gc . 56))) ("bench-random-op lcm" "./guile-1.7" ((user . 381) (sys . 0) (gc . 55))) starting trials ("bench-random-op +" "./guile-gmp" ((user . 402) (sys . 1) (gc . 67))) ("bench-random-op -" "./guile-gmp" ((user . 386) (sys . 0) (gc . 63))) ("bench-random-op *" "./guile-gmp" ((user . 436) (sys . 0) (gc . 65))) ("bench-random-op /" "./guile-gmp" ((user . 422) (sys . 0) (gc . 65))) ("bench-random-op gcd" "./guile-gmp" ((user . 387) (sys . 0) (gc . 62))) ("bench-random-op lcm" "./guile-gmp" ((user . 463) (sys . 0) (gc . 64))) Bear in mind that there may well be mistakes in my code affecting this result. The second (perhaps related) issue involves memory use. It turns out that an mpz_t, which is the GMP integer type, takes up a minimum of 12 bytes (when an int is 4 bytes), and will always be more than that because that 12 bytes doesn't include any space for the actual digits. To make this clear, here is what an mpz_t looks like: typedef struct { int _mp_alloc; /* Number of *limbs* allocated and pointed to by the _mp_d field. */ int _mp_size; /* abs(_mp_size) is the number of limbs the last field points to. If _mp_size is negative this is a negative number. */ mp_limb_t *_mp_d; /* Pointer to the limbs. */ } __mpz_struct; Right now, in the new code a guile gmp bignum is a smob whose data pointer is an mpz_t*. Including the smob mpz_t pointer itself, this gives us a minimum size of 24-bytes for an 8-byte bignum on many 32-bit machines, and 40-bytes on many 64-bit machines :< It's also worth noting that GMP never shrinks its allocations for a given mpz_t, even if you change a giant integer to a much smaller integer via subtraction or similar. However, this may not really matter much to us since we essentially always create a new value for the result of an operation. Our old implementation, on the other hand, if I understand it correctly, would usually use a minimum of 4 bytes for a bignum, actually probably 6, and so would require 10 bytes for it's smallest bignum, including the smob pointer, and could grow in 2 byte increments (the increment size usually depended on sizeof (unsigned short)). So with the old implementation, an 8-byte bignum would probably require about 13-14 bytes, including the smob ptr. Perhaps more importantly, I believe the old implementation allowed bignums as small as 6-bytes (10 with ptr). With respect to storage use, this is a much nicer jump from a 4-byte fixnum than GMP's 20-byte likely minimum -- which would only get you a 4-byte integer. So the first question is "Is GMP's minimum usage unacceptable?". If so, then there are a variety of things we could consider (thanks to Dale for discussing this with me on irc at some length): 1) possibly scrap GMP altogether. 2) possibly try to use long-longs to provide an intermediate-sized rep where available, which could use "native" ops. This intermediate value would require 12-bytes including smob ptr and provide an 8-byte int (presuming long long == 64-bit). 3) possibly use mpz_t values only for computation, allocating them on the stack, and use the old "array of shorts" format in the heap. The big cost here would be conversions to/from mpz_t and the array of shorts, but we would be able to allocate in 2-byte increments, with the same minimum sizes we used to have. With respect to the costs, we'd have to know how much of the cost of (+ big-x big-y) is currently comprised of non-math operations i.e. how much would the conversion cost relative to the current interpretation overhead, etc. 4) possibly use GMP's lower-level ops, and manage the storage of the GMP mp_limb_t array ourselves. This should allow us to ditch mpz_t's _mp_alloc int field, and it would probably be safe to replace the _mp_size field with a short, since I doubt that we'd ever be likely to overflow long data[MAX_SHORT] digits. (What was the size limit, if any, on our old bignums?). The disadvantage here is that there would be much more code to write by hand -- we'd have to implement bignum addition/subtraction etc. ourselves from the GMP primitives, since these primitives only operate on individual _mp_limb_t values (i.e. digits). We'd have to handle carry, borrow etc., ourselves. The advantage is that these primitives are supposed to be very fast, since the common ops are likely to be coded in assembly on most platforms. This approach would get us a minimum bignum size, including the smob ptr, of 18-bytes for an 8-byte bignum (10-bytes of overhead), presuming 32-bit arch and 32-bit longs. The smallest bignum we could represent would be (still presuming 32-bit longs) 4-bytes, which including the smob ptr, would require 14-bytes of storage. 5) use more than one smob to represent bignums. While there are a variety of ways this could work, one would be to have something like: big-2: mp_limb_t data[2] (12-bytes incl smob ptr on 32-bit) big-4: mp_limb_t data[4] (16-bytes incl smob ptr on 32-bit) big-n: mp_limb_t data* (8-bytes + data, incl smob ptr on 32-bit) This would probably be the best use of space if using GMP, but it may well complicate the implementation of numbers.c too much and still has the same drawbacks as (4). 6) try a super-ugly-hack (super-ugly, presuming the GMP people consider it not-kosher to touch mpz_t internals) where we perform all our operations as mpz_t values, but then convert them to a representation like this: struct _our_mpz { mp_limb_t *data; short size; } in the heap, after just copying the _mp_d data pointer out of the result mpz_t, dropping the _mp_alloc value (after checking to be sure alloc == size -- if not, we'd have to mpz_realloc2), and then putting their _mp_size into our size (after checking to be sure it will fit). Then later, when we needed to perform computations with our representation, we'd copy this data back into a new mpz_t (mpz_t's are actually 1-element arrays of the _mpz_struct): mpz_t tmp; tmp[0]._mp_alloc = our_mpz->size; tmp[0]._mp_size = our_mpz->size; tmp[0]._mp_d = our_mpz->data; Though this mucks around in the gmp internals, it does have the distinct advantage of letting us continue to use the high-level mpz_ interface and still reduce the mpz_t storage overhead from 12-bytes minimum overhead (excluding the smob ptr) to 6 bytes on 32-bit machines, and it requires only *one* in-memory representation of the integer digits, with no conversions. If this was an option we had any interest in, it might be worth discussing with the GMP people. If they don't want people getting/setting mpz_t internals (even when you think you know what you're doing) then perhaps they would be willing to formalize something with similar effect, as part of the API, i.e. perhaps something like: void mpz_compact_and_get_data (mpz_t src, int *size, mpz_limb_t **data); or similar. This would first make sure that the limb data is only as large as the integer that the digits represent (i.e. it would realloc if needed, so that _mp_alloc becomes redundant b/c _mp_alloc == _mp_size) and then it would return the pointer to the limbs through *data, and the limb count through *size. This would then be sufficient for us to manage our conversion. All of this said, if there are others interested in this issue, I feel like whatever we do I should probably make the (still not quite finished -- i.e. random.c's bignum random is a dummy for now) gmp code available somehow, whether in a CVS branch, patch, or similar so that other people can test too. I'd rather not have our decision based solely on my evaluation since it's always possible I goofed somewhere, either in the implementation or the testing. Thanks -- Rob Browning rlb @defaultvalue.org, @linuxdevel.com, and @debian.org Previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel