From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Mark H Weaver
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#18295: Radix points in non-decimal numbers
Date: Wed, 01 Oct 2014 01:19:39 -0400
Message-ID: <87y4t08iz8.fsf@yeeloong.lan>
References: <87egwc6eaa.fsf@kagami.home>
In-Reply-To: <87egwc6eaa.fsf@kagami.home> (Ian Price's message of "Tue, 19 Aug 2014 10:04:45 +0100")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: 18295@debbugs.gnu.org
To: Ian Price
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language"
X-GNU-PR-Message: followup 18295
X-GNU-PR-Package: guile
Xref: news.gmane.org gmane.lisp.guile.bugs:7597

--=-=-=
Content-Type: text/plain

Hi Ian,

Ian Price writes:

> Occasionally it is handy to use a radix point in bases other than
> decimal. i.e. 1.1 in binary is 1.5 decimal.

I agree that this would be nice.
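To make the arithmetic concrete: in binary, 1.1 is 1 + 1/2 = 1.5. The following is only a minimal C sketch of positional parsing with a radix point, not the patch's code. The names digit_value and parse_radix_real are hypothetical. It assumes ASCII input, uses doubles throughout, and ignores signs, exponents and '#' digits, whereas the patch keeps the value exact until string->number returns.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: value of an ASCII digit or letter,
   or -1 if the character is not alphanumeric. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char) tolower((unsigned char) c);
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch: parse "digits[.digits]" in the given radix.
   Returns -1.0 on malformed input (valid results are never negative). */
static double parse_radix_real(const char *s, unsigned radix)
{
    double value = 0.0;   /* accumulated result */
    double scale = 1.0;   /* weight of the current fractional digit */
    int seen_digit = 0;

    assert(radix >= 2 && radix <= 36);

    /* Integer part: each digit multiplies the running value by radix. */
    for (; *s && *s != '.'; s++) {
        int d = digit_value(*s);
        if (d < 0 || (unsigned) d >= radix)
            return -1.0;
        value = value * radix + d;
        seen_digit = 1;
    }

    /* Fractional part: the k-th digit after the point weighs radix^-k. */
    if (*s == '.') {
        for (s++; *s; s++) {
            int d = digit_value(*s);
            if (d < 0 || (unsigned) d >= radix)
                return -1.0;
            scale /= radix;
            value += d * scale;
            seen_digit = 1;
        }
    }
    return seen_digit ? value : -1.0;
}
```

With these definitions, parse_radix_real("1.1", 2) evaluates to 1.5 and parse_radix_real("f.8", 16) to 15.5, while parse_radix_real("1.2", 2) is rejected because 2 is not a binary digit.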
Here's a preliminary patch to implement it, against stable-2.0.  It
works, but is not yet ready to push.  It needs tests, but more
importantly I'm undecided on how best to limit the exponents.

The numbers are represented exactly until just before returning from
string->number, so very large exponents could exhaust the available
memory.  Of course, a large integer can already exhaust the memory, but
that's less likely to happen by accident when exponents are not
involved.

Also, if they _are_ converted to inexact at the end (which happens
unless #e is given), large exponents will become infinite or zero, which
is not as nice as an error message, although I think we already fail to
do this in some cases.

One easy solution would be to prohibit exponents unless radix == 10.

Thoughts?

If you want to work more on this, we could be co-authors of the commit.
I should probably focus on other things for a while.

     Best,
      Mark

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-PRELIMINARY-string-number-Support-digits-after-point.patch
Content-Description: [PATCH] PRELIMINARY string->number: Support digits after
 point for non-decimals

>From 9ed731c917d4dd9d73258d96e401da5a06bef77a Mon Sep 17 00:00:00 2001
From: Mark H Weaver
Date: Mon, 29 Sep 2014 01:26:50 -0400
Subject: [PATCH] PRELIMINARY string->number: Support digits after point for
 non-decimals.

NOTE: This limits the radix to 36.  Previously, we would accept
bogosities such as:

  (string->number "{" 37) => 36
  (string->number "|" 38) => 37

TODO: Generalize code that places reasonable limits on exponents when
      using non-decimal radices.  (search for XXX in the patch)

TODO: Add tests.

TODO: Split into multiple commits (use scm_t_wchar, avoid the word
      "decimal" except where base 10 is assumed, doc fixes, limit to
      radix <= 36, support digits after point for non-decimals)
---
 doc/ref/api-data.texi         |  6 +--
 libguile/numbers.c            | 94 +++++++++++++++++++++----------------------
 test-suite/tests/numbers.test |  6 +--
 3 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index acdf9ca..e5880d2 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -1,7 +1,7 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006, 2007,
-@c   2008, 2009, 2010, 2011, 2012, 2013, 2014 Free Software Foundation, Inc.
+@c Copyright (C) 1996, 1997, 2000-2004, 2006-2014
+@c   Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
 @node Simple Data Types
@@ -1098,7 +1098,7 @@ inexact, a radix of 10 will be used.
 @deffnx {C Function} scm_string_to_number (string, radix)
 Return a number of the maximally precise representation
 expressed by the given @var{string}. @var{radix} must be an
-exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}
+exact integer between 2 and 36. If supplied, @var{radix}
 is a default radix that may be overridden by an explicit radix
 prefix in @var{string} (e.g.@: "#o177"). If @var{radix} is not
 supplied, then the default radix is 10. If string is not a
diff --git a/libguile/numbers.c b/libguile/numbers.c
index c197eee..4748b51 100644
--- a/libguile/numbers.c
+++ b/libguile/numbers.c
@@ -1,6 +1,4 @@
-/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- *   2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
- *   2014 Free Software Foundation, Inc.
+/* Copyright (C) 1995-2014 Free Software Foundation, Inc.
  *
  * Portions Copyright 1990, 1991, 1992, 1993 by AT&T Bell Laboratories
  * and Bellcore. See scm_divide.
@@ -5806,7 +5804,7 @@ enum t_exactness {NO_EXACTNESS, INEXACT, EXACT};
 /* Caller is responsible for checking that the return value is in range
    for the given radix, which should be <= 36. */
 static unsigned int
-char_decimal_value (scm_t_uint32 c)
+char_digit_value (scm_t_wchar c)
 {
   /* uc_decimal_value returns -1 on error. When cast to an unsigned int,
      that's certainly above any valid decimal, so we take advantage of
@@ -5818,8 +5816,8 @@ char_decimal_value (scm_t_uint32 c)
   if (d >= 10U)
     {
       c = uc_tolower (c);
-      if (c >= (scm_t_uint32) 'a')
-        d = c - (scm_t_uint32)'a' + 10U;
+      if (c >= (scm_t_wchar)'a')
+        d = c - (scm_t_wchar)'a' + 10U;
     }
   return d;
 }
@@ -5837,14 +5835,14 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
   scm_t_bits add = 0;
   unsigned int digit_value;
   SCM result;
-  char c;
+  scm_t_wchar c;
   size_t len = scm_i_string_length (mem);
 
   if (idx == len)
     return SCM_BOOL_F;
 
   c = scm_i_string_ref (mem, idx);
-  digit_value = char_decimal_value (c);
+  digit_value = char_digit_value (c);
   if (digit_value >= radix)
     return SCM_BOOL_F;
 
@@ -5862,9 +5860,9 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
             break;
           else
             {
-              digit_value = char_decimal_value (c);
-              /* This check catches non-decimals in addition to out-of-range
-                 decimals. */
+              digit_value = char_digit_value (c);
+              /* This check catches non-digits in addition to out-of-range
+                 digits. */
               if (digit_value >= radix)
                 break;
             }
@@ -5899,18 +5897,17 @@ mem2uinteger (SCM mem, unsigned int *p_idx,
 }
 
 
-/* R5RS, section 7.1.1, lexical structure of numbers: . Only
- * covers the parts of the rules that start at a potential point. The value
- * of the digits up to the point have been parsed by the caller and are given
- * in variable result. The content of *p_exactness indicates, whether a hash
- * has already been seen in the digits before the point.
+/* R5RS, section 7.1.1, lexical structure of numbers: . We
+ * generalize this to support radices other than 10. Only covers the
+ * parts of the rules that start at a potential point. The value of the
+ * digits up to the point have been parsed by the caller and are given
+ * in variable result. The content of *p_exactness indicates, whether a
+ * hash has already been seen in the digits before the point.
  */
 
-#define DIGIT2UINT(d) (uc_numeric_value(d).numerator)
-
 static SCM
-mem2decimal_from_point (SCM result, SCM mem,
-                        unsigned int *p_idx, enum t_exactness *p_exactness)
+mem2real_from_point (SCM result, SCM mem, unsigned int *p_idx,
+                     unsigned int radix, enum t_exactness *p_exactness)
 {
   unsigned int idx = *p_idx;
   enum t_exactness x = *p_exactness;
@@ -5930,13 +5927,12 @@ mem2decimal_from_point (SCM result, SCM mem,
       while (idx != len)
         {
           scm_t_wchar c = scm_i_string_ref (mem, idx);
-          if (uc_is_property_decimal_digit ((scm_t_uint32) c))
+          digit_value = char_digit_value (c);
+          if (digit_value < radix)
             {
-              if (x == INEXACT)
-                return SCM_BOOL_F;
-              else
-                digit_value = DIGIT2UINT (c);
-            }
+              if (x == INEXACT)
+                return SCM_BOOL_F;
+            }
           else if (c == '#')
             {
               x = INEXACT;
@@ -5946,20 +5942,20 @@ mem2decimal_from_point (SCM result, SCM mem,
             break;
 
           idx++;
-          if (SCM_MOST_POSITIVE_FIXNUM / 10 < shift)
+          if (SCM_MOST_POSITIVE_FIXNUM / radix < shift)
             {
               big_shift = scm_product (big_shift, SCM_I_MAKINUM (shift));
               result = scm_product (result, SCM_I_MAKINUM (shift));
               if (add > 0)
                 result = scm_sum (result, SCM_I_MAKINUM (add));
 
-              shift = 10;
+              shift = radix;
               add = digit_value;
             }
           else
             {
-              shift = shift * 10;
-              add = add * 10 + digit_value;
+              shift = shift * radix;
+              add = add * radix + digit_value;
             }
         };
 
@@ -5981,6 +5977,7 @@ mem2decimal_from_point (SCM result, SCM mem,
           int sign = 1;
           unsigned int start;
           scm_t_wchar c;
+          unsigned int digit_value;
           int exponent;
           SCM e;
 
@@ -6020,24 +6017,29 @@ mem2decimal_from_point (SCM result, SCM mem,
               else
                 sign = 1;
 
-              if (!uc_is_property_decimal_digit ((scm_t_uint32) c))
+              digit_value = char_digit_value (c);
+              if (digit_value >= radix)
                 return SCM_BOOL_F;
 
               idx++;
-              exponent = DIGIT2UINT (c);
+              exponent = digit_value;
               while (idx != len)
                 {
                   scm_t_wchar c = scm_i_string_ref (mem, idx);
-                  if (uc_is_property_decimal_digit ((scm_t_uint32) c))
+                  digit_value = char_digit_value (c);
+                  if (digit_value < radix)
                     {
                       idx++;
+                      /* XXX FIXME: This logic is not sufficient for
+                         non-decimal numbers */
                       if (exponent <= SCM_MAXEXP)
-                        exponent = exponent * 10 + DIGIT2UINT (c);
+                        exponent = exponent * radix + digit_value;
                     }
                   else
                     break;
                 }
 
+              /* XXX FIXME: This logic is not sufficient for non-decimal numbers */
               if (exponent > ((sign == 1) ? SCM_MAXEXP : SCM_MAXEXP + DBL_DIG + 1))
                 {
                   size_t exp_len = idx - start;
@@ -6046,7 +6048,7 @@ mem2decimal_from_point (SCM result, SCM mem,
                   scm_out_of_range ("string->number", exp_num);
                 }
 
-              e = scm_integer_expt (SCM_I_MAKINUM (10), SCM_I_MAKINUM (exponent));
+              e = scm_integer_expt (SCM_I_MAKINUM (radix), SCM_I_MAKINUM (exponent));
               if (sign == 1)
                 result = scm_product (result, e);
               else
@@ -6138,15 +6140,14 @@ mem2ureal (SCM mem, unsigned int *p_idx,
 
   if (scm_i_string_ref (mem, idx) == '.')
     {
-      if (radix != 10)
-        return SCM_BOOL_F;
-      else if (idx + 1 == len)
+      if (idx + 1 == len)
         return SCM_BOOL_F;
-      else if (!uc_is_property_decimal_digit ((scm_t_uint32) scm_i_string_ref (mem, idx+1)))
+      else if (char_digit_value (scm_i_string_ref (mem, idx+1))
+               >= radix)
         return SCM_BOOL_F;
       else
-        result = mem2decimal_from_point (SCM_INUM0, mem,
-                                         p_idx, &implicit_x);
+        result = mem2real_from_point (SCM_INUM0, mem, p_idx,
+                                      radix, &implicit_x);
     }
   else
     {
@@ -6173,14 +6174,13 @@ mem2ureal (SCM mem, unsigned int *p_idx,
           /* both are int/big here, I assume */
           result = scm_i_make_ratio (uinteger, divisor);
         }
-      else if (radix == 10)
+      else
         {
-          result = mem2decimal_from_point (uinteger, mem, &idx, &implicit_x);
+          result = mem2real_from_point (uinteger, mem, &idx,
+                                        radix, &implicit_x);
           if (scm_is_false (result))
             return SCM_BOOL_F;
         }
-      else
-        result = uinteger;
 
       *p_idx = idx;
     }
@@ -6436,7 +6436,7 @@ SCM_DEFINE (scm_string_to_number, "string->number", 1, 1, 0,
             (SCM string, SCM radix),
             "Return a number of the maximally precise representation\n"
             "expressed by the given @var{string}. @var{radix} must be an\n"
-            "exact integer, either 2, 8, 10, or 16. If supplied, @var{radix}\n"
+            "exact integer between 2 and 36. If supplied, @var{radix}\n"
             "is a default radix that may be overridden by an explicit radix\n"
             "prefix in @var{string} (e.g. \"#o177\"). If @var{radix} is not\n"
             "supplied, then the default radix is 10. If string is not a\n"
@@ -6451,7 +6451,7 @@ SCM_DEFINE (scm_string_to_number, "string->number", 1, 1, 0,
   if (SCM_UNBNDP (radix))
     base = 10;
   else
-    base = scm_to_unsigned_integer (radix, 2, INT_MAX);
+    base = scm_to_unsigned_integer (radix, 2, 36);
 
   answer = scm_i_string_to_number (string, base);
   scm_remember_upto_here_1 (string);
diff --git a/test-suite/tests/numbers.test b/test-suite/tests/numbers.test
index 847f939..0acc3db 100644
--- a/test-suite/tests/numbers.test
+++ b/test-suite/tests/numbers.test
@@ -1,6 +1,6 @@
 ;;;; numbers.test --- tests guile's numbers -*- scheme -*-
-;;;; Copyright (C) 2000, 2001, 2003, 2004, 2005, 2006, 2009, 2010, 2011,
-;;;;   2012, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2000, 2001, 2003-2006, 2009-2014
+;;;;   Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -1575,7 +1575,7 @@
     (for-each (lambda (x) (if (string->number x) (throw 'fail)))
               '("" "q" "1q" "6+7iq" "8+9q" "10+11" "13+" "18@19q" "20@q" "23@"
                 "+25iq" "26i" "-q" "-iq" "i" "5#.0" "8/" "10#11" ".#" "."
-                "#o.2" "3.4q" "15.16e17q" "18.19e+q" ".q" ".17#18" "10q" "#b2"
+                "3.4q" "15.16e17q" "18.19e+q" ".q" ".17#18" "10q" "#b2"
                 "#b3" "#b4" "#b5" "#b6" "#b7" "#b8" "#b9" "#ba" "#bb" "#bc"
                 "#bd" "#be" "#bf" "#q" "#b#b1" "#o#o1" "#d#d1" "#x#x1" "#e#e1"
                 "#i#i1" "12@12+0i" "3/0" "0/0" "4+3/0i" "4/0-3i" "2+0/0i"
-- 
1.8.4

--=-=-=--
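The radix cap of 36 and the old results for "{" and "|" can be illustrated with a small stand-alone C sketch. It is modeled loosely on the shape of char_digit_value in the patch but is not that function: it handles ASCII only, does not use libunistring, and the names ascii_digit_value and is_digit_in_radix are hypothetical. The point it shows is that the letter branch has no upper bound, so characters just past 'z' receive values 36, 37, and so on, and only the caller's radix check rejects them.

```c
#include <assert.h>

/* Hypothetical ASCII-only digit mapping.  Returns 0..9 for '0'..'9',
   10..35 for letters (either case), and a value >= 36 for anything else. */
static unsigned int ascii_digit_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'Z')
        c = c - 'A' + 'a';          /* fold to lower case */
    if (c >= 'a')
        return c - 'a' + 10U;       /* no upper bound: '{' maps to 36 */
    return ~0U;                     /* certainly above any valid radix */
}

/* Hypothetical caller-side check, mirroring "digit_value < radix". */
static int is_digit_in_radix(unsigned char c, unsigned int radix)
{
    assert(radix >= 2);
    return ascii_digit_value(c) < radix;
}
```

Here ascii_digit_value('{') is 36, so is_digit_in_radix('{', 37) holds, which matches the old (string->number "{" 37) => 36 result quoted in the commit message. Once the radix is limited to 36, no character past 'z' can pass the digit_value < radix test.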