From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ivan Sokolov Newsgroups: gmane.lisp.guile.bugs Subject: bug#48758: [PATCH] Elisp reader does not support non-decimal integers Date: Mon, 31 May 2021 15:04:42 +0300 Message-ID: <87lf7v6rdx.fsf@ya.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="22695"; mail-complaints-to="usenet@ciao.gmane.io" To: 48758@debbugs.gnu.org Original-X-From: bug-guile-bounces+guile-bugs=m.gmane-mx.org@gnu.org Mon May 31 14:06:14 2021 Return-path: Envelope-to: guile-bugs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lnggO-0005cl-C8 for guile-bugs@m.gmane-mx.org; Mon, 31 May 2021 14:06:12 +0200 Original-Received: from localhost ([::1]:40424 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lnggN-00037I-EE for guile-bugs@m.gmane-mx.org; Mon, 31 May 2021 08:06:11 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:36098) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lnggE-00034a-Fl for bug-guile@gnu.org; Mon, 31 May 2021 08:06:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:49607) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lnggE-0006Oi-84 for bug-guile@gnu.org; Mon, 31 May 2021 08:06:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1lnggE-0006Un-3O for bug-guile@gnu.org; Mon, 31 May 2021 08:06:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ivan Sokolov Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Mon, 31 May 2021 12:06:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 48758 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch X-Debbugs-Original-To: bug-guile@gnu.org Original-Received: via spool by submit@debbugs.gnu.org id=B.162246270624899 (code B ref -1); Mon, 31 May 2021 12:06:01 +0000 Original-Received: (at submit) by debbugs.gnu.org; 31 May 2021 12:05:06 +0000 Original-Received: from localhost ([127.0.0.1]:32920 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lngfJ-0006TW-Ey for submit@debbugs.gnu.org; Mon, 31 May 2021 08:05:06 -0400 Original-Received: from lists.gnu.org ([209.51.188.17]:54330) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lngfH-0006TO-7v for submit@debbugs.gnu.org; Mon, 31 May 2021 08:05:04 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:35906) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lngfF-0001nI-8p for bug-guile@gnu.org; Mon, 31 May 2021 08:05:01 -0400 Original-Received: from forward102o.mail.yandex.net ([37.140.190.182]:46096) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lngf6-0005ba-BN for bug-guile@gnu.org; Mon, 31 May 2021 08:05:01 -0400 Original-Received: from sas1-9b68b0b83e4d.qloud-c.yandex.net (sas1-9b68b0b83e4d.qloud-c.yandex.net [IPv6:2a02:6b8:c14:2706:0:640:9b68:b0b8]) by forward102o.mail.yandex.net (Yandex) with ESMTP id 75843668192B for ; Mon, 31 May 2021 15:04:45 +0300 (MSK) Original-Received: from sas2-1cbd504aaa99.qloud-c.yandex.net (sas2-1cbd504aaa99.qloud-c.yandex.net [2a02:6b8:c14:7101:0:640:1cbd:504a]) by sas1-9b68b0b83e4d.qloud-c.yandex.net (mxback/Yandex) with ESMTP id BSdnJiQ7PK-4jJeE9GG; Mon, 31 May 2021 15:04:45 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ya.ru; s=mail; t=1622462685; bh=HvfYUi+9NU4VjJreiBICFbmipupiJ/fgA9pykBHd4lY=; h=Date:Subject:To:From:Message-ID; b=UUpDvDvil0V39amvMRyN5QPiwP+fvB0gXCJOChAJHM00ofZPhhvdv/tN3kaM3s6kf VfCBaBicHtntgL6Zndmj6wZJZC6GE8v6W1GbxB8KuFWqNM37iOFZoL/6+CyRztWQM4 UUN6fLkB63an4n1IgPDCUCnuUh7w9Cdh3c06gNDA= Authentication-Results: sas1-9b68b0b83e4d.qloud-c.yandex.net; dkim=pass header.i=@ya.ru Original-Received: by sas2-1cbd504aaa99.qloud-c.yandex.net (smtp/Yandex) with ESMTPSA id m5Ni2yAbxW-4iLiJepW; Mon, 31 May 2021 15:04:44 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present) Received-SPF: pass client-ip=37.140.190.182; envelope-from=ivan-p-sokolov@ya.ru; helo=forward102o.mail.yandex.net X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane-mx.org@gnu.org Original-Sender: "bug-guile" Xref: news.gmane.io gmane.lisp.guile.bugs:10113 Archived-At: --=-=-= Content-Type: text/plain This patch improves Elisp compatibility. I added support for binary, octal, hexadecimal and arbitrary radix integer literals as described in Elisp manual [1], except for the final period. It should not take long to add support for the final period, but in this patch I did not do this because Emacs itself does not support final period for non-decimal integers. [1]: https://www.gnu.org/software/emacs/manual/html_node/elisp/Integer-Basics.html --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=elisp-numbers.diff diff --git a/module/language/elisp/lexer.scm b/module/language/elisp/lexer.scm index 5a0e6b3ff..1066ed0c2 100644 --- a/module/language/elisp/lexer.scm +++ b/module/language/elisp/lexer.scm @@ -186,9 +186,22 @@ ;;; against regular expressions to determine if it is possibly an ;;; integer or a float. -(define integer-regex (make-regexp "^[+-]?[0-9]+\\.?$")) - -(define float-regex +(define make-integer-regexp + (let ((alphabet "0123456789abcdefghijklmnopqrstuvwxyz")) + (lambda (radix) + (unless (<= 2 radix 36) + (error "invalid radix" radix)) + (let ((pat (string-append "^[+-]?[" (string-take alphabet radix) "]+$"))) + (make-regexp pat regexp/icase))))) + +(define get-integer-regexp + (let ((ht (make-hash-table))) + (hash-set! ht 10 (make-regexp "^[+-]?[0-9]+\\.?$")) + (lambda (radix) + (or (hash-ref ht radix) + (hash-set! ht radix (make-integer-regexp radix)))))) + +(define float-regexp (make-regexp "^[+-]?([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)(e[+-]?[0-9]+)?$")) @@ -197,7 +210,9 @@ (define no-escape-punctuation (string->char-set "-+=*/_~!@$%^&:<>{}?.")) -(define (get-symbol-or-number port) +(define (get-symbol-or-number port radix) + (define integer-regexp + (and=> radix get-integer-regexp)) (let iterate ((result-chars '()) (had-escape #f)) (let* ((c (read-char port)) @@ -207,10 +222,11 @@ (values (cond ((and (not had-escape) - (regexp-exec integer-regex result)) + (regexp? integer-regexp) + (regexp-exec integer-regexp result)) 'integer) ((and (not had-escape) - (regexp-exec float-regex result)) + (regexp-exec float-regexp result)) 'float) (else 'symbol)) result)))) @@ -228,11 +244,20 @@ (unread-char c port) (finish)))))) +(define (string->integer str radix) + (let* ((num (string->number str radix)) + (exact (and=> num inexact->exact))) + (unless (number? num) + (error "expected number" str)) + (unless (integer? exact) + (error "expected integer" str)) + exact)) + ;;; Parse a circular structure marker without the leading # (which was ;;; already read and recognized), that is, a number as identifier and ;;; then either = or #. -(define (get-circular-marker port) +(define (get-circular-marker-or-number port) (call-with-values (lambda () (let iterate ((result 0)) @@ -241,13 +266,20 @@ (let ((val (- (char->integer cur) (char->integer #\0)))) (iterate (+ (* result 10) val))) (values result cur))))) - (lambda (id type) - (case type - ((#\#) `(circular-ref . ,id)) - ((#\=) `(circular-def . ,id)) - (else (lexer-error port - "invalid circular marker character" - type)))))) + (lambda (result last-ch) + (case last-ch + ((#\#) `(circular-ref . ,result)) + ((#\=) `(circular-def . ,result)) + ((#\r) + (call-with-values + (lambda () + (get-symbol-or-number port result)) + (lambda (type str) + (unless (eq? type 'integer) + (error "invalid integer read syntax" type str result)) + `(integer . ,(string->integer str result))))) + (else + (lexer-error port "invalid circular marker character" last-ch)))))) ;;; Main lexer routine, which is given a port and does look for the next ;;; token. @@ -334,13 +366,21 @@ (case c ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) (unread-char c port) - (let ((mark (get-circular-marker port))) + (let ((mark (get-circular-marker-or-number port))) (return (car mark) (cdr mark)))) ((#\') (return 'function #f)) + ((#\b #\o #\x) + (let ((radix (case c ((#\b) 2) ((#\o) 8) ((#\x) 16)))) + (call-with-values + (lambda () (get-symbol-or-number port radix)) + (lambda (type str) + (unless (eq? type 'integer) + (error "invalid integer read syntax" type str radix)) + (return 'integer (string->integer str radix)))))) ((#\:) (call-with-values - (lambda () (get-symbol-or-number port)) + (lambda () (get-symbol-or-number port #f)) (lambda (type str) (return 'symbol (make-symbol str)))))))) ;; Parentheses and other special-meaning single characters. @@ -363,7 +403,7 @@ (else (unread-char c port) (call-with-values - (lambda () (get-symbol-or-number port)) + (lambda () (get-symbol-or-number port 10)) (lambda (type str) (case type ((symbol) @@ -384,12 +424,7 @@ ;; string->number returns an inexact real. Thus we need ;; a conversion here, but it should always result in an ;; integer! - (return - 'integer - (let ((num (inexact->exact (string->number str)))) - (if (not (integer? num)) - (error "expected integer" str num)) - num))) + (return 'integer (string->integer str 10))) ((float) (return 'float (let ((num (string->number str))) (if (exact? num) diff --git a/test-suite/tests/elisp-reader.test b/test-suite/tests/elisp-reader.test index 669c4d592..b84e4756f 100644 --- a/test-suite/tests/elisp-reader.test +++ b/test-suite/tests/elisp-reader.test @@ -75,21 +75,45 @@ (symbol . abc) (paren-open . #f) (symbol . def) (paren-close . #f) (symbol . ghi) (symbol . .e5)))) - ; Here we make use of the property that exact/inexact numbers are not equal? - ; even when they have the same numeric value! - (pass-if "integers (decimal)" - (equal? (lex-string "-1 1 1. +1 01234") - '((integer . -1) (integer . 1) (integer . 1) (integer . 1) - (integer . 1234)))) - (pass-if "integers (binary)" - (equal? (lex-string "#b-1 #b1 #b+1 #b10011010010") - '((integer . -1) (integer . 1) (integer . 1) (integer . 1234)))) - (pass-if "integers (octal)" - (equal? (lex-string "#o-1 #o1 #o+1 #o2322") - '((integer . -1) (integer . 1) (integer . 1) (integer . 1234)))) - (pass-if "integers (hexadecimal)" - (equal? (lex-string "#x-1 #x1 #x+1 #x4d2") - '((integer . -1) (integer . 1) (integer . 1) (integer . 1234)))) + (with-test-prefix "integers" + ;; Here we make use of the property that exact/inexact numbers are not equal? + ;; even when they have the same numeric value! + + (with-test-prefix "decimal" + (pass-if-equal "simple" '((integer . 1)) (lex-string "1")) + (pass-if-equal "positive" '((integer . 1)) (lex-string "+1")) + (pass-if-equal "negative" '((integer . -1)) (lex-string "-1")) + (pass-if-equal "trailing dot" '((integer . 1)) (lex-string "1.")) + (pass-if-equal "all digits" + '((integer . 123456789)) (lex-string "0123456789"))) + + (with-test-prefix "binary" + (pass-if-equal "simple" '((integer . 1)) (lex-string "#b1")) + (pass-if-equal "positive" '((integer . 1)) (lex-string "#b+1")) + (pass-if-equal "negative" '((integer . -1)) (lex-string "#b-1")) + (pass-if-equal "all digits" + '((integer . #b010101)) (lex-string "#b010101"))) + + (with-test-prefix "octal" + (pass-if-equal "simple" '((integer . 1)) (lex-string "#o1")) + (pass-if-equal "positive" '((integer . 1)) (lex-string "#o+1")) + (pass-if-equal "negative" '((integer . -1)) (lex-string "#o-1")) + (pass-if-equal "all digits" + '((integer . #o01234567)) (lex-string "#o01234567"))) + + (with-test-prefix "hexadecimal" + (pass-if-equal "simple" '((integer . 1)) (lex-string "#x1")) + (pass-if-equal "positive" '((integer . 1)) (lex-string "#x+1")) + (pass-if-equal "negative" '((integer . -1)) (lex-string "#x-1")) + (pass-if-equal "all digits" + '((integer . #x0123456789abcdef)) (lex-string "#x0123456789aBcDeF"))) + + (with-test-prefix "arbitrary radix" + (pass-if-equal "simple" '((integer . 44)) (lex-string "#24r1k")) + (pass-if-equal "positive" '((integer . 44)) (lex-string "#24r+1k")) + (pass-if-equal "negative" '((integer . -44)) (lex-string "#24r-1k")) + (pass-if-equal "max radix" '((integer . 35)) (lex-string "#36rz")))) + (pass-if "floats" (equal? (lex-string "1500.0 15e2 15.e2 1.5e3 .15e4 -.345e-2") '((float . 1500.0) (float . 1500.0) (float . 1500.0) --=-=-=--