unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
@ 2017-05-08 23:17 Christopher Allan Webber
  2017-05-12 19:15 ` Mark H Weaver
  2017-06-03 13:35 ` Ludovic Courtès
  0 siblings, 2 replies; 10+ messages in thread
From: Christopher Allan Webber @ 2017-05-08 23:17 UTC (permalink / raw)
  To: Guile Devel

Hello!

So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
A few changes were requested, so it hadn't made it in.  In the meanwhile
I began using it for a number of projects.  I also added some
modifications and extensions: #nil became 'null for the representation
of null values, the representation got a bit easier to read for deeply
nested lists-of-dicts-of-lists-of-dicts (moved from '(@ (key . val)) to
'(@ (key val)) after some discussion with David Thompson), I added a
pretty printer, and I also added fash support, snarfing fash.scm from
Andy Wingo (I had a number of cases where I had json documents with a
*lot* of key / value pairs and I was operating on them, and being able
to read/write from a constant time datastructure as an option was
needed).

I released this as an independed library, which is even now packaged in
Guix as guile-sjson.

However, I wonder if we should package this in Guile proper.  There is
still one issue to resolve iirc, I should add more specific exception
names.

The biggest problem to me seems that we would also want to include
Wingo's fash.scm in Guile proper.  Personally, I think this would be a
big win: highly performant immutable hashmaps are desirable (and though
we have vhashes, setting an existing value keeps the old value, and are
not as fast as fashes in my experience).

So:
 - Are Guile's developers open to having an (ice-9 fash) module?
 - And should I submit (ice-9 json), with my changes?

Thanks!
 - Chris




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-08 23:17 Including sjson (formerly (ice-9 json)) and fash.scm in guile proper? Christopher Allan Webber
@ 2017-05-12 19:15 ` Mark H Weaver
  2017-05-14  1:30   ` Christopher Allan Webber
  2017-06-03 13:35 ` Ludovic Courtès
  1 sibling, 1 reply; 10+ messages in thread
From: Mark H Weaver @ 2017-05-12 19:15 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: Guile Devel

Hi Chris,

Christopher Allan Webber <cwebber@dustycloud.org> writes:
> So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
> A few changes were requested, so it hadn't made it in.  In the meanwhile
> I began using it for a number of projects.  I also added some
> modifications and extensions: #nil became 'null for the representation
> of null values,

Most of the modifications you've made are good, but I'm very
uncomfortable with the use of #nil in this API.  #nil is a terrible hack
which may not even be adequate for its intended use case.  Its existence
in any data structure is likely to cause misbehavior in other Scheme
code that is exposed to it, because it violates a longstanding fact in
Scheme that there is only one value that is treated as "false".  It
would also make it difficult or impossible to port this library, and
thus anything that depends on this library, to other Scheme systems.  We
should not promote its use by incorporating it into new APIs.

What do you think?

Otherwise, I'm generally in favor of incorporating this library into
Guile, after we make sure that it is robust against malicious inputs.

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-12 19:15 ` Mark H Weaver
@ 2017-05-14  1:30   ` Christopher Allan Webber
  2017-05-15 18:35     ` Mark H Weaver
  0 siblings, 1 reply; 10+ messages in thread
From: Christopher Allan Webber @ 2017-05-14  1:30 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Guile Devel

Mark H Weaver writes:

> Hi Chris,
>
> Christopher Allan Webber <cwebber@dustycloud.org> writes:
>> So a while ago, David Thompson submitted (ice-9 json) to Guile proper.
>> A few changes were requested, so it hadn't made it in.  In the meanwhile
>> I began using it for a number of projects.  I also added some
>> modifications and extensions: #nil became 'null for the representation
>> of null values,
>
> Most of the modifications you've made are good, but I'm very
> uncomfortable with the use of #nil in this API.  #nil is a terrible hack
> which may not even be adequate for its intended use case.  Its existence
> in any data structure is likely to cause misbehavior in other Scheme
> code that is exposed to it, because it violates a longstanding fact in
> Scheme that there is only one value that is treated as "false".  It
> would also make it difficult or impossible to port this library, and
> thus anything that depends on this library, to other Scheme systems.  We
> should not promote its use by incorporating it into new APIs.
>
> What do you think?

Oh!  No you got it backwards, the library *was* using #nil initially,
and I modified it to use 'null now instead. :)

So I think you'd be probably pretty happy!

> Otherwise, I'm generally in favor of incorporating this library into
> Guile, after we make sure that it is robust against malicious inputs.

Okay, cool!  The other thing is to add more specific error messages, as
discussed.

What examples of malicious inputs should we test against?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-14  1:30   ` Christopher Allan Webber
@ 2017-05-15 18:35     ` Mark H Weaver
  2017-05-15 19:53       ` Christopher Allan Webber
  0 siblings, 1 reply; 10+ messages in thread
From: Mark H Weaver @ 2017-05-15 18:35 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: Guile Devel

I wrote:
> Most of the modifications you've made are good, but I'm very
> uncomfortable with the use of #nil in this API.  [...]

Christopher Allan Webber <cwebber@dustycloud.org> writes:
> Oh!  No you got it backwards, the library *was* using #nil initially,
> and I modified it to use 'null now instead. :)

Ah, my mistake.  Excellent!

Having now looked more closely, I'm mostly happy with the API, except
for one issue: I don't like the way fash support was hacked in, with the
'use-fash' flag and the (if use-fash [fash-code] [alist-code]) sprinkled
around.  If this truly needs to be done within the json library itself
(which I wonder), then I'd prefer to generalize it to support any
dictionary data structure, and thereby remove the dependency on fashes.

My main concern about fashes, besides the fact that Andy hasn't yet
proposed adding them to Guile himself, is that the implementation is
very complex, and I'd like to achieve some degree of confidence in its
correctness before adding it.  I'd also tend to favor adding a simpler,
truly immutable dictionary data structure based on Phil Bagwell's HAMTs
(Hash Array Mapped Tries) to eliminate the need for thread
synchronization, but I'm open to suggestions.

Anyway, since writing my previous message in this thread, I've started
carefully reviewing the code, making modifications as I go.  At this
point, my proposed modifications have become quite extensive.  So far,
I've reworked the code to greatly reduce heap allocations, support
arbitrary dictionary types (removing the fash dependency, while still
allowing its use), and fix various bugs (e.g. relying on unspecified
evaluation order, failure to handle 12-character hex escapes properly,
producing and accepting invalid JSON in some cases, etc).

I'll followup with another message when I've completed my proposed
revisions.  Feel free to ping me if it takes more than a week.

>> Otherwise, I'm generally in favor of incorporating this library into
>> Guile, after we make sure that it is robust against malicious inputs.
>
> Okay, cool!  The other thing is to add more specific error messages, as
> discussed.

Indeed, better error messages would be a good thing.

> What examples of malicious inputs should we test against?

I'm mostly trying to address that by careful code review.

     Regards,
       Mark



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-15 18:35     ` Mark H Weaver
@ 2017-05-15 19:53       ` Christopher Allan Webber
  2017-06-20 22:47         ` Mark H Weaver
  0 siblings, 1 reply; 10+ messages in thread
From: Christopher Allan Webber @ 2017-05-15 19:53 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Guile Devel

Mark H Weaver writes:

> I wrote:
>> Most of the modifications you've made are good, but I'm very
>> uncomfortable with the use of #nil in this API.  [...]
>
> Christopher Allan Webber <cwebber@dustycloud.org> writes:
>> Oh!  No you got it backwards, the library *was* using #nil initially,
>> and I modified it to use 'null now instead. :)
>
> Ah, my mistake.  Excellent!
>
> Having now looked more closely, I'm mostly happy with the API, except
> for one issue: I don't like the way fash support was hacked in, with the
> 'use-fash' flag and the (if use-fash [fash-code] [alist-code]) sprinkled
> around.  If this truly needs to be done within the json library itself
> (which I wonder), then I'd prefer to generalize it to support any
> dictionary data structure, and thereby remove the dependency on fashes.

I agree that it's pretty hacky.  Allowing other dictionary structures is
fine by me.

> My main concern about fashes, besides the fact that Andy hasn't yet
> proposed adding them to Guile himself, is that the implementation is
> very complex, and I'd like to achieve some degree of confidence in its
> correctness before adding it.  I'd also tend to favor adding a simpler,
> truly immutable dictionary data structure based on Phil Bagwell's HAMTs
> (Hash Array Mapped Tries) to eliminate the need for thread
> synchronization, but I'm open to suggestions.

I don't really understand enough of the field to really know what the
right direction is.  I do know that I need something that's not O(n) for
json-ld processing, though I guess one option always would have been to
read in the sexp structure and transform it before doing all that
processing.   I've long wanted a better immutable dictionary
structure in Guile though, but am open to what it would be.

> Anyway, since writing my previous message in this thread, I've started
> carefully reviewing the code, making modifications as I go.  At this
> point, my proposed modifications have become quite extensive.  So far,
> I've reworked the code to greatly reduce heap allocations, support
> arbitrary dictionary types (removing the fash dependency, while still
> allowing its use), and fix various bugs (e.g. relying on unspecified
> evaluation order, failure to handle 12-character hex escapes properly,
> producing and accepting invalid JSON in some cases, etc).
>
> I'll followup with another message when I've completed my proposed
> revisions.  Feel free to ping me if it takes more than a week.

Wow, exciting!

>>> Otherwise, I'm generally in favor of incorporating this library into
>>> Guile, after we make sure that it is robust against malicious inputs.
>>
>> Okay, cool!  The other thing is to add more specific error messages, as
>> discussed.
>
> Indeed, better error messages would be a good thing.
>
>> What examples of malicious inputs should we test against?
>
> I'm mostly trying to address that by careful code review.

Yay!  Thank you for doing it.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-08 23:17 Including sjson (formerly (ice-9 json)) and fash.scm in guile proper? Christopher Allan Webber
  2017-05-12 19:15 ` Mark H Weaver
@ 2017-06-03 13:35 ` Ludovic Courtès
  1 sibling, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2017-06-03 13:35 UTC (permalink / raw)
  To: guile-devel

Hello!

(A bit late, but hey!)

Christopher Allan Webber <cwebber@dustycloud.org> skribis:

> So:
>  - Are Guile's developers open to having an (ice-9 fash) module?
>  - And should I submit (ice-9 json), with my changes?

FWIW I’m definitely in favor of (ice-9 json), especially if the #nil
hack has disappeared.  :-)

I’m not too familiar with (ice-9 fash) but with proper tests and
documentation it would be great.  In general, I think we need to add
more persistent data structures—lists, vlists, and vhashes are nice but
don’t cover that many use cases…

Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-05-15 19:53       ` Christopher Allan Webber
@ 2017-06-20 22:47         ` Mark H Weaver
  2017-06-20 23:18           ` Mark H Weaver
                             ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mark H Weaver @ 2017-06-20 22:47 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: Guile Devel

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

Hi Chris,

I'm terribly sorry for the long delay on this.  For better or worse,
I've become extremely concerned about computer security, and so I feel a
heavy responsibility to be extra careful about code that is expected to
parse hostile data.

I was also looking for a cleaner way to express this parser, and to add
better error reporting, while allowing flexibility for users to
customize the Scheme representation.  Toward these ends, I ended up
re-implementing the parser from scratch.

I've attached my current draft of the new parser.  By default, JSON
objects are represented as (@ . <alist>) where <alist> has dotted pairs,
but it's easy to ask for your preferred two-element lists using
'make-json-parser'.

The json writer is mostly okay, but it also needs to be generalized to
support the customizable representation (and maybe I went too far here
with the parser, dunno).  Also, there are a few cases where it will
generate invalid JSON, notably: if ASCII control characters are present
in strings, hex escapes must be printed instead of the raw character,
and real numbers cannot simply be printed using 'display' because of
infinities and NaNs.

Are you okay with this general direction?

      Mark


[-- Attachment #2: First draft of proposed JSON parser --]
[-- Type: text/plain, Size: 17724 bytes --]

(define-module (ice-9 json)
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-11)  ; let-values
  #:export (read-json make-json-reader))

;; XXX Consider using conditions and exceptions from SRFI-34 + SRFI-35
;; or R6RS (they should be merged!)
(define (json-error port message . args)
  (throw 'json-error port (peek-char port) message args))

(define-syntax match-next
  (syntax-rules (else)
    "Like 'match' from (ice-9 match), but with a few differences.  The
first operand is PORT, which must be a variable bound to an open input
port.  The value to be matched is the result of (peek-char PORT).  If
a pattern from one of the clauses matches the peeked character (or
eof-object), then (read-char PORT) is implicitly performed before
evaluating the associated body.  If none of the patterns match, then
leave the port position unchanged, and evaluate the body of the final
'else' clause if present, or else raise a JSON error."
    ((match-next port
       (c body0 body ...)
       ...
       (else ebody0 ebody ...))
     (match (peek-char port)
       (c (read-char port) body0 body ...)
       ...
       (_ ebody0 ebody ...)))

    ((match-next port
       (c body0 body ...)
       ...)
     (match-next port
       (c body0 body ...)
       ...
       (else (json-error port "expected" '(c ...)))))))

(define-syntax match-next*
  (syntax-rules (else)
    "Like 'match-next', but with an implicit final clause that consumes
JSON whitespace characters and tries again.  Assuming that none of the
explicit patterns match JSON whitespace, this is equivalent to
consuming JSON whitespace before 'match-next', but it is more
efficient when JSON whitespace is not present."
    ((match-next* port
       (c body0 body ...)
       ...
       (else ebody0 ebody ...))
     (let loop ()
       (match-next port
         (c body0 body ...) ...
         ((or #\space #\tab #\newline #\return)
          (loop))
         (else ebody0 ebody ...))))

    ((match-next* port
       (c body0 body ...)
       ...)
     (match-next* port
       (c body0 body ...)
       ...
       (else (json-error port "expected" '(c ...)))))))

(define-syntax read-literal
  (syntax-rules ()
    "Read the given characters C C* ... from PORT and return VALUE.
Raise a JSON error if the expected characters are not found."
    ((read-literal port value c c* ...)
     (match-next port
       (c (read-literal port value c* ...))
       (else (json-error port "expected" c
                         "while reading" 'value))))
    ((read-literal port value)
     value)))

(define* (make-json-reader
          #:key
          (true           #t)
          (false          #f)
          (null           'null)
          (make-string    (lambda (s) s))
          (make-number    (lambda (x) x))
          (obj-knil       '())
          (obj-kons       (lambda (k v seed) (cons (cons k v) seed)))
          (obj-finalize   (lambda (seed) (cons '@ (reverse! seed))))
          (array-knil     '())
          (array-kons     cons)
          (array-finalize reverse!))

  ;; ws = *( %x20 /              ; Space
  ;;         %x09 /              ; Horizontal tab
  ;;         %x0A /              ; Line feed or New line
  ;;         %x0D )              ; Carriage return
  (define (consume-whitespace port)
    "Consume zero or more JSON whitespace characters (#\\space #\\tab
#\\newline or #\\return) from PORT."
    (match-next* port
      (else #t)))

  ;; DIGIT
  (define (try-read-digit port)
    "Peeks at the next character to be read from PORT.  If it's a
decimal digit, then read it and return its value.  Otherwise, leave
the port position unchanged and return false."
    (match-next port
      (#\0 0)
      (#\1 1)
      (#\2 2)
      (#\3 3)
      (#\4 4)
      (#\5 5)
      (#\6 6)
      (#\7 7)
      (#\8 8)
      (#\9 9)
      (else #f)))

  ;; *DIGIT
  (define (read-digits n width port)
    "Read the remaining zero or more decimal digits of a non-negative
integer from PORT.  Assuming that N and WIDTH are the value and width
of the digits previously read, two values are returned: the value and
width of the non-negative integer."
    (match (try-read-digit port)
      (#f (values n width))
      (digit (read-digits (+ digit (* n 10))
                          (+ width 1)
                          port))))

  ;; 1*DIGIT
  (define (read-integer port)
    "Read one or more decimal digits from PORT, and return two values:
the non-negative integer represented by those digits, and the
width (number of digits read)."
    (match (try-read-digit port)
      (#f (json-error port "expected digit"))
      (n (read-digits n 1 port))))

  ;; int = zero / ( digit1-9 *DIGIT )
  ;;   where:
  ;;     digit1-9 = %x31-39         ; 1-9
  (define (read-integer-part port)
    "Read the non-negative integer part of a JSON number, and return two
values: its value and width (number of digits read).  This procedure
differs from 'read-integer' in only one respect: if the initial digit
is 0, it returns immediately without accepting more digits."
    (match (try-read-digit port)
      (#f (json-error port "expected digit"))
      (0 (values 0 1))
      (n (read-digits n 1 port))))

  ;; [ frac ]
  ;;   where:
  ;;     frac = decimal-point 1*DIGIT
  (define (read-frac-part port)
    "Read the optional fractional part of a JSON number, and return two
values: the value represented by the digits as a non-negative integer,
and the width (number of digits read).  For example, if \".25\" is
read, then return 25 and 2.  If a decimal point is not found, leave
the port position unchanged and return 0 and 0."
    (match-next port
      (#\. (read-integer port))
      (else (values 0 0))))

  ;; [ exp ]
  ;;   where:
  ;;     e = %x65 / %x45            ; e E
  ;;     exp = e [ minus / plus ] 1*DIGIT
  (define (read-exp-part port)
    "Read the optional exponent part of a JSON number, and return the
exponent as an exact integer.  If neither e nor E are found, leave the
port position unchanged and return 0."
    (match-next port
      ((or #\e #\E) (match-next port
                      (#\-  (- (read-integer port)))
                      ;; Use unary + here to discard the second value
                      ;; returned by 'read-integer'.
                      (#\+  (+ (read-integer port)))
                      (else (+ (read-integer port)))))
      (else 0)))

  ;; number = [ minus ] int [ frac ] [ exp ]
  (define (read-non-negative-number port)
    "Read the portion of a JSON number following the optional initial
minus sign from PORT, and return its value."
    (let*-values (((int  int-width)  (read-integer-part port))
                  ((frac frac-width) (read-frac-part port))
                  ((exp)             (read-exp-part port)))
      ;; We compute the value in a way that avoids allocating
      ;; intermediate exact rationals where possible.  To achieve this,
      ;; we effectively pretend that the decimal point were moved to the
      ;; right FRAC-WIDTH places (i.e. to the right of the final digit),
      ;; and we adjust the exponent to compensate for this.  SIGNIFICAND
      ;; is the integer value of the digits after moving the decimal
      ;; point, and EFFECTIVE-EXP is the adjusted exponent.
      (let* ((significand   (+ frac (* int (expt 10 frac-width))))
             (effective-exp (- exp frac-width))
             ;; XXX if the exponent is very large (regardless of its
             ;; sign), this could result in a huge amount of memory
             ;; being allocated for VALUE and for the result of EXPT.
             ;; We should consider handling these cases more gracefully.
             ;; However, it should be noted that incorrect rounding will
             ;; occur if we perform the following computation using
             ;; inexact arithmetic.
             ;;
             ;; We handle the positive and negative EFFECTIVE-EXP cases
             ;; separately to avoid allocating an intermediate exact
             ;; rational for the result of EXPT.
             (value (if (negative? effective-exp)
                        (/ significand (expt 10 (- effective-exp)))
                        (* significand (expt 10 effective-exp)))))
        ;; If the value is an integer, return its exact value, otherwise
        ;; convert it to inexact.  In the future, we might consider
        ;; returning inexacts in more cases, e.g. for huge integers, in
        ;; order to limit the memory usage.
        (if (integer? value)
            value
            (exact->inexact value)))))

  (define (read-number port)
    "Read a JSON number from PORT and return the result of applying
MAKE-NUMBER to the number."
    (make-number (match-next* port
                   (#\- (- (read-non-negative-number port)))
                   (else (read-non-negative-number port)))))

  (define (high-surrogate? char)
    "Return true if CHAR is a UTF-16 high surrogate, otherwise return
false."
    (and (char? char)
         (<= #xD800 (char->integer char) #xDBFF)))

  (define (low-surrogate? char)
    "Return true if CHAR is a UTF-16 low surrogate, otherwise return
false."
    (and (char? char)
         (<= #xDC00 (char->integer char) #xDFFF)))

  (define (reduce-surrogate-pair high low)
    "Return the character represented by the two UTF-16 surrogate code
points (characters) HIGH and LOW."
    (let ((high (- (char->integer high) #xD800))
          (low  (- (char->integer low)  #xDC00)))
      (integer->char (+ #x10000 low (* 1024 high)))))

  ;; HEXDIG
  (define (read-hex-digit port)
    "Read a hexadecimal digit from PORT and return its value."
    (match-next port
      (#\0          #x0)
      (#\1          #x1)
      (#\2          #x2)
      (#\3          #x3)
      (#\4          #x4)
      (#\5          #x5)
      (#\6          #x6)
      (#\7          #x7)
      (#\8          #x8)
      (#\9          #x9)
      ((or #\A #\a) #xA)
      ((or #\B #\b) #xB)
      ((or #\C #\c) #xC)
      ((or #\D #\d) #xD)
      ((or #\E #\e) #xE)
      ((or #\F #\f) #xF)))

  ;; nHEXDIG
  (define (read-hex n port)
    "Read exactly N hexadecimal digits from PORT and return the
represented value."
    (if (zero? n)
        0
        (let* ((initial-digits (read-hex (- n 1) port))
               (final-digit (read-hex-digit port)))
          (+ final-digit (* 16 initial-digits)))))

  ;; unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
  (define char-set:unescaped
    (char-set-union (ucs-range->char-set #x20 #x22)
                    (ucs-range->char-set #x23 #x5C)
                    (ucs-range->char-set #x5D #x110000)))

  (define (unescaped? c)
    "Return true if C is a character that is allowed to appear unescaped
within a JSON string, otherwise return false."
    (and (char? c) (char-set-contains? char-set:unescaped c)))

  ;; [ char ]
  ;;   where:
  ;;     escape = %x5C          ; \
  ;;     char = unescaped /
  ;;        escape (
  ;;            %x22 /          ; "    quotation mark  U+0022
  ;;            %x5C /          ; \    reverse solidus U+005C
  ;;            %x2F /          ; /    solidus         U+002F
  ;;            %x62 /          ; b    backspace       U+0008
  ;;            %x66 /          ; f    form feed       U+000C
  ;;            %x6E /          ; n    line feed       U+000A
  ;;            %x72 /          ; r    carriage return U+000D
  ;;            %x74 /          ; t    tab             U+0009
  ;;            %x75 4HEXDIG )  ; uXXXX                U+XXXX
  (define (try-read-string-char port)
    "Try to read one character of a JSON string from PORT, either an
unescaped character or a backslash-introduced string escape, and
return the represented character.  If the next character from PORT
does not introduce a valid JSON string char (e.g. if the closing quote
is found), then leave the port position unchanged and return false.

Note that if a 12-character hex escape is found, only the first 6
characters (the high surrogate) is read by this procedure.  Surrogate
pairs are handled by 'read-string-chars'."
    (match-next port
      ((? unescaped? c) c)
      (#\\ (match-next port
             (#\" #\")
             (#\\ #\\)
             (#\/ #\/)
             (#\b #\backspace)
             (#\f #\page)
             (#\n #\newline)
             (#\r #\return)
             (#\t #\tab)
             (#\u (integer->char (read-hex 4 port)))))
      (else #f)))

  (define (read-string-chars port)
    "Read the contents of a JSON string from PORT, not including the
quotes, and return the represented string."
    ;; Use an output string port to accumulate the represented string as
    ;; we read it.  This entails far less heap allocation than
    ;; accumulating a list of characters.
    (let ((out (open-output-string)))
      (let loop ((char-or-false (try-read-string-char port)))
        (match char-or-false
          (#f (get-output-string out))
          ((? high-surrogate? high)
           ;; A high-surrogate was found.  Try to combine it with the
           ;; following low-surrogate.
           (match (try-read-string-char port)
             ((? low-surrogate? low)
              (write-char (reduce-surrogate-pair high low) out)
              (loop (try-read-string-char port)))
             (char-or-false
              ;; The high-surrogate was not followed by a low-surrogate.
              ;; Replace the unpaired high-surrogate with the Unicode
              ;; replacement character and continue.
              (write-char #\xFFFD out)
              (loop char-or-false))))
          ;; Write the represented character to OUT, but convert
          ;; unpaired low-surrogates to the Unicode replacement
          ;; character.
          (char (write-char (if (low-surrogate? char) #\xFFFD char)
                            out)
                (loop (try-read-string-char port)))))))

  (define (read-string port)
    "Read a JSON string from PORT and return the result of applying
MAKE-STRING to the string."
    (match-next* port
      (#\" (let ((s (make-string (read-string-chars port))))
             (match-next port
               (#\" s))))))

  (define (read-true port)
    "Read the characters \"true\" from PORT and return TRUE."
    (read-literal port true #\t #\r #\u #\e))

  (define (read-false port)
    "Read the characters \"false\" from PORT and return FALSE."
    (read-literal port false #\f #\a #\l #\s #\e))

  (define (read-null port)
    "Read the characters \"null\" from PORT and return NULL."
    (read-literal port null #\n #\u #\l #\l))

  (define (read-array-elements seed port)
    "Read the remaining elements of a JSON array from PORT, where SEED
is the result of the last call to ARRAY-KONS, or ARRAY-KNIL if no
elements have previously been read.  Returns the final seed."
    (let* ((element (read-value port))
           (seed (array-kons element seed)))
      (match-next* port
        (#\, (read-array-elements seed port))
        (else seed))))

  (define (read-array port)
    "Read a JSON array from PORT and return the result of calling
ARRAY-FINALIZE on the final seed produced using ARRAY-KNIL and
ARRAY-KONS."
    (match-next* port
      (#\[ (match-next* port
             (#\] array-knil)
             (else (let ((seed (read-array-elements array-knil port)))
                     (match-next* port
                       (#\] (array-finalize seed)))))))))

  (define (read-member port)
    "Read a JSON object member (a colon-separated string-value pair) from PORT,
and return two values: the string and the value."
    (let ((key (read-string port)))
      (match-next* port
        (#\: (let ((value (read-value port)))
               (values key value))))))

  (define (read-members seed port)
    "Read the remaining members of a JSON object from PORT, where SEED
is the result of the last call to OBJ-KONS, or OBJ-KNIL if no members
have previously been read.  Return the final seed."
    (let-values (((k v) (read-member port)))
      (let ((seed (obj-kons k v seed)))
        (match-next* port
          (#\, (read-members seed port))
          (else seed)))))

  (define (read-object port)
    "Read a JSON object from PORT and return the result of calling
OBJ-FINALIZE on the final seed produced using OBJ-KNIL and OBJ-KONS."
    (obj-finalize (match-next* port
                    (#\{ (match-next* port
                           (#\} obj-knil)
                           (else (let ((seed (read-members obj-knil port)))
                                   (match-next* port
                                     (#\} seed)))))))))

  (define (read-value port)
    "Read a JSON value from PORT and return it."
    ;; We can't use 'match-next*' here, because we don't want to consume
    ;; any non-JSON-whitespace characters before dispatching to the
    ;; appropriate reader.
    (consume-whitespace port)
    (match (peek-char port)
      (#\" (read-string port))
      (#\{ (read-object port))
      (#\[ (read-array port))
      (#\t (read-true port))
      (#\f (read-false port))
      (#\n (read-null port))
      ((or #\- #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)
       (read-number port))
      (_ (json-error port "expected a JSON value"))))

  (define (read-json-text port)
    "Read a JSON value, optional JSON whitespace, and EOF from PORT.
Return the represented value.  If non-JSON-whitespace characters are
found after the JSON value, raise a JSON error."
    (let ((value (read-value port)))
      (match-next* port
        ((? eof-object?) value)
        (else (json-error port "expected EOF")))))

  read-json-text)

(define read-json (make-json-reader))

;;; Local Variables:
;;; mode: scheme
;;; eval: (put 'match-next 'scheme-indent-function 1)
;;; eval: (put 'match-next* 'scheme-indent-function 1)
;;; End:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-06-20 22:47         ` Mark H Weaver
@ 2017-06-20 23:18           ` Mark H Weaver
  2017-06-20 23:50           ` Mark H Weaver
  2017-06-22 16:26           ` Christopher Allan Webber
  2 siblings, 0 replies; 10+ messages in thread
From: Mark H Weaver @ 2017-06-20 23:18 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: Guile Devel

I wrote:

> I was also looking for a cleaner way to express this parser, and to add
> better error reporting, while allowing flexibility for users to
> customize the Scheme representation.

I forgot to mention that another goal was to minimize heap allocations,
e.g. eliminating the allocation of intermediate lists of characters or
digits.

>   (define (read-array port)
>     "Read a JSON array from PORT and return the result of calling
> ARRAY-FINALIZE on the final seed produced using ARRAY-KNIL and
> ARRAY-KONS."
>     (match-next* port
>       (#\[ (match-next* port
>              (#\] array-knil)
>              (else (let ((seed (read-array-elements array-knil port)))
>                      (match-next* port
>                        (#\] (array-finalize seed)))))))))

Sorry, I forgot to apply 'array-finalize' in the empty-array case.
Here's a corrected version:

  (define (read-array port)
    "Read a JSON array from PORT and return the result of calling
ARRAY-FINALIZE on the final seed produced using ARRAY-KNIL and
ARRAY-KONS."
    (array-finalize
     (match-next* port
       (#\[ (match-next* port
              (#\] array-knil)
              (else (let ((seed (read-array-elements array-knil port)))
                      (match-next* port
                        (#\] seed)))))))))

      Mark



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-06-20 22:47         ` Mark H Weaver
  2017-06-20 23:18           ` Mark H Weaver
@ 2017-06-20 23:50           ` Mark H Weaver
  2017-06-22 16:26           ` Christopher Allan Webber
  2 siblings, 0 replies; 10+ messages in thread
From: Mark H Weaver @ 2017-06-20 23:50 UTC (permalink / raw)
  To: Christopher Allan Webber; +Cc: Guile Devel

I wrote:
> and real numbers cannot simply be printed using 'display' because of
> infinities and NaNs.

... and exact rationals.

      Mark



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Including sjson (formerly (ice-9 json)) and fash.scm in guile proper?
  2017-06-20 22:47         ` Mark H Weaver
  2017-06-20 23:18           ` Mark H Weaver
  2017-06-20 23:50           ` Mark H Weaver
@ 2017-06-22 16:26           ` Christopher Allan Webber
  2 siblings, 0 replies; 10+ messages in thread
From: Christopher Allan Webber @ 2017-06-22 16:26 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Guile Devel

Mark H Weaver writes:

> Hi Chris,
>
> I'm terribly sorry for the long delay on this.  For better or worse,
> I've become extremely concerned about computer security, and so I feel a
> heavy responsibility to be extra careful about code that is expected to
> parse hostile data.

No worries.  I also know you've been working hard to keep us secure in
your work on Guix and etc, and all that work is valued / appreciated!

> I was also looking for a cleaner way to express this parser, and to add
> better error reporting, while allowing flexibility for users to
> customize the Scheme representation.  Toward these ends, I ended up
> re-implementing the parser from scratch.
>
> I've attached my current draft of the new parser.  By default, JSON
> objects are represented as (@ . <alist>) where <alist> has dotted pairs,
> but it's easy to ask for your preferred two-element lists using
> 'make-json-parser'.

I like the interface to make-json-reader.  It looks like it would work
well by default, but it's also obvious to me how to extend it.  It also
looks far less disasterous than my attempt to be flexible on using
fashes or s-exps via an overdose of if clauses. ;)  Looks good!

> The json writer is mostly okay, but it also needs to be generalized to
> support the customizable representation (and maybe I went too far here
> with the parser, dunno).

I don't see the writer included below?

> Also, there are a few cases where it will generate invalid JSON,
> notably: if ASCII control characters are present in strings, hex
> escapes must be printed instead of the raw character, and real numbers
> cannot simply be printed using 'display' because of infinities and
> NaNs.
>
> Are you okay with this general direction?

I'm good with it!  It's a bit more complicated to read than David's
original design but it looks well formed (and I followed it well still),
and I'm confident in your reasoning behind its design.  I feel like the
flexibility of plugging in different datastructures as needed is a big
win, anyway.

Speaking of security issues, I've wondered before about someone giving a
json structure so large or deeply nested that it's hard to fit in memory
(maybe a structure that even just looks like "[[[[[[[[[[[[[[[[[[[[[[...").
Maybe a recursive limit, or some other type of limit, would be useful?
but I imagine that could complexify the design considerably, and maybe
it's not worth it.

I tested this on a few examples from
https://www.w3.org/TR/activitystreams-vocabulary/
and they all seemed to work fine.

Thanks for all your hard work on this!  Looking forward to seeing this land!

 - Chris

PS:

> ;; XXX Consider using conditions and exceptions from SRFI-34 + SRFI-35
> ;; or R6RS (they should be merged!)

Yeah I've wondered about this... I've come to the point where I needed
the kind of condition subtyping that srfi-35 and r6rs conditions
provide, but I'm not sure which one to use.  Advice?



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-06-22 16:26 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-08 23:17 Including sjson (formerly (ice-9 json)) and fash.scm in guile proper? Christopher Allan Webber
2017-05-12 19:15 ` Mark H Weaver
2017-05-14  1:30   ` Christopher Allan Webber
2017-05-15 18:35     ` Mark H Weaver
2017-05-15 19:53       ` Christopher Allan Webber
2017-06-20 22:47         ` Mark H Weaver
2017-06-20 23:18           ` Mark H Weaver
2017-06-20 23:50           ` Mark H Weaver
2017-06-22 16:26           ` Christopher Allan Webber
2017-06-03 13:35 ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).