unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Matt Wette <matthew.wette@verizon.net>
To: guile-user@gnu.org
Subject: [potluck dish] the module (potluck struct)
Date: Tue, 16 Feb 2016 05:28:26 -0800	[thread overview]
Message-ID: <2CCB4762-E825-4DA8-AC3A-79D9594C872F@verizon.net> (raw)
In-Reply-To: <87mvr8ood1.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]

If you have used the Python struct module then this will look
familiar.  Otherwise, check out
https://docs.python.org/2/library/struct.html

Attached are three files: * struct.scm: the source code * struct.texi:
documentation * struct.test: test code


Struct Module =============

The '(potluck struct)' module provides procedures for packing and
unpacking scheme data to and from bytevectors based on a format
string.

     (use-modules (potluck struct))

     ;; pack two unsigned shorts and a double float in big endian
     order (define data (pack ">2Hd" 3 22 34.0)) (write data)
     (newline) ==> #vu8(0 3 0 22 64 65 0 0 0 0 0 0)

     ;; verify using unpack (write (unpack ">2Hd" data)) (newline) ==>
     (3 22 34.0)

 -- Scheme Procedure: pack format vals ...  Return a bytevector that
     contains encoded data from VALS, based on the string FORMAT.

 -- Scheme Procedure: unpack format bvec Return a list of scheme
     objects decoded from the bytevector BVEC, based on the string
     FORMAT.

 -- Scheme Procedure: packed-size format Return the number of bytes
     represented by the string FORMAT.

   The _format_ string used for PACK and UNPACK is constructed as a
sequence of digits, representing a repeat count, and codes,
representing the binary content.

The string may optionally begin with a special character that
represents the endianness: = native endianness < little-endian >
big-endian !  network order -- i.e., big-endian

Type codes used in the format string are interpreted as follows: x
    blank byte c 8-bit character ?  boolean b signed 8-bit integer B
    unsigned 8-bit integer h signed 16-bit integer H unsigned 16-bit
    integer i signed 32-bit integer I unsigned 32-bit integer l signed
    32-bit integer L unsigned 32-bit integer q signed 64-bit integer Q
    unsigned 64-bit integer f 32-bit IEEE floating point d 64-bit IEEE
    floating point s string

   The following issues remain to be addressed: string padding 'pack'
assumes that the string length in the format is the same as in the
passed string.  Non-conformance is not trapped as an error.



[-- Attachment #2.1: Type: text/html, Size: 5664 bytes --]

[-- Attachment #2.2: struct.scm --]
[-- Type: application/octet-stream, Size: 7550 bytes --]

;;; potluck/struct.scm - byte pack/unpack, like the Python struct module
;;;
;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.

(define-module (potluck struct)
  #:export (unpack pack packed-size)
  #:use-module (rnrs bytevectors))


;; @deffn ctoi-at str ix => integer
;; Return integer value of the character in string @var{str} at index @var{ix}.
(define (ctoi-at str ix) (- (char->integer (string-ref str ix)) 48))

;; character codes used to indicate endianness:
(define cs:md (string->char-set "=<>!"))

;; @deffn get-nd code => endianness
;; Return endianness given the character code @var{code}.
(define (get-nd code)
  (case code
    ((#\!) (endianness big))		; network
    ((#\>) (endianness big))
    ((#\<) (endianness little))
    ((#\=) (native-endianness))
    ((#\@) (error "alignment not supported"))
    (else  (native-endianness))))

;; character codes used to indicate type:
(define cs:df (string->char-set "xcbB?hHiIlLqQfdsp")) ; type char

;; @deffn bv-size ct ch => byte count
;; Return the size in bytes for data indicated by format count and type.
;; @example
;; (get-size 12 #\s) => 12
;; (get-size 12 #\i) => 4
(define (bv-size ct ch)
  (case ch
    ((#\x) 1) ((#\c) 1) ((#\b) 1) ((#\B) 1) ((#\?) 1)
    ((#\h) 2) ((#\H) 2) ((#\i) 4) ((#\I) 4) ((#\l) 4) ((#\L) 4)
    ((#\q) 8) ((#\Q) 8) ((#\f) 4) ((#\d) 8) ((#\s #\p) ct)
    (else (error "unknown code"))))

;; @deffn fmt-cnt ct ch => datum count
;; Return the number of datums indicated by the format count and type.
(define (fmt-cnt ct ch)
  (case ch
    ((#\s #\p) ct)
    (else 1)))
  

;; set value, return number bytes written
;; This is a helper for @code{pack}.
(define (set-value! bv ix nd ct ch val)
  (case ch
    ((#\x) (if #f #f))
    ((#\c) ;; todo: check for 8-bit char
     (bytevector-u8-set! bv ix (char->integer val) nd))
    ((#\b) (bytevector-s8-set! bv ix val))
    ((#\B) (bytevector-u8-set! bv ix val))
    ((#\?) (bytevector-u8-set! bv ix (if val 1 0) nd))
    ((#\h) (bytevector-s16-set! bv ix val nd))
    ((#\H) (bytevector-u16-set! bv ix val nd))
    ((#\i #\l) (bytevector-s32-set! bv ix val nd))
    ((#\I #\L) (bytevector-u32-set! bv ix val nd))
    ((#\q) (bytevector-s64-set! bv ix val nd))
    ((#\Q) (bytevector-u64-set! bv ix val nd))
    ((#\f) (bytevector-ieee-single-set! bv ix val nd))
    ((#\d) (bytevector-ieee-double-set! bv ix val nd))
    ((#\s #\p)
     (bytevector-copy!
      (u8-list->bytevector (map char->integer (string->list val))) 0 bv ix sz))
    (else
     (scm-error 'misc-error "unpack"
		"bad type code: ~A" '(ch) #f))))

;; @deffn pack format datum ... => bytevector
;; Pack the datums into a bytevector.
(define (pack format . args)
  (cond
   ((zero? (string-length format)) (make-bytevector 0))
   (else
    (let* ((char-at (lambda (ix) (string-ref format ix)))
	   (f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
	   (nd (get-nd (char-at 0)))
	   (ln (string-length format))
	   (bvec (make-bytevector (packed-size format))))
      
      (let iter ((bx 0)		     ; index into resulting bytevector
		 (fx f0)	     ; index into format
		 (vals args)	     ; values to add
		 (ct 0)		     ; count from format
		 (ch #f))	     ; char from format
	;;(simple-format #t "bx=~S fx=~S ct=~S ch=~S\n" bx fx ct ch)
	(cond
	 ((positive? ct)		; encode a value
	  (set-value! bvec bx nd ct ch (car vals))
	  (iter (+ bx (bv-size ct ch)) fx (cdr vals) (- ct (fmt-cnt ct ch)) ch))
	 ((= fx ln)			; done
	  bvec)
	 ((null? vals)
	  (scm-error 'misc-error "pack"
		     "format size larger than input size" '() #f))
	 ((char-numeric? (string-ref format fx))
	  (iter bx (1+ fx) vals (- (* 10 ct) (ctoi-at format fx)) ch))
	 ((zero? ct)
	  (iter bx fx vals -1 ch))
	 ((char-set-contains? cs:df (string-ref format fx))
	  (iter bx (1+ fx) vals (- ct) (string-ref format fx)))
	 (else
	  (scm-error 'misc-error "pack"
		     "pack error" '() #f))))))))


;; @deffn cons-value bv ix nd cd tail => list
;; Cons the datum indicated by the data with @var{tail}, where
;; @itemize
;; @item @var{bv} is the bytevector
;; @item @var{ix} is the index into the bytevector
;; @item @var{nd} is the endianness
;; @item @var{cd} is the code
;; @end itemize
;; This is a helper for @code{unpack}.
(define cons-value
  (let ((sbuf (make-bytevector 128)))
    (lambda (bv ix nd sz ch tail)
      (case ch
	((#\x) tail)
	((#\c) (cons (integer->char (bytevector-u8-ref bv ix)) tail))
	((#\b) (cons (bytevector-s8-ref bv ix) tail))
	((#\B) (cons (bytevector-u8-ref bv ix) tail))
	((#\?) (cons (if (zero? (bytevector-u8-ref bv ix)) #f #t) tail))
	((#\h) (cons (bytevector-s16-ref bv ix nd) tail))
	((#\H) (cons (bytevector-u16-ref bv ix nd) tail))
	((#\i) (cons (bytevector-s32-ref bv ix nd) tail))
	((#\I) (cons (bytevector-u32-ref bv ix nd) tail))
	((#\l) (cons (bytevector-s32-ref bv ix nd) tail))
	((#\L) (cons (bytevector-u32-ref bv ix nd) tail))
	((#\q) (cons (bytevector-s64-ref bv ix nd) tail))
	((#\Q) (cons (bytevector-u64-ref bv ix nd) tail))
	((#\f) (cons (bytevector-ieee-single-ref bv ix nd) tail))
	((#\d) (cons (bytevector-ieee-double-ref bv ix nd) tail))
	((#\s #\p)
	 (set! sbuf (make-bytevector sz))
	 (bytevector-copy! bv ix sbuf 0 sz)
	 (cons (utf8->string sbuf) tail))
	(else
	 (scm-error 'misc-error "unpack"
		    "bad type code: ~A" '(ch) #f))))))

;; @deffn unpack format bytevec => list
;; Unpack datums from the bytevector into a list.
(define (unpack format bytevec)
  (cond
   ((zero? (string-length format)) '())
   (else
    (let* ((char-at (lambda (ix) (string-ref format ix)))
	   (f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
	   (nd (get-nd (char-at 0)))
	   (ln (string-length format)))
      (let iter ((rz '())			; result, list of bytevectors
		 (bx 0)			; index into input bytevector
		 (fx f0)			; index into format string
		 (ct 0)			; format count
		 (ch #f))			; format char
	;;(simple-format #t "bx=~S fx=~S ct=~S ch=~S\n" bx fx ct ch)
	(cond
	 ((> fx ln)
	  (error "format size larger than input bv size"))
	 ((positive? ct)
	  (iter (cons-value bytevec bx nd ct ch rz)
		(+ bx (bv-size ct ch)) fx (- ct (fmt-cnt ct ch)) ch))
	 ((= fx ln)
	  ;;(if (not (= bx (bytevector-length bytevec))) (error "error"))
	  (reverse rz))
	 ((char-numeric? (string-ref format fx))
	  (iter rz bx (1+ fx) (- (* 10 ct) (ctoi-at format fx)) ch))
	 ((zero? ct)
	  (iter rz bx fx -1 ch))
	 ((char-set-contains? cs:df (string-ref format fx))
	  (iter rz bx (1+ fx) (- ct) (string-ref format fx)))
	 (else
	  (scm-error 'misc-error "unpack" "format error" '() #f))))))))


;; @deffn packed-size format => size
;; In the Python struct module this is called "calcsize".
(define (packed-size format)
  (cond
   ((zero? (string-length format)) 0)
   (else
    (let* ((char-at (lambda (ix) (string-ref format ix)))
	   (f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
	   (ln (string-length format)))
      (let iter ((sz 0) (fx f0) (ct 0))	; sz: result, fx: inddx; ct: count
	(cond
	 ((= fx ln)
	  sz)
	 ((char-numeric? (string-ref format fx))
	  (iter sz (1+ fx) (+ (* 10 ct) (ctoi-at format fx))))
	 ((zero? ct)
	  (iter sz fx 1))
	 ((char-set-contains? cs:df (string-ref format fx))
	  (iter (+ sz (* ct (bv-size 1 (string-ref format fx)))) (1+ fx) 0))
	 (else
	  (scm-error 'misc-error "unpack" "format error" '() #f))))))))
  
;;; --- last line ---

[-- Attachment #2.3: Type: text/html, Size: 133 bytes --]

[-- Attachment #2.4: struct.test --]
[-- Type: application/octet-stream, Size: 3448 bytes --]

;; struct.test				-*- scheme -*-
;;
;; Copyright (C) 2015 Matthew R. Wette
;; 
;; Copying and distribution of this file, with or without modification,
;; are permitted in any medium without royalty provided the copyright
;; notice and this notice are preserved.  This file is offered as-is,
;; without any warranty.

(use-modules (potluck struct))		; pack, unpack, packed-size
(use-modules (rnrs bytevectors))
(use-modules (srfi srfi-2))		; and-let*

(set! *random-state* (random-state-from-platform))

(define (test1)
  (if (not (and-let*
	       (((= (packed-size "33s") 33))
		((= (packed-size "4I") 16))
		((= (packed-size "3B") 3))
		((= (packed-size "i") 4))
		)))
      (error "packed-size broken")))

(define (test2)
  (let* ((bv04 (make-bytevector 4)))
    ;; i
    (bytevector-s32-set! bv04 0 -1234 (native-endianness))
    (or (= (car (unpack "i" bv04)) -1234) (error "error"))
    ;; I
    (bytevector-s32-set! bv04 0 1234 (native-endianness))
    (or (= (car (unpack "I" bv04)) 1234) (error "error"))
    ))

(define (test3)
  (let* ((data (pack "I" 1234))
	 (vals (unpack "I" data)))
    (if (eqv? (car vals) 1234) #t (error "failed"))
    #t))

(define (test4)
  (define data (pack ">2Hd" 3 22 34.0)) ; pack two unsigned and a double
  (write data)(newline)
  (write (unpack ">2Hd" data))(newline))


(define (test5)

  ;; make a truncated copy of a bytevector
  (define (mk-bvec bv0 len)
    (let ((bv1 (make-bytevector len)))
      (bytevector-copy! bv0 0 bv1 0 len)
      bv1))

  ;; check pack/unpack consistency given
  ;;   format string, binary data, and list of datums
  (define (do-test format data vals)
    (when #f
      (simple-format #t "fm=~S\n" format)
      (simple-format #t "xv=~S\n" vals)
      (simple-format #t "us=~S\n" (unpack format data)))
    ;; Test pack:
    (if (not (equal? data (apply pack format vals)))
	(error "pack not working"))
    ;; Test unpack:
    (let iter ((xvals vals) (svals (unpack format data)))
      (cond
       ((and (null? xvals) (pair? svals))
	(error "mismatched count"))
       ((and (pair? xvals) (null? svals))
	(error "mismatched count"))
       ((null? xvals)
	#t)
       ((not (eqv? (car xvals) (car svals)))
	(error "value mismatch"))
       (else
	(iter (cdr xvals) (cdr svals)))))
    #f)

  (define (r-ct) (random 5))
  (define (r-ty) (random 8))
  
  (let ((bv (make-bytevector 1024))
	(nd (native-endianness))
	)
    (let iter ((fl '()) (bx 0) (xpt '()) (rc (r-ct)) (rt (r-ty)))
      ;;(simple-format #t "fl=~S bx=~S xpt=~S\n" fl bx xpt)
      (case rt
	((0)
	 (do-test (string-join (reverse fl) "") (mk-bvec bv bx) (reverse xpt)))
	((1)
	 (bytevector-s8-set! bv bx -123)
	 (iter (cons "1b" fl) (+ 1 bx) (cons -123 xpt) (r-ct) (r-ty)))
	((2)
	 (bytevector-s16-set! bv bx -1234 nd)
	 (iter (cons "1h" fl) (+ 2 bx) (cons -1234 xpt) (r-ct) (r-ty)))
	((3)
	 (bytevector-s32-set! bv bx -9123 nd)
	 (iter (cons "1i" fl) (+ 4 bx) (cons -9123 xpt) (r-ct) (r-ty)))
	((4)
	 (bytevector-s32-set! bv bx -3991123 nd)
	 (iter (cons "1l" fl) (+ 4 bx) (cons -3991123 xpt) (r-ct) (r-ty)))
	((5)
	 (bytevector-s64-set! bv bx -3339123 nd)
	 (iter (cons "1q" fl) (+ 8 bx) (cons -3339123 xpt) (r-ct) (r-ty)))
	((6)
	 (bytevector-ieee-single-set! bv bx 1.32e4 nd)
	 (iter (cons "1f" fl) (+ 4 bx) (cons 1.32e4 xpt) (r-ct) (r-ty)))
	((7)
	 (bytevector-ieee-double-set! bv bx 1.32e4 nd)
	 (iter (cons "1d" fl) (+ 8 bx) (cons 1.32e4 xpt) (r-ct) (r-ty)))
	))))

(test5)

;; --- last line ---

[-- Attachment #2.5: Type: text/html, Size: 133 bytes --]

[-- Attachment #2.6: struct.texi --]
[-- Type: application/octet-stream, Size: 2614 bytes --]

@c -*-texinfo-*-

@c Copyright (C) 2016 Matthew R. Wette
@c
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3 or
@c any later version published by the Free Software Foundation; with no
@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

@node Struct
@section Struct Module

The @code{(potluck struct)} module provides procedures for packing and
unpacking scheme data to and from bytevectors based on a format string.
@c (@ref{Bytevectors})

@example
(use-modules (potluck struct))

;; pack two unsigned shorts and a double float in big endian order
(define data (pack ">2Hd" 3 22 34.0))
(write data) (newline)
==>
#vu8(0 3 0 22 64 65 0 0 0 0 0 0)

;; verify using unpack
(write (unpack ">2Hd" data)) (newline)
==>
(3 22 34.0)
@end example

@deffn {Scheme Procedure} pack format vals @dots{}
Return a bytevector that contains encoded data from @var{vals}, based on
the string @var{format}.
@end deffn

@deffn {Scheme Procedure} unpack format bvec
Return a list of scheme objects decoded from the bytevector
@var{bvec}, based on the string @var{format}.
@end deffn

@deffn {Scheme Procedure} packed-size format
Return the number of bytes represented by the string @var{format}.
@end deffn

The @emph{format} string used for @var{pack} and @var{unpack} is
constructed as a sequence of digits, representing a repeat count, and codes,
representing the binary content.

@noindent
The string may optionally begin with a special character that
represents the endianness:
@verbatim
    =        native endianness
    <        little-endian 
    >        big-endian 
    !        network order -- i.e., big-endian
@end verbatim

@noindent
Type codes used in the format string are interpreted as follows:
@verbatim
    x        blank byte
    c        8-bit character
    ?        boolean
    b        signed 8-bit integer
    B        unsigned 8-bit integer
    h        signed 16-bit integer
    H        unsigned 16-bit integer
    i        signed 32-bit integer
    I        unsigned 32-bit integer
    l        signed 32-bit integer
    L        unsigned 32-bit integer
    q        signed 64-bit integer
    Q        unsigned 64-bit integer
    f        32-bit IEEE floating point
    d        64-bit IEEE floating point
    s        string
@end verbatim

The following issues remain to be addressed:
@table @asis
@item string padding
@code{pack} assumes that the string length in the format is the same
as in the passed string.  Non-conformance is not trapped as an error.
@end table

@c --- last line ---

[-- Attachment #2.7: Type: text/html, Size: 151 bytes --]

  parent reply	other threads:[~2016-02-16 13:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-10 10:23 Potluck time! Ludovic Courtès
2016-02-10 17:35 ` Christopher Allan Webber
2016-02-10 20:17 ` Stefan Israelsson Tampe
2016-02-13 14:36 ` Stefan Israelsson Tampe
2016-02-13 20:32   ` Stefan Israelsson Tampe
2016-02-16 13:28 ` Matt Wette [this message]
2016-02-16 13:30 ` [potluck dish] the module (potluck regexc) Matt Wette
2016-02-16 13:45 ` [potluck dish] the (potluck struct) module Matt Wette
2016-02-16 13:53   ` Matt Wette
2016-05-08  9:59     ` Nala Ginrut
2016-05-08 14:34       ` Matt Wette

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2CCB4762-E825-4DA8-AC3A-79D9594C872F@verizon.net \
    --to=matthew.wette@verizon.net \
    --cc=guile-user@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).