From: Matt Wette <matthew.wette@verizon.net>
To: guile-user@gnu.org
Subject: [potluck dish] the module (potluck struct)
Date: Tue, 16 Feb 2016 05:28:26 -0800 [thread overview]
Message-ID: <2CCB4762-E825-4DA8-AC3A-79D9594C872F@verizon.net> (raw)
In-Reply-To: <87mvr8ood1.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]
If you have used the Python struct module then this will look
familiar. Otherwise, check out
https://docs.python.org/2/library/struct.html
Attached are three files: * struct.scm: the source code * struct.texi:
documentation * struct.test: test code
Struct Module =============
The '(potluck struct)' module provides procedures for packing and
unpacking scheme data to and from bytevectors based on a format
string.
(use-modules (potluck struct))
;; pack two unsigned shorts and a double float in big endian
order (define data (pack ">2Hd" 3 22 34.0)) (write data)
(newline) ==> #vu8(0 3 0 22 64 65 0 0 0 0 0 0)
;; verify using unpack (write (unpack ">2Hd" data)) (newline) ==>
(3 22 34.0)
-- Scheme Procedure: pack format vals ... Return a bytevector that
contains encoded data from VALS, based on the string FORMAT.
-- Scheme Procedure: unpack format bvec Return a list of scheme
objects decoded from the bytevector BVEC, based on the string
FORMAT.
-- Scheme Procedure: packed-size format Return the number of bytes
represented by the string FORMAT.
The _format_ string used for PACK and UNPACK is constructed as a
sequence of digits, representing a repeat count, and codes,
representing the binary content.
The string may optionally begin with a special character that
represents the endianness: = native endianness < little-endian >
big-endian ! network order -- i.e., big-endian
Type codes used in the format string are interpreted as follows: x
blank byte c 8-bit character ? boolean b signed 8-bit integer B
unsigned 8-bit integer h signed 16-bit integer H unsigned 16-bit
integer i signed 32-bit integer I unsigned 32-bit integer l signed
32-bit integer L unsigned 32-bit integer q signed 64-bit integer Q
unsigned 64-bit integer f 32-bit IEEE floating point d 64-bit IEEE
floating point s string
The following issues remain to be addressed: string padding 'pack'
assumes that the string length in the format is the same as in the
passed string. Non-conformance is not trapped as an error.
[-- Attachment #2.1: Type: text/html, Size: 5664 bytes --]
[-- Attachment #2.2: struct.scm --]
[-- Type: application/octet-stream, Size: 7550 bytes --]
;;; potluck/struct.scm - byte pack/unpack, like the Python struct module
;;;
;;; Copyright (C) 2016 Matthew R. Wette
;;;
;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Lesser General Public
;;; License as published by the Free Software Foundation; either
;;; version 3 of the License, or (at your option) any later version.
(define-module (potluck struct)
#:export (unpack pack packed-size)
#:use-module (rnrs bytevectors))
;; @deffn ctoi-at str ix => integer
;; Return integer value of the character in string @var{str} at index @var{ix}.
(define (ctoi-at str ix) (- (char->integer (string-ref str ix)) 48))
;; character codes used to indicate endianness:
(define cs:md (string->char-set "=<>!"))
;; @deffn get-nd code => endianness
;; Return endianness given the character code @var{code}.
(define (get-nd code)
(case code
((#\!) (endianness big)) ; network
((#\>) (endianness big))
((#\<) (endianness little))
((#\=) (native-endianness))
((#\@) (error "alignment not supported"))
(else (native-endianness))))
;; character codes used to indicate type:
(define cs:df (string->char-set "xcbB?hHiIlLqQfdsp")) ; type char
;; @deffn bv-size ct ch => byte count
;; Return the size in bytes for data indicated by format count and type.
;; @example
;; (get-size 12 #\s) => 12
;; (get-size 12 #\i) => 4
(define (bv-size ct ch)
(case ch
((#\x) 1) ((#\c) 1) ((#\b) 1) ((#\B) 1) ((#\?) 1)
((#\h) 2) ((#\H) 2) ((#\i) 4) ((#\I) 4) ((#\l) 4) ((#\L) 4)
((#\q) 8) ((#\Q) 8) ((#\f) 4) ((#\d) 8) ((#\s #\p) ct)
(else (error "unknown code"))))
;; @deffn fmt-cnt ct ch => datum count
;; Return the number of datums indicated by the format count and type.
(define (fmt-cnt ct ch)
(case ch
((#\s #\p) ct)
(else 1)))
;; set value, return number bytes written
;; This is a helper for @code{pack}.
(define (set-value! bv ix nd ct ch val)
(case ch
((#\x) (if #f #f))
((#\c) ;; todo: check for 8-bit char
(bytevector-u8-set! bv ix (char->integer val) nd))
((#\b) (bytevector-s8-set! bv ix val))
((#\B) (bytevector-u8-set! bv ix val))
((#\?) (bytevector-u8-set! bv ix (if val 1 0) nd))
((#\h) (bytevector-s16-set! bv ix val nd))
((#\H) (bytevector-u16-set! bv ix val nd))
((#\i #\l) (bytevector-s32-set! bv ix val nd))
((#\I #\L) (bytevector-u32-set! bv ix val nd))
((#\q) (bytevector-s64-set! bv ix val nd))
((#\Q) (bytevector-u64-set! bv ix val nd))
((#\f) (bytevector-ieee-single-set! bv ix val nd))
((#\d) (bytevector-ieee-double-set! bv ix val nd))
((#\s #\p)
(bytevector-copy!
(u8-list->bytevector (map char->integer (string->list val))) 0 bv ix sz))
(else
(scm-error 'misc-error "unpack"
"bad type code: ~A" '(ch) #f))))
;; @deffn pack format datum ... => bytevector
;; Pack the datums into a bytevector.
(define (pack format . args)
(cond
((zero? (string-length format)) (make-bytevector 0))
(else
(let* ((char-at (lambda (ix) (string-ref format ix)))
(f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
(nd (get-nd (char-at 0)))
(ln (string-length format))
(bvec (make-bytevector (packed-size format))))
(let iter ((bx 0) ; index into resulting bytevector
(fx f0) ; index into format
(vals args) ; values to add
(ct 0) ; count from format
(ch #f)) ; char from format
;;(simple-format #t "bx=~S fx=~S ct=~S ch=~S\n" bx fx ct ch)
(cond
((positive? ct) ; encode a value
(set-value! bvec bx nd ct ch (car vals))
(iter (+ bx (bv-size ct ch)) fx (cdr vals) (- ct (fmt-cnt ct ch)) ch))
((= fx ln) ; done
bvec)
((null? vals)
(scm-error 'misc-error "pack"
"format size larger than input size" '() #f))
((char-numeric? (string-ref format fx))
(iter bx (1+ fx) vals (- (* 10 ct) (ctoi-at format fx)) ch))
((zero? ct)
(iter bx fx vals -1 ch))
((char-set-contains? cs:df (string-ref format fx))
(iter bx (1+ fx) vals (- ct) (string-ref format fx)))
(else
(scm-error 'misc-error "pack"
"pack error" '() #f))))))))
;; @deffn cons-value bv ix nd cd tail => list
;; Cons the datum indicated by the data with @var{tail}, where
;; @itemize
;; @item @var{bv} is the bytevector
;; @item @var{ix} is the index into the bytevector
;; @item @var{nd} is the endianness
;; @item @var{cd} is the code
;; @end itemize
;; This is a helper for @code{unpack}.
(define cons-value
(let ((sbuf (make-bytevector 128)))
(lambda (bv ix nd sz ch tail)
(case ch
((#\x) tail)
((#\c) (cons (integer->char (bytevector-u8-ref bv ix)) tail))
((#\b) (cons (bytevector-s8-ref bv ix) tail))
((#\B) (cons (bytevector-u8-ref bv ix) tail))
((#\?) (cons (if (zero? (bytevector-u8-ref bv ix)) #f #t) tail))
((#\h) (cons (bytevector-s16-ref bv ix nd) tail))
((#\H) (cons (bytevector-u16-ref bv ix nd) tail))
((#\i) (cons (bytevector-s32-ref bv ix nd) tail))
((#\I) (cons (bytevector-u32-ref bv ix nd) tail))
((#\l) (cons (bytevector-s32-ref bv ix nd) tail))
((#\L) (cons (bytevector-u32-ref bv ix nd) tail))
((#\q) (cons (bytevector-s64-ref bv ix nd) tail))
((#\Q) (cons (bytevector-u64-ref bv ix nd) tail))
((#\f) (cons (bytevector-ieee-single-ref bv ix nd) tail))
((#\d) (cons (bytevector-ieee-double-ref bv ix nd) tail))
((#\s #\p)
(set! sbuf (make-bytevector sz))
(bytevector-copy! bv ix sbuf 0 sz)
(cons (utf8->string sbuf) tail))
(else
(scm-error 'misc-error "unpack"
"bad type code: ~A" '(ch) #f))))))
;; @deffn unpack format bytevec => list
;; Unpack datums from the bytevector into a list.
(define (unpack format bytevec)
(cond
((zero? (string-length format)) '())
(else
(let* ((char-at (lambda (ix) (string-ref format ix)))
(f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
(nd (get-nd (char-at 0)))
(ln (string-length format)))
(let iter ((rz '()) ; result, list of bytevectors
(bx 0) ; index into input bytevector
(fx f0) ; index into format string
(ct 0) ; format count
(ch #f)) ; format char
;;(simple-format #t "bx=~S fx=~S ct=~S ch=~S\n" bx fx ct ch)
(cond
((> fx ln)
(error "format size larger than input bv size"))
((positive? ct)
(iter (cons-value bytevec bx nd ct ch rz)
(+ bx (bv-size ct ch)) fx (- ct (fmt-cnt ct ch)) ch))
((= fx ln)
;;(if (not (= bx (bytevector-length bytevec))) (error "error"))
(reverse rz))
((char-numeric? (string-ref format fx))
(iter rz bx (1+ fx) (- (* 10 ct) (ctoi-at format fx)) ch))
((zero? ct)
(iter rz bx fx -1 ch))
((char-set-contains? cs:df (string-ref format fx))
(iter rz bx (1+ fx) (- ct) (string-ref format fx)))
(else
(scm-error 'misc-error "unpack" "format error" '() #f))))))))
;; @deffn packed-size format => size
;; In the Python struct module this is called "calcsize".
(define (packed-size format)
(cond
((zero? (string-length format)) 0)
(else
(let* ((char-at (lambda (ix) (string-ref format ix)))
(f0 (if (char-set-contains? cs:md (char-at 0)) 1 0))
(ln (string-length format)))
(let iter ((sz 0) (fx f0) (ct 0)) ; sz: result, fx: inddx; ct: count
(cond
((= fx ln)
sz)
((char-numeric? (string-ref format fx))
(iter sz (1+ fx) (+ (* 10 ct) (ctoi-at format fx))))
((zero? ct)
(iter sz fx 1))
((char-set-contains? cs:df (string-ref format fx))
(iter (+ sz (* ct (bv-size 1 (string-ref format fx)))) (1+ fx) 0))
(else
(scm-error 'misc-error "unpack" "format error" '() #f))))))))
;;; --- last line ---
[-- Attachment #2.3: Type: text/html, Size: 133 bytes --]
[-- Attachment #2.4: struct.test --]
[-- Type: application/octet-stream, Size: 3448 bytes --]
;; struct.test -*- scheme -*-
;;
;; Copyright (C) 2015 Matthew R. Wette
;;
;; Copying and distribution of this file, with or without modification,
;; are permitted in any medium without royalty provided the copyright
;; notice and this notice are preserved. This file is offered as-is,
;; without any warranty.
(use-modules (potluck struct)) ; pack, unpack, packed-size
(use-modules (rnrs bytevectors))
(use-modules (srfi srfi-2)) ; and-let*
(set! *random-state* (random-state-from-platform))
(define (test1)
(if (not (and-let*
(((= (packed-size "33s") 33))
((= (packed-size "4I") 16))
((= (packed-size "3B") 3))
((= (packed-size "i") 4))
)))
(error "packed-size broken")))
(define (test2)
(let* ((bv04 (make-bytevector 4)))
;; i
(bytevector-s32-set! bv04 0 -1234 (native-endianness))
(or (= (car (unpack "i" bv04)) -1234) (error "error"))
;; I
(bytevector-s32-set! bv04 0 1234 (native-endianness))
(or (= (car (unpack "I" bv04)) 1234) (error "error"))
))
(define (test3)
(let* ((data (pack "I" 1234))
(vals (unpack "I" data)))
(if (eqv? (car vals) 1234) #t (error "failed"))
#t))
(define (test4)
(define data (pack ">2Hd" 3 22 34.0)) ; pack two unsigned and a double
(write data)(newline)
(write (unpack ">2Hd" data))(newline))
(define (test5)
;; make a truncated copy of a bytevector
(define (mk-bvec bv0 len)
(let ((bv1 (make-bytevector len)))
(bytevector-copy! bv0 0 bv1 0 len)
bv1))
;; check pack/unpack consistency given
;; format string, binary data, and list of datums
(define (do-test format data vals)
(when #f
(simple-format #t "fm=~S\n" format)
(simple-format #t "xv=~S\n" vals)
(simple-format #t "us=~S\n" (unpack format data)))
;; Test pack:
(if (not (equal? data (apply pack format vals)))
(error "pack not working"))
;; Test unpack:
(let iter ((xvals vals) (svals (unpack format data)))
(cond
((and (null? xvals) (pair? svals))
(error "mismatched count"))
((and (pair? xvals) (null? svals))
(error "mismatched count"))
((null? xvals)
#t)
((not (eqv? (car xvals) (car svals)))
(error "value mismatch"))
(else
(iter (cdr xvals) (cdr svals)))))
#f)
(define (r-ct) (random 5))
(define (r-ty) (random 8))
(let ((bv (make-bytevector 1024))
(nd (native-endianness))
)
(let iter ((fl '()) (bx 0) (xpt '()) (rc (r-ct)) (rt (r-ty)))
;;(simple-format #t "fl=~S bx=~S xpt=~S\n" fl bx xpt)
(case rt
((0)
(do-test (string-join (reverse fl) "") (mk-bvec bv bx) (reverse xpt)))
((1)
(bytevector-s8-set! bv bx -123)
(iter (cons "1b" fl) (+ 1 bx) (cons -123 xpt) (r-ct) (r-ty)))
((2)
(bytevector-s16-set! bv bx -1234 nd)
(iter (cons "1h" fl) (+ 2 bx) (cons -1234 xpt) (r-ct) (r-ty)))
((3)
(bytevector-s32-set! bv bx -9123 nd)
(iter (cons "1i" fl) (+ 4 bx) (cons -9123 xpt) (r-ct) (r-ty)))
((4)
(bytevector-s32-set! bv bx -3991123 nd)
(iter (cons "1l" fl) (+ 4 bx) (cons -3991123 xpt) (r-ct) (r-ty)))
((5)
(bytevector-s64-set! bv bx -3339123 nd)
(iter (cons "1q" fl) (+ 8 bx) (cons -3339123 xpt) (r-ct) (r-ty)))
((6)
(bytevector-ieee-single-set! bv bx 1.32e4 nd)
(iter (cons "1f" fl) (+ 4 bx) (cons 1.32e4 xpt) (r-ct) (r-ty)))
((7)
(bytevector-ieee-double-set! bv bx 1.32e4 nd)
(iter (cons "1d" fl) (+ 8 bx) (cons 1.32e4 xpt) (r-ct) (r-ty)))
))))
(test5)
;; --- last line ---
[-- Attachment #2.5: Type: text/html, Size: 133 bytes --]
[-- Attachment #2.6: struct.texi --]
[-- Type: application/octet-stream, Size: 2614 bytes --]
@c -*-texinfo-*-
@c Copyright (C) 2016 Matthew R. Wette
@c
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3 or
@c any later version published by the Free Software Foundation; with no
@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@node Struct
@section Struct Module
The @code{(potluck struct)} module provides procedures for packing and
unpacking scheme data to and from bytevectors based on a format string.
@c (@ref{Bytevectors})
@example
(use-modules (potluck struct))
;; pack two unsigned shorts and a double float in big endian order
(define data (pack ">2Hd" 3 22 34.0))
(write data) (newline)
==>
#vu8(0 3 0 22 64 65 0 0 0 0 0 0)
;; verify using unpack
(write (unpack ">2Hd" data)) (newline)
==>
(3 22 34.0)
@end example
@deffn {Scheme Procedure} pack format vals @dots{}
Return a bytevector that contains encoded data from @var{vals}, based on
the string @var{format}.
@end deffn
@deffn {Scheme Procedure} unpack format bvec
Return a list of scheme objects decoded from the bytevector
@var{bvec}, based on the string @var{format}.
@end deffn
@deffn {Scheme Procedure} packed-size format
Return the number of bytes represented by the string @var{format}.
@end deffn
The @emph{format} string used for @var{pack} and @var{unpack} is
constructed as a sequence of digits, representing a repeat count, and codes,
representing the binary content.
@noindent
The string may optionally begin with a special character that
represents the endianness:
@verbatim
= native endianness
< little-endian
> big-endian
! network order -- i.e., big-endian
@end verbatim
@noindent
Type codes used in the format string are interpreted as follows:
@verbatim
x blank byte
c 8-bit character
? boolean
b signed 8-bit integer
B unsigned 8-bit integer
h signed 16-bit integer
H unsigned 16-bit integer
i signed 32-bit integer
I unsigned 32-bit integer
l signed 32-bit integer
L unsigned 32-bit integer
q signed 64-bit integer
Q unsigned 64-bit integer
f 32-bit IEEE floating point
d 64-bit IEEE floating point
s string
@end verbatim
The following issues remain to be addressed:
@table @asis
@item string padding
@code{pack} assumes that the string length in the format is the same
as in the passed string. Non-conformance is not trapped as an error.
@end table
@c --- last line ---
[-- Attachment #2.7: Type: text/html, Size: 151 bytes --]
next prev parent reply other threads:[~2016-02-16 13:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-10 10:23 Potluck time! Ludovic Courtès
2016-02-10 17:35 ` Christopher Allan Webber
2016-02-10 20:17 ` Stefan Israelsson Tampe
2016-02-13 14:36 ` Stefan Israelsson Tampe
2016-02-13 20:32 ` Stefan Israelsson Tampe
2016-02-16 13:28 ` Matt Wette [this message]
2016-02-16 13:30 ` [potluck dish] the module (potluck regexc) Matt Wette
2016-02-16 13:45 ` [potluck dish] the (potluck struct) module Matt Wette
2016-02-16 13:53 ` Matt Wette
2016-05-08 9:59 ` Nala Ginrut
2016-05-08 14:34 ` Matt Wette
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2CCB4762-E825-4DA8-AC3A-79D9594C872F@verizon.net \
--to=matthew.wette@verizon.net \
--cc=guile-user@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).