From: Thien-Thi Nguyen <ttn@gnuvola.org>
To: guile-user@gnu.org
Subject: Re: survey: string external representation
Date: Fri, 27 Jan 2012 11:27:30 +0100 [thread overview]
Message-ID: <87k44dbfu5.fsf@gnuvola.org> (raw)
In-Reply-To: <87wr8edhac.fsf@gnuvola.org> (Thien-Thi Nguyen's message of "Thu, 26 Jan 2012 09:00:59 +0100")
[-- Attachment #1: Type: text/plain, Size: 627 bytes --]
Thanks to everyone who responded. Based on the collected
information, i've cobbled together a runtime check for
‘sql-quote’. It and some tests are in the attached program.
To play:
guile -s normalize.scm
guile -s normalize.scm stupid
The code assumes Guile 2 DTRT, but if you have doubts, you can
sed -i 's/guile-2/&-not-really/' normalize.scm
to disable that assumption. In any case, the program should exit
successfully, indicating smooth ‘write’ / ‘read’ round-tripping.
This is so (both w/ and w/o "stupid") for Guile 1.4.1.124 and 1.8.7.
___________________________________________
[-- Attachment #2: normalize.scm --]
[-- Type: text/x-scheme, Size: 4055 bytes --]
;; -*- mode: scheme; coding: utf-8 -*-
(define EXIT-VALUE #t) ; optimism
(define STUPID? (false-if-exception (string=? "stupid" (cadr (command-line)))))
;; PostgreSQL groks ‘\xXX’ as an octet w/ hex value XX.
;; It also groks raw octets. This is all fine and good.
;; The problem arises when there is a mix of contiguous
;; raw and \x representations, intended to represent a
;; UTF-8 (say) encoded character.
;;
;; It seems Guile
;; - 1.4 DTRT by doing nothing;
;; - 1.6 ???;
;; - 1.8 fails by \x-escaping inconsistently;
;; - 2.0 doesn't have this problem.
(cond-expand
(guile-2
(define normalize identity))
(else
(use-modules
(srfi srfi-13)
(srfi srfi-14))
(define normalize
(or (let* ((ego (char-set
;; These are not strictly necessary for
;; PostgreSQL, but we include them for
;; (Scheme-only) round-trip testing.
;; Doubtlessly, what doubtful ego!
#\" #\\))
(ugh (ucs-range->char-set #o177 #o400 #t ego)))
(and (not (char-set-every
(lambda (ch)
;; Does the octet xrep unmolested?
(char=? ch (string-ref (object->string (string ch)) 1)))
(char-set-difference ugh ego)))
(or (not STUPID?)
(begin (set! ugh ego)
#t))
;; Lame.
(lambda (s)
(define backslash-x
(let ((v (make-vector 256)))
(char-set-for-each
(lambda (ch)
(let ((i (char->integer ch)))
(vector-set!
v i (string-append
"\\x" (number->string i 16)))))
ugh)
;; backslash-x
(lambda (ch)
(vector-ref v (char->integer ch)))))
(let loop ((start 0) (acc '()))
(cond ((string-index s ugh start)
=> (lambda (idx)
(loop (1+ idx)
(cons* (backslash-x (string-ref s idx))
(substring/shared s start idx)
acc))))
((zero? start)
s)
(else
(string-concatenate-reverse
acc (substring/shared s start))))))))
;; Cool.
identity))))
(define (try s)
(simple-format
#t "ORIG:\t~S~%NORM:\t~S~%=>\t~A~%~%"
s (normalize s)
(let ((round (with-input-from-string
(with-output-to-string
(lambda ()
(if (eq? identity normalize)
(write s)
(begin (display #\")
(display (normalize s))
(display #\")))))
read)))
(cond ((equal? s round) 'SAME)
(else
(set! EXIT-VALUE #f) ;-O
(string-append
"DIFF: [" (number->string (string-length round))
"]|" round "|"))))))
(simple-format #t "Guile ~A~% LANG: ~S~% normalize: ~S~A~%~%"
(version) (getenv "LANG") (procedure-name normalize)
(if (and STUPID? (not (eq? normalize identity)))
" (but we stupidly revert to degeneracy)"
""))
(try "")
(try (list->string (map integer->char (iota 256))))
(try "U+2002: | | (utf-8: E2 80 82)")
(try "U+232C: |⌬| (utf-8: E2 80 82)")
(try "U+1D7FF: |𝟿| (utf-8: F0 9D 9F BF)")
(try "U+2F9B2: |䕫| (utf-8: F0 AF A6 B2)")
(try "U+2F9BC: |蜨| (utf-8: F0 AF A6 BC)")
(exit EXIT-VALUE)
next prev parent reply other threads:[~2012-01-27 10:27 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-26 8:00 survey: string external representation Thien-Thi Nguyen
2012-01-26 8:38 ` Andy Wingo
2012-01-26 14:11 ` Mike Gran
2012-01-27 10:27 ` Thien-Thi Nguyen [this message]
2012-02-05 9:32 ` Thien-Thi Nguyen
2012-02-07 8:58 ` Andy Wingo
2012-02-07 9:52 ` David Pirotte
2012-01-27 15:32 ` David Pirotte
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87k44dbfu5.fsf@gnuvola.org \
--to=ttn@gnuvola.org \
--cc=guile-user@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).