unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Thien-Thi Nguyen <ttn@gnuvola.org>
To: guile-user@gnu.org
Subject: Re: survey: string external representation
Date: Fri, 27 Jan 2012 11:27:30 +0100	[thread overview]
Message-ID: <87k44dbfu5.fsf@gnuvola.org> (raw)
In-Reply-To: <87wr8edhac.fsf@gnuvola.org> (Thien-Thi Nguyen's message of "Thu,  26 Jan 2012 09:00:59 +0100")

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

Thanks to everyone who responded.  Based on the collected
information, i've cobbled together a runtime check for
‘sql-quote’.  It and some tests are in the attached program.
To play:

 guile -s normalize.scm
 guile -s normalize.scm stupid

The code assumes Guile 2 DTRT, but if you have doubts, you can

 sed -i 's/guile-2/&-not-really/' normalize.scm

to disable that assumption.  In any case, the program should exit
successfully, indicating smooth ‘write’ / ‘read’ round-tripping.
This is so (both w/ and w/o "stupid") for Guile 1.4.1.124 and 1.8.7.

___________________________________________

[-- Attachment #2: normalize.scm --]
[-- Type: text/x-scheme, Size: 4055 bytes --]

;; -*- mode: scheme; coding: utf-8 -*-

(define EXIT-VALUE #t)                  ; optimism

(define STUPID? (false-if-exception (string=? "stupid" (cadr (command-line)))))

;; PostgreSQL groks ‘\xXX’ as an octet w/ hex value XX.
;; It also groks raw octets.  This is all fine and good.
;; The problem arises when there is a mix of contiguous
;; raw and \x representations, intended to represent a
;; UTF-8 (say) encoded character.
;;
;; It seems Guile
;; - 1.4 DTRT by doing nothing;
;; - 1.6 ???;
;; - 1.8 fails by \x-escaping inconsistently;
;; - 2.0 doesn't have this problem.

(cond-expand
 (guile-2
  (define normalize identity))
 (else
  (use-modules
   (srfi srfi-13)
   (srfi srfi-14))
  (define normalize
    (or (let* ((ego (char-set
                     ;; These are not strictly necessary for
                     ;; PostgreSQL, but we include them for
                     ;; (Scheme-only) round-trip testing.
                     ;; Doubtlessly, what doubtful ego!
                     #\" #\\))
               (ugh (ucs-range->char-set #o177 #o400 #t ego)))
          (and (not (char-set-every
                     (lambda (ch)
                       ;; Does the octet xrep unmolested?
                       (char=? ch (string-ref (object->string (string ch)) 1)))
                     (char-set-difference ugh ego)))
               (or (not STUPID?)
                   (begin (set! ugh ego)
                          #t))
               ;; Lame.
               (lambda (s)
                 (define backslash-x
                   (let ((v (make-vector 256)))
                     (char-set-for-each
                      (lambda (ch)
                        (let ((i (char->integer ch)))
                          (vector-set!
                           v i (string-append
                                "\\x" (number->string i 16)))))
                      ugh)
                     ;; backslash-x
                     (lambda (ch)
                       (vector-ref v (char->integer ch)))))
                 (let loop ((start 0) (acc '()))
                   (cond ((string-index s ugh start)
                          => (lambda (idx)
                               (loop (1+ idx)
                                     (cons* (backslash-x (string-ref s idx))
                                            (substring/shared s start idx)
                                            acc))))
                         ((zero? start)
                          s)
                         (else
                          (string-concatenate-reverse
                           acc (substring/shared s start))))))))
        ;; Cool.
        identity))))

(define (try s)
  (simple-format
   #t "ORIG:\t~S~%NORM:\t~S~%=>\t~A~%~%"
   s (normalize s)
   (let ((round (with-input-from-string
                    (with-output-to-string
                      (lambda ()
                        (if (eq? identity normalize)
                            (write s)
                            (begin (display #\")
                                   (display (normalize s))
                                   (display #\")))))
                  read)))
     (cond ((equal? s round) 'SAME)
           (else
            (set! EXIT-VALUE #f)        ;-O
            (string-append
             "DIFF: [" (number->string (string-length round))
             "]|" round "|"))))))

(simple-format #t "Guile ~A~% LANG: ~S~% normalize: ~S~A~%~%"
               (version) (getenv "LANG") (procedure-name normalize)
               (if (and STUPID? (not (eq? normalize identity)))
                   " (but we stupidly revert to degeneracy)"
                   ""))

(try "")
(try (list->string (map integer->char (iota 256))))
(try "U+2002: | | (utf-8: E2 80 82)")
(try "U+232C: |⌬| (utf-8: E2 80 82)")
(try "U+1D7FF: |𝟿| (utf-8: F0 9D 9F BF)")
(try "U+2F9B2: |䕫| (utf-8: F0 AF A6 B2)")
(try "U+2F9BC: |蜨| (utf-8: F0 AF A6 BC)")

(exit EXIT-VALUE)

  parent reply	other threads:[~2012-01-27 10:27 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-26  8:00 survey: string external representation Thien-Thi Nguyen
2012-01-26  8:38 ` Andy Wingo
2012-01-26 14:11 ` Mike Gran
2012-01-27 10:27 ` Thien-Thi Nguyen [this message]
2012-02-05  9:32   ` Thien-Thi Nguyen
2012-02-07  8:58     ` Andy Wingo
2012-02-07  9:52     ` David Pirotte
2012-01-27 15:32 ` David Pirotte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k44dbfu5.fsf@gnuvola.org \
    --to=ttn@gnuvola.org \
    --cc=guile-user@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).