unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Optimization & factorization of ‘write’
@ 2010-09-14 14:21 Ludovic Courtès
  2010-09-14 18:31 ` Mike Gran
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2010-09-14 14:21 UTC (permalink / raw)
  To: guile-devel

Hi!

In a profile ‘write’ was showing up unreasonably high, so I decided to
bite the bullet, so to speak.

I ended up factorizing character and string writing, and removing heap
allocations from the path.  The end result is ~2.6x faster when writing
strings.

Feedback welcome, especially from Mike.  :-)

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-14 14:21 Optimization & factorization of ‘write’ Ludovic Courtès
@ 2010-09-14 18:31 ` Mike Gran
  2010-09-14 19:12   ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Gran @ 2010-09-14 18:31 UTC (permalink / raw)
  To: Ludovic Courtès, guile-devel

> From: Ludovic Courtès <ludo@gnu.org>
> 
> Hi!
> 
> In a profile ‘write’ was showing up unreasonably high, so I decided to
> bite the bullet, so to speak.
> 
> I ended up factorizing character and string writing, and removing heap
> allocations from the path.  The end result is ~2.6x faster when writing
> strings.

Nice.

> 
> Feedback welcome, especially from Mike.  :-)

I haven't tried it out yet, but,...

I noticed you stripped out the code that prints combining accents
as attached to dotted circles (#\◌̀ instead of #\̀). I thought that
was a rather nice feature.

I have a script that I used to check string and char input/output.
I'll try to test it out in the next couple of days.

Thanks,

Mike




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-14 18:31 ` Mike Gran
@ 2010-09-14 19:12   ` Ludovic Courtès
  2010-09-14 23:04     ` Ludovic Courtès
  2010-09-15  4:19     ` Mike Gran
  0 siblings, 2 replies; 8+ messages in thread
From: Ludovic Courtès @ 2010-09-14 19:12 UTC (permalink / raw)
  To: guile-devel

Hi Mike!

Mike Gran <spk121@yahoo.com> writes:

> I noticed you stripped out the code that prints combining accents
> as attached to dotted circles (#\◌̀ instead of #\̀). I thought that
> was a rather nice feature.

Oh yes, but I didn’t understand that code and noticed it was not covered
by the test suite [0].

Now that I understand, yeah, it’s probably a nice feature that I could
add back.  :-)

> I have a script that I used to check string and char input/output.

You mean in addition to the ‘.test’ files?

Thanks,
Ludo’.

[0] http://hydra.nixos.org/build/631443/download/2/coverage/libguile/print.c.gcov.html




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-14 19:12   ` Ludovic Courtès
@ 2010-09-14 23:04     ` Ludovic Courtès
  2010-09-15  4:19     ` Mike Gran
  1 sibling, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2010-09-14 23:04 UTC (permalink / raw)
  To: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Mike Gran <spk121@yahoo.com> writes:
>
>> I noticed you stripped out the code that prints combining accents
>> as attached to dotted circles (#\◌̀ instead of #\̀). I thought that
>> was a rather nice feature.
>
> Oh yes, but I didn’t understand that code and noticed it was not covered
> by the test suite [0].
>
> Now that I understand, yeah, it’s probably a nice feature that I could
> add back.  :-)

Since I was in a Unicode mood, I just added it back.

Thanks!

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-14 19:12   ` Ludovic Courtès
  2010-09-14 23:04     ` Ludovic Courtès
@ 2010-09-15  4:19     ` Mike Gran
  2010-09-15 11:40       ` Ludovic Courtès
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Gran @ 2010-09-15  4:19 UTC (permalink / raw)
  To: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 520 bytes --]

> From: Ludovic Courtès <ludo@gnu.org>

> 
> > I have a script that I used to check string and char  input/output.
> 
> You mean in addition to the ‘.test’  files?

I have the attached.  It is not really a test because it doesn't
have criteria for right and wrong, but, you can diff the changes
of the output files after committing to see if output formats
have stayed the same.

But I tried it just now on HEAD, and it triggered a 'glibc detected
corrupteddouble-linked list' error.  Curious.

-Mike

[-- Attachment #2: iotest.scm --]
[-- Type: application/octet-stream, Size: 12605 bytes --]

;;; iotest.scm -- a script to generate many types of output
;;;               for regression testing
;;;
;;; Copyright (C) Michael L. Gran
;;; License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
;;; This is free software: you are free to change and redistribute it.
;;; There is NO WARRANTY, to the extent permitted by law.

(define *n* 1)
(define *port* #f)
(define *label* #f)

;; Output a list that contains the integers from START to END inclusive
(define (range start end)
  (let loop ((i start)
             (lst '()))
    (if (<= i end)
        (loop (+ i 1) (append lst (list i)))
        lst)))

;; Given two lists of characters CHARS1 and CHARS2, return a list of
;; two-character strings that has all combinations of CHARS1 and
;; CHARS2
(define (string-product chars1 chars2)
  (let ((len1 (length chars1))
        (len2 (length chars2)))
    (map (lambda (c)
           (string (list-ref chars1 (quotient c len2))
                   (list-ref chars2 (remainder c len2))))
         (range 0 (- (* len1 len2) 1)))))

;; Remove all elements of LIST2 from LIST1
(define (filter-list list1 list2)
  (filter (lambda (c)
            (not (member c list2)))
          list1))

(define (integer-list->char-list vals)
  (map integer->char vals))

(define (char-list->string-list chr)
  (map string chr))

(define (string-list->symbol-list str)
  (map string->symbol str))


;; Write out the elements of the VALUES list using WRITE-FUNC, one
;; value per line.  Prepend each line with a line number and a
;; CATEGORY string.
(define (output-test category type values write-func)
  (if (not (null? values))
      (let test-loop ((val (car values))
                      (rest (cdr values)))
        (let ((n-string (number->string *n*)))
          (display (make-string (- 6 (string-length n-string)) #\space) *port*)
          (display *n* *port*)
          (set! *n* (+ *n* 1))
          (display ": " *port*)
          (if *label*
              (begin
                (display *label* *port*)
                (display " : " *port*)))
          (if category
              (begin
                (display category *port*)
                (display " : " *port*)))
          (if type
              (begin
                (display type *port*)
                (display " : " *port*)))
          (write-func val *port*)
          (newline *port*)
          (if (not (null? rest))
              (test-loop (car rest) (cdr rest)))))))

;; Run OUTPUT-TEST using the WRITE procedure
(define (write-test category list)
  (output-test category "write" list write))

;; Run OUTPUT-TEST using the DISPLAY procedure
(define (display-test category list)
  (output-test category "display" list display))

;; First write and then display the elements in LIST
(define (test category list)
  (write-test category list)
  (display-test category list))

;;-----------------------------------------------------
;; R5RS numeric types
(define (test-real-numbers)
  (test "integers" '(-3 -2 -1 -0 0 1 2 3))

  (test "rationals" '(-3/2 -2/2 -1/2 -0/2 0/2 1/2 2/2 3/3))

  (test "reals" '(-1.5 -1.0 -0.5 -0.0 0.0 0.5 1.0 1.5)))

(define (test-complex-numbers)
  (test "complex integers" 
        '( -2-2i -2-1i -2-i -2-0i -2+0i -2+i -2+1i -2+2i
           -1-2i -1-1i -1-i -1-0i -1+0i -1+i -1+1i -1+2i
           -0-2i -0-1i -0-i -0-0i -0+0i -0+i -0+1i -0+2i
             -2i   -1i   -i   -0i    0i    i    1i    2i
            0-2i  0-1i  0-i  0-0i  0+0i  0+i  0+1i  0+2i
            1-2i  1-1i  1-i  1-0i  1+0i  1+i  1+1i  1+2i
            2-2i  2-1i  2-i  2-0i  2+0i  2+i  2+1i  2+2i))

  (test "complex rationals"
        '( -2-2/2i -2-1/2i -2-0/2i -2+0/2i -2+1/2i -2+2/2i
           -1-2/2i -1-1/2i -1-0/2i -1+0/2i -1+1/2i -1+2/2i
             -2/2i   -1/2i   -0/2i    0/2i    1/2i    2/2i
            0-2/2i  0-1/2i  0-0/2i  0+0/2i  0+1/2i  0+2/2i
            1-2/2i  1-1/2i  1-0/2i  1+0/2i  1+1/2i  1+2/2i
            2-2/2i  2-1/2i  2-0/2i  2+0/2i  2+1/2i  2+2/2i))

  (test "complex-reals"
        '(
          -1.5-1.5i -1.5-1.0i -1.5-0.5i -1.5-0.0i -1.5+0.0i -1.5+0.5i -1.5+1.0i -1.5+1.5i
          -1.0-1.5i -1.0-1.0i -1.0-0.5i -1.0-0.0i -1.0+0.0i -1.0+0.5i -1.0+1.0i -1.0+1.5i
          -0.5-1.5i -0.5-1.0i -0.5-0.5i -0.5-0.0i -0.5+0.0i -0.5+0.5i -0.5+1.0i -0.5+1.5i
          -0.0-1.5i -0.0-1.0i -0.0-0.5i -0.0-0.0i -0.0+0.0i -0.0+0.5i -0.0+1.0i -0.0+1.5i
           0.0-1.5i  0.0-1.0i  0.0-0.5i  0.0-0.0i  0.0+0.0i  0.0+0.5i  0.0+1.0i  0.0+1.5i
           0.5-1.5i  0.5-1.0i  0.5-0.5i  0.5-0.0i  0.5+0.0i  0.5+0.5i  0.5+1.0i  0.5+1.5i
           1.0-1.5i  1.0-1.0i  1.0-0.5i  1.0-0.0i  1.0+0.0i  1.0+0.5i  1.0+1.0i  1.0+1.5i
           1.5-1.5i  1.5-1.0i  1.5-0.5i  1.5-0.0i  1.5+0.0i  1.5+0.5i  1.5+1.0i  1.5+1.5i)))

;;-----------------------------------------------------
;; R5RS boolean
(define (test-booleans)
  (test "booleans" (list #t #f '#t '#f)))

;;-----------------------------------------------------
;; R5RS pairs and lists

(define (test-pairs-and-lists)
  (test "pairs and lists" (list (list 'a 'b 'c)
                                '(a . b)
                                '(a b c)
                                '()
                                '#(a b)
                                (cons 'a '())
                                (cons '(a) '(b c d))
                                (cons "a" '(b c))
                                (cons 'a 3)
                                (cons '(a b) 'c)
                                (list 'a (+ 3 4) 'c)
                                (list)
                                '(a (b) (c d e))
                                (append '(a (b)) '((c))))))


;;-----------------------------------------------------
;; R5RS Symbols

;; VALID
(define (test-r5rs-symbols)
  (let* ((initial-chars
          '(#\a #\z #\! #\$ #\% #\& #\* #\/ #\: #\< #\= #\> #\? #\^ #\_ #\~))
         (subsequent-chars
          (append initial-chars'(#\a #\z #\0 #\9 #\+ #\- #\. #\@)))
         (r5rs-one-char-symbols
          (string-list->symbol-list (char-list->string-list initial-chars)))
         (r5rs-two-char-symbols
          (string-list->symbol-list (string-product initial-chars
                                                    subsequent-chars))))
    (test "R5RS one-char symbols" r5rs-one-char-symbols)
    (test "R5RS two-char symbols" r5rs-two-char-symbols)))

;;-----------------------------------------------------
;; 7-bit chars, symbols, and strings

(define ascii-list
  (append
   (range #x01 #x1F)                    ; C0 controls
   (range #x20 #x2F)                    ; ASCII punct
   (list #x30 #x39)                     ; Numbers '0' and '9'
   (range #x3A #x40)                    ; More ASCII punct
   (list #x41 #x5A)                     ; letter 'A' and 'Z'
   (range #x5B #x60)                    ; More ASCII punct
   (list #x61 #x7A)                     ; letter 'a' and 'z'
   (range #x7b #x73)                    ; More ASCII punct
   (list #x7f)))                        ; delete

(define ascii-chars 
  (integer-list->char-list ascii-list))

(define ascii-one-char-strings
  (char-list->string-list ascii-chars))

(define ascii-two-char-strings
  (string-product ascii-chars ascii-chars))

;; VALID: should suceed on any platform
(define (test-ascii-chars-strings)
  (test "ASCII chars" ascii-chars)
  (test "ASCII one-char strings" ascii-one-char-strings)
  (test "ASCII two-char strings" ascii-two-char-strings))

;; INVALID: many of these symbols aren't r5rs
(define (test-ascii-symbols)
  (let ((ascii-one-char-symbols
         (string-list->symbol-list ascii-one-char-strings))
        (ascii-two-char-symbols
         (string-list->symbol-list ascii-two-char-strings)))
    (test "ASCII one-char symbols" ascii-one-char-symbols)
    (test "ASCII two-char symbols" ascii-two-char-symbols)))

;;-----------------------------------------------------
;; Latin-1 chars, symbols, and strings

(define latin1-list 
  (append
   (range #x80 #x9F)                    ; C1 controls
   (list #xA0)                          ; NBSP - non-breaking space
   (range #xA1 #xA3)                    ; Some Latin-1 punct
   (list #xAD)                          ; SHY - soft hyphen
   (range #xC0 #xC4)))                  ; Some Latin-1 letters
  
(define latin1-chars 
  (integer-list->char-list latin1-list))

(define latin1-one-char-strings
  (char-list->string-list latin1-chars))

(define latin1-two-char-strings
  (string-product latin1-chars latin1-chars))

;; VALID: should suceed on any platform
(define (test-latin1-chars-strings)
  (test "Latin-1 chars" latin1-chars)
  (test "Latin-1 one-char strings" latin1-one-char-strings)
  (test "Latin-1 two-char strings" latin1-two-char-strings))

;; INVALID: these symbols aren't r5rs
(define (test-latin1-symbols)
  (let ((latin1-one-char-symbols
         (string-list->symbol-list latin1-one-char-strings))
        (latin1-two-char-symbols
         (string-list->symbol-list latin1-two-char-strings)))
    (test "Latin-1 one-char symbols" latin1-one-char-symbols)
    (test "Latin-1 two-char symbols" latin1-two-char-symbols)))

;;-----------------------------------------------------
;; Unicode chars, symbols, and strings
(define unicode-list
  (list
   #x0061                      ; 'a'
   #x0531                      ; ARMENIAN CAPITAL LETTER AYB
   #x2c80                      ; COPTIC CAPITAL LETTER ALFA
   #x0400                      ; CYRILLIC CAPITAL LETTER IE WITH GRAVE
   #x0410                      ; CYRILLIC CAPITAL LETER A
   #x03A1                      ; GREEK CAPITAL LETTER RHO
   #x0300                      ; COMBINING GRAVE ACCENT
   #x031F                      ; COMBINING PLUS SIGN BELOW
   #x0627                      ; ARABIC LETTER ALEF
   ))

(define (test-BMP-chars-strings)
  (let* ((unicode-chars
          (integer-list->char-list unicode-list))
         (unicode-one-char-strings
          (char-list->string-list unicode-chars))
         (unicode-two-char-strings
          (string-product unicode-chars unicode-chars)))
    (test "BMP chars" unicode-chars)
    (test "BMP one-char strings" unicode-one-char-strings)
    (test "BMP two-char strings" unicode-two-char-strings)))

(define (test-BMP-symbols)
  (let* ((unicode-chars
          (integer-list->char-list unicode-list))
         (unicode-one-char-symbols
          (string-list->symbol-list (char-list->string-list unicode-chars)))
         (unicode-two-char-symbols
          (string-list->symbol-list (string-product unicode-chars unicode-chars))))
    (test "BMP one-char symbols" unicode-one-char-symbols)
    (test "BMP two-char symbols" unicode-two-char-symbols)))


(define (all-tests strict-r5rs-symbols latin1 unicode)
  (test-real-numbers)
  (test-complex-numbers)
  (test-booleans)
  (test-pairs-and-lists)
  (test-ascii-chars-strings)
  (if latin1
      (test-latin1-chars-strings))
  (if unicode
      (test-BMP-chars-strings))
  (if strict-r5rs-symbols
      (test-r5rs-symbols)
      ;; else
      (begin
        (test-ascii-symbols)
        (if latin1
            (test-latin1-symbols))
        (if unicode
            (test-BMP-symbols)))))

;;-------------------------------------------------------------------

;; ASCII

(setlocale LC_ALL "C")

(set! *n* 1)
(set! *port* (open-output-file "iotest.ascii.txt"))
(set-port-encoding! *port* "US-ASCII")
(set-port-conversion-strategy! *port* 'escape)

(display "-*- coding: us-ascii -*-" *port*)
(newline *port*)
(display (version) *port*)
(newline *port*)
(all-tests #f #t #t)
(close *port*)

;; ISO-8859-1

(setlocale LC_ALL "en_US.iso88591")

(set! *n* 1)
(set! *port* (open-output-file "iotest.latin1.txt"))
(set-port-encoding! *port* "iso-8859-1")
(set-port-conversion-strategy! *port* 'escape)

(display "-*- coding: iso-8859-1 -*-" *port*)
(newline *port*)
(display (version) *port*)
(newline *port*)
(all-tests #f #t #t)
(close *port*)

;; ISO-8859-1 R6RS

(setlocale LC_ALL "en_US.iso88591")

(set! *n* 1)
(set! *port* (open-output-file "iotest.latin1-r6rs.txt"))
(set-port-encoding! *port* "iso-8859-1")
(set-port-conversion-strategy! *port* 'escape)
(read-enable 'r6rs-hex-escapes)

(display "-*- coding: iso-8859-1 -*-" *port*)
(newline *port*)
(display (version) *port*)
(newline *port*)
(all-tests #f #t #t)
(close *port*)

(read-disable 'r6rs-hex-escapes)

;; UTF-8

(setlocale LC_ALL "en_US.utf8")

(set! *n* 1)
(set! *port* (open-output-file "iotest.utf8.txt"))
(set-port-encoding! *port* "utf-8")
(set-port-conversion-strategy! *port* 'escape)

(display "-*- coding: utf-8 -*-" *port*)
(newline *port*)
(display (version) *port*)
(newline *port*)
(all-tests #f #t #t)
(close *port*)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-15  4:19     ` Mike Gran
@ 2010-09-15 11:40       ` Ludovic Courtès
  2010-09-15 14:28         ` Mike Gran
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2010-09-15 11:40 UTC (permalink / raw)
  To: guile-devel

Hi,

Mike Gran <spk121@yahoo.com> writes:

>> From: Ludovic Courtès <ludo@gnu.org>
>
>> 
>> > I have a script that I used to check string and char  input/output.
>> 
>> You mean in addition to the ‘.test’  files?
>
> I have the attached.  It is not really a test because it doesn't
> have criteria for right and wrong, but, you can diff the changes
> of the output files after committing to see if output formats
> have stayed the same.

OK, interesting.  Any idea how well it covers the display/write code
compared to the test suite?

> But I tried it just now on HEAD, and it triggered a 'glibc detected
> corrupteddouble-linked list' error.  Curious.

Hmm indeed.  Can you come up with a reduced test case and perhaps a
backtrace or something?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-15 11:40       ` Ludovic Courtès
@ 2010-09-15 14:28         ` Mike Gran
  2010-09-15 21:35           ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Gran @ 2010-09-15 14:28 UTC (permalink / raw)
  To: Ludovic Courtès, guile-devel

> From: Ludovic Courtès <ludo@gnu.org>


> Hi,
> 
> Mike Gran <spk121@yahoo.com> writes:
> 
> > I have the attached.  It is not really a test  because it doesn't
> > have criteria for right and wrong, but, you can diff  the changes
> > of the output files after committing to see if output  formats
> > have stayed the same.
> 
> OK, interesting.  Any idea how  well it covers the display/write code
> compared to the test suite?

No.  That's a good question.  I had done this type of script testing
when I was hacking the code, but, now, if there are any corner cases
that this catches, they should be rolled into the test suite.

> 
> >  But I tried it just now on HEAD, and it triggered a 'glibc detected
> >  corrupteddouble-linked list' error.  Curious.
> 
> Hmm indeed.  Can  you come up with a reduced test case and perhaps a
> backtrace or  something?
> 

in  scm_i_unistring_escapes_to_r6rs_escapes(), you can have a write
off the end of a string when the buffer passed into the function contains
only 4-digit hex unistring hex escapes, such as "\u1100".  The
R6RS-escaped string will be longer "\x1100;" causing the memcpy at the
end of the function to write off then end of the string.

Thanks,

Mike



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Optimization & factorization of ‘write’
  2010-09-15 14:28         ` Mike Gran
@ 2010-09-15 21:35           ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2010-09-15 21:35 UTC (permalink / raw)
  To: guile-devel

Hi,

Mike Gran <spk121@yahoo.com> writes:

>> >  But I tried it just now on HEAD, and it triggered a 'glibc detected
>> >  corrupteddouble-linked list' error.  Curious.
>> 
>> Hmm indeed.  Can  you come up with a reduced test case and perhaps a
>> backtrace or  something?
>> 
>
> in  scm_i_unistring_escapes_to_r6rs_escapes(), you can have a write
> off the end of a string when the buffer passed into the function contains
> only 4-digit hex unistring hex escapes, such as "\u1100".  The
> R6RS-escaped string will be longer "\x1100;" causing the memcpy at the
> end of the function to write off then end of the string.

Indeed, good catch!

I think commit f1ee6d54d219056c62d87a8e4a6b199162c946e8 hackily fixes
it.

The whole thing is hackish though: the conversion is inelegant, and it
assumes that BUF is in an ASCII-compatible encoding.  I think the right
way would be to have libunistring allow us to specify what we want
escapes to look like.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-09-15 21:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-14 14:21 Optimization & factorization of ‘write’ Ludovic Courtès
2010-09-14 18:31 ` Mike Gran
2010-09-14 19:12   ` Ludovic Courtès
2010-09-14 23:04     ` Ludovic Courtès
2010-09-15  4:19     ` Mike Gran
2010-09-15 11:40       ` Ludovic Courtès
2010-09-15 14:28         ` Mike Gran
2010-09-15 21:35           ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).