unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* efficient implementation of string-replace-substring / string-replace-all
@ 2013-09-13 14:32 Arne Babenhauserheide
  2013-09-13 19:41 ` Mark H Weaver
  0 siblings, 1 reply; 5+ messages in thread
From: Arne Babenhauserheide @ 2013-09-13 14:32 UTC (permalink / raw)
  To: guile-devel

Hi,

For wisp I created an efficient implementation of substring replacement and thought it might be useful for guile in general.

I optimized it a bit to get rid of (likely) quadratic behaviour:


; ,time (string-replace-substring (xsubstring "abcdefghijkl" 0 99999) "def" "abc")
; 1.140127s real time, 1.139714s run time.  0.958733s spent in GC.
; 0.885618s real time, 0.885350s run time.  0.742805s spent in GC.
; second number after multiple runs
(define (string-replace-substring s substring replacement)
       "Replace every instance of substring in s by replacement."
       (let ((sublen (string-length substring)))
           (let replacer
               ((newstring s)
                 (index (string-contains s substring)))
               (if (not (equal? index #f))
                  (let ((replaced (string-replace newstring replacement index (+ index sublen))))
                    (replacer replaced (string-contains replaced substring index)))
                  newstring))))


This fast and elegant approach is thanks to ijp.

I tested a two alternative approaches. The following uses explicit readonly substrings and is fast, the one afterwards always searches from the beginning and shows drastic slowdown on long strings. 

; efficient version from rev 12266d455bb0 of the wisp hg repo
; ,time (string-replace-substring-fast-but-long (xsubstring "abcdefghijkl" 0 99999) "def" "abc")
; 1.316090s real time, 1.315626s run time.  1.129857s spent in GC.
; 0.668212s real time, 0.667948s run time.  0.482193s spent in GC.
; 0.868419s real time, 0.868131s run time.  0.685472s spent in GC
; Second and third are after multiple runs
(define (string-replace-substring-fast-but-long s substring replacement)
       "Replace every instance of substring in s by replacement."
       (let ((sublen (string-length substring)))
           (let replacer
               ((newstring s)
                 (startindex 0)
                 (addindex (string-contains s substring)))
               (if (not (equal? addindex #f))
                  (let*
                      ((index (+ startindex addindex))
                        (replaced (string-replace newstring replacement index (+ index sublen)))
                        (newaddindex (string-contains (substring/read-only replaced index) substring)))
                      (replacer replaced index newaddindex))
                  newstring))))


; less efficient version from rev a887aeb0dfe2
; ,time (string-replace-substring-slow (xsubstring "abcdefghijkl" 0 99999) "def" "abc")
; 5.242456s real time, 5.240834s run time.  0.358750s spent in GC.
; 5.727069s real time, 5.724973s run time.  0.835445s spent in GC.
; switches between fast and slow version
(define (string-replace-substring-slow s substring replacement)
       "Replace every instance of substring in s by replacement."
       (let ((sublen (string-length substring)))
           (let replacer
               ((newstring s)
                 (index (string-contains s substring)))
               (if (not (equal? index #f))
                  (let ((replaced (string-replace newstring replacement index (+ index sublen))))
                    (replacer replaced (string-contains replaced substring)))
                  newstring))))
               


ijp suggested renaming to (string-replace-all, but I do not know enough about conventions in guile to judge that).

Best wishes,
Arne



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: efficient implementation of string-replace-substring / string-replace-all
  2013-09-13 14:32 efficient implementation of string-replace-substring / string-replace-all Arne Babenhauserheide
@ 2013-09-13 19:41 ` Mark H Weaver
  2014-03-23 20:48   ` Andy Wingo
  0 siblings, 1 reply; 5+ messages in thread
From: Mark H Weaver @ 2013-09-13 19:41 UTC (permalink / raw)
  To: arne_bab; +Cc: guile-devel

Hi Arne,

Arne Babenhauserheide <arne_bab@web.de> writes:

> For wisp I created an efficient implementation of substring replacement and thought it might be useful for guile in general.
>
> I optimized it a bit to get rid of (likely) quadratic behaviour:
>
>
> ; ,time (string-replace-substring (xsubstring "abcdefghijkl" 0 99999) "def" "abc")
> ; 1.140127s real time, 1.139714s run time.  0.958733s spent in GC.
> ; 0.885618s real time, 0.885350s run time.  0.742805s spent in GC.
> ; second number after multiple runs

Here, you're including (xsubstring "abcdefghijkl" 0 99999) in the
benchmark.  Better to (define big (xsubstring "abcdefghijkl" 0 99999))
first, and then: ,time (string-replace-substring big "def" "abc")

> (define (string-replace-substring s substring replacement)
>        "Replace every instance of substring in s by replacement."
>        (let ((sublen (string-length substring)))
>            (let replacer
>                ((newstring s)
>                  (index (string-contains s substring)))
>                (if (not (equal? index #f))
>                   (let ((replaced (string-replace newstring replacement index (+ index sublen))))
>                     (replacer replaced (string-contains replaced substring index)))
>                   newstring))))

Here's an implementation that does this benchmark about 80 times faster
on my machine: (20 milliseconds vs 1.69 seconds)

--8<---------------cut here---------------start------------->8---
(define* (string-replace-substring s substr replacement
                                   #:optional
                                   (start 0)
                                   (end (string-length s)))
  (let ((substr-length (string-length substr)))
    (if (zero? substr-length)
        (error "string-replace-substring: empty substr")
        (let loop ((start start)
                   (pieces (list (substring s 0 start))))
          (let ((idx (string-contains s substr start end)))
            (if idx
                (loop (+ idx substr-length)
                      (cons* replacement
                             (substring s start idx)
                             pieces))
                (string-concatenate-reverse (cons (substring s start)
                                                  pieces))))))))
--8<---------------cut here---------------end--------------->8---

The reason this is so much faster is because it avoids needless
generation of intermediate strings.

     Regards,
       Mark



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: efficient implementation of string-replace-substring / string-replace-all
  2013-09-13 19:41 ` Mark H Weaver
@ 2014-03-23 20:48   ` Andy Wingo
  2014-03-24  5:19     ` Mark H Weaver
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Wingo @ 2014-03-23 20:48 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: arne_bab, guile-devel

On Fri 13 Sep 2013 21:41, Mark H Weaver <mhw@netris.org> writes:

> Here's an implementation that does this benchmark about 80 times faster
> on my machine: (20 milliseconds vs 1.69 seconds)
>
> (define* (string-replace-substring s substr replacement
>                                    #:optional
>                                    (start 0)
>                                    (end (string-length s)))
>   (let ((substr-length (string-length substr)))
>     (if (zero? substr-length)
>         (error "string-replace-substring: empty substr")
>         (let loop ((start start)
>                    (pieces (list (substring s 0 start))))
>           (let ((idx (string-contains s substr start end)))
>             (if idx
>                 (loop (+ idx substr-length)
>                       (cons* replacement
>                              (substring s start idx)
>                              pieces))
>                 (string-concatenate-reverse (cons (substring s start)
>                                                   pieces))))))))

Inspired to code-golf a bit, here's one that's even faster :)

(define (string-replace-substring s substring replacement)
  "Replace every instance of substring in s by replacement."
  (let ((sublen (string-length substring)))
    (with-output-to-string
      (lambda ()
        (let lp ((start 0))
          (cond
           ((string-contains s substring start)
            => (lambda (end)
                 (display (substring/shared s start end))
                 (display replacement)
                 (lp (+ end sublen))))
           (else
            (display (substring/shared s start)))))))))

Just marginally so, though.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: efficient implementation of string-replace-substring / string-replace-all
  2014-03-23 20:48   ` Andy Wingo
@ 2014-03-24  5:19     ` Mark H Weaver
  2014-03-26 20:14       ` Arne Babenhauserheide
  0 siblings, 1 reply; 5+ messages in thread
From: Mark H Weaver @ 2014-03-24  5:19 UTC (permalink / raw)
  To: Andy Wingo; +Cc: arne_bab, guile-devel

Andy Wingo <wingo@pobox.com> writes:

> On Fri 13 Sep 2013 21:41, Mark H Weaver <mhw@netris.org> writes:
>
>> Here's an implementation that does this benchmark about 80 times faster
>> on my machine: (20 milliseconds vs 1.69 seconds)
>>
>> (define* (string-replace-substring s substr replacement
>>                                    #:optional
>>                                    (start 0)
>>                                    (end (string-length s)))
>>   (let ((substr-length (string-length substr)))
>>     (if (zero? substr-length)
>>         (error "string-replace-substring: empty substr")
>>         (let loop ((start start)
>>                    (pieces (list (substring s 0 start))))
>>           (let ((idx (string-contains s substr start end)))
>>             (if idx
>>                 (loop (+ idx substr-length)
>>                       (cons* replacement
>>                              (substring s start idx)
>>                              pieces))
>>                 (string-concatenate-reverse (cons (substring s start)
>>                                                   pieces))))))))
>
> Inspired to code-golf a bit, here's one that's even faster :)
>
> (define (string-replace-substring s substring replacement)
>   "Replace every instance of substring in s by replacement."
>   (let ((sublen (string-length substring)))
>     (with-output-to-string
>       (lambda ()
>         (let lp ((start 0))
>           (cond
>            ((string-contains s substring start)
>             => (lambda (end)
>                  (display (substring/shared s start end))
>                  (display replacement)
>                  (lp (+ end sublen))))
>            (else
>             (display (substring/shared s start)))))))))
>
> Just marginally so, though.

Nice!  I confess that I find this very surprising.  I would have
expected that the overhead in creating the string port, repeatedly
expanding the string buffer, doing UTF-8 encoding in 'display', and
decoding the UTF-8 when retrieving the result string, would add up to
something slower than what I had.  But experiment trumps theory, and I
guess I'll take your word on it that you did some reasonable benchmarks
and determined that my intuitions were wrong :)

One warning though: in Guile 2.0, string ports only support characters
representable in the %default-port-encoding, which defaults to the
encoding of the current locale.  Importing (srfi srfi-6) fixes this for
open-input-string and open-output-string, but with-output-to-string
remains limited.  Therefore, I recommend using the code above only in
Guile master, where string ports are proper.

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: efficient implementation of string-replace-substring / string-replace-all
  2014-03-24  5:19     ` Mark H Weaver
@ 2014-03-26 20:14       ` Arne Babenhauserheide
  0 siblings, 0 replies; 5+ messages in thread
From: Arne Babenhauserheide @ 2014-03-26 20:14 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Am Montag, 24. März 2014, 01:19:12 schrieb Mark H Weaver:
> Andy Wingo <wingo@pobox.com> writes:
> 
> > On Fri 13 Sep 2013 21:41, Mark H Weaver <mhw@netris.org> writes:
> >
> > Inspired to code-golf a bit, here's one that's even faster :)
> >
> > (define (string-replace-substring s substring replacement)
> >   "Replace every instance of substring in s by replacement."
> >   (let ((sublen (string-length substring)))
> >     (with-output-to-string
> >       (lambda ()
> >         (let lp ((start 0))
> >           (cond
> >            ((string-contains s substring start)
> >             => (lambda (end)
> >                  (display (substring/shared s start end))
> >                  (display replacement)
> >                  (lp (+ end sublen))))
> >            (else
> >             (display (substring/shared s start)))))))))
> >
> > Just marginally so, though.

Nice!

> One warning though: in Guile 2.0, string ports only support characters
> representable in the %default-port-encoding

Then I’ll keep the original one for the time being.

How are the chances of getting this function into regular Guile? If nothing else, then the speedup of more than factor 80 over my naive implementation would merit providing the efficient one to all guile users, I think. (Also I think that replacing a substring is a very common operation when working with text - like serving websites)

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein, 
ohne es zu merken. 
- Arne (http://draketo.de)






^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-03-26 20:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-13 14:32 efficient implementation of string-replace-substring / string-replace-all Arne Babenhauserheide
2013-09-13 19:41 ` Mark H Weaver
2014-03-23 20:48   ` Andy Wingo
2014-03-24  5:19     ` Mark H Weaver
2014-03-26 20:14       ` Arne Babenhauserheide

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).