Re: Custom streams 10x faster than srfi-41

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

From: Amirouche Boubekki <amirouche@hypermove.net>
To: jeremy@thatgeoguy.ca
Cc: guile-user@gnu.org,
	guile-user <guile-user-bounces+amirouche=hypermove.net@gnu.org>
Subject: Re: Custom streams 10x faster than srfi-41
Date: Mon, 05 Sep 2016 22:48:59 +0200	[thread overview]
Message-ID: <6316b7053cdfd6b5eb78639523a8698a@hypermove.net> (raw)
In-Reply-To: <428246d2-a4ba-afc6-d4a0-efe7832942bd@thatgeoguy.ca>

On 2016-09-05 21:14, Jeremy Steward wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> Hey Amirouche,
> 
> | (define (lazy-iota count) (let loop ((i 0)) (lambda () (unless (eq?
> | i count) (cons i (loop (1+ i)))))))
> |
> | (define iota3 (lazy-iota 3))
> |
> | (pk (iota3))
> |
> | The first time `lazy-iota' is called it returns a procedure that is
> | bound to `count'. Then you can call that procedure to get a pair
> | with the first value and a procedure which will allow to retrieve
> | the rest.
> 
> This seems very similar to SRFI 121 (generators) and SRFI 127 (Lazy
> Sequences).

Thanks for the pointer.

> Is there something particularly different about this implementation?

Values generated by traversi-map are chained which means what actually
travels in the stream is the history list of values with the first
value being the last created value. It allows to backtrack to previous
values. (This is not the case of lazy-iota)

I can't find it anymore in Tinkerpop documentation [0] so maybe it's
not a useful feature.

[0] http://tinkerpop.apache.org/docs/current/reference/#traversal

I'll explain how I use it nonetheless and the reasons I used to do so
in the context of graph traversal.

Usually what happens in graph traversal is that you start with one or
several vertex do some traversal/navigation or other map operation and
filter based on some heuristic and want to find out the original
vertex that lead to the current result you have, that's where
"backtracking" happens. The thing is that you could do the traversal
in reverse order to retrieve the original vertex that's why it might
not be as useful as I was thinking.

The thing is that my graph database library never fully load vertices
and edges into RAM when it traverse a graph. The current value in the
traversi is never more than a proper list. So it can be a string, an
integer or a list of.

For instance if one can use traversi to implement item-item
recommendations.

Draw a graph like the following:

u1 --[likes@1]--> m1

u2 --[likes@1]--> m1
u2 --[likes@1]--> m2
u2 --[likes@0]--> m3

u3 --[likes@1]--> m1
u3 --[likes@1]--> m4

u4 --[likes@0]--> m1
u4 --[likes@1]--> m5

What does the item-item recommendation do, is search for user that
rated high some movie and look what other movie they rated high.

For instance, we can do the following traversal starting with movie
m1:

m1 <--[likes@1]-- u1,u2,u3 --[likes@1]--> m1,m2,m4

I won't explain how everything happens it's better explained
elsewhere [1] (in french here [2])

[1] 
https://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
[2] http://hyperdev.fr/notes/recommendations-basees-sur-les-graphes.html

Basically the traversal starts at m1, ask for incomings edges that
are likes which have a score of 1. If edges were fully loaded into
memory I could simply use a filter proc like

(lambda (edge)
   (and (eqv? (assoc-ref edge 'label) 'likes)
        (eq? (assoc-ref edge 'score) 1)))

But instead I have a procedure which fetch a single field.

So the flow of the traversi program is:

- m1
- fetch incomings edges [*]
- fetch 'label for current value
- filter: if value == 'likes
- backtrack to [*]
- fetch 'score for current value
- filter: if value == 1
- backtrack to [*]

Like I said it's an optimisation trick, maybe it's not worthwhile,
maybe it's evil.

> I've seen similar performance improvements in CHICKEN Scheme using
> SRFI 121 and 127 over SRFI 41 (streams).
> 
> Cheers,
> - -- Jeremy Steward
> 

Thanks for giving me the opportunity to re-think it.

-- 
Amirouche ~ amz3 ~ http://www.hyperdev.fr

next prev parent reply	other threads:[~2016-09-05 20:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-05 17:24 Custom streams 10x faster than srfi-41 Amirouche Boubekki
2016-09-05 19:14 ` Jeremy Steward
2016-09-05 20:48   ` Amirouche Boubekki [this message]
2016-09-06 17:59     ` Amirouche Boubekki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6316b7053cdfd6b5eb78639523a8698a@hypermove.net \
    --to=amirouche@hypermove.net \
    --cc=guile-user-bounces+amirouche=hypermove.net@gnu.org \
    --cc=guile-user@gnu.org \
    --cc=jeremy@thatgeoguy.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).