From: Amirouche Boubekki
Newsgroups: gmane.lisp.guile.user
Subject: Re: babelia
Date: Sat, 16 Nov 2019 11:19:03 +0100
To: Guile User

On Sat, 16 Nov 2019 at 11:06, Amirouche Boubekki wrote:
>
> I restarted working on my personal search engine.
>
> It used to be called culturia [0], which had too many planned features. At
> some point, I called it asylum [1] and focused on the personal knowledge
> base aspects. The last iteration was called gotofish [2].
>
> [0] https://framagit.org/a-guile-mind/culturia
> [1] https://framagit.org/a-guile-mind/culturia.next
> [2] https://git.sr.ht/~amz3/guile-gotofish
>
> I learned much from all these projects. In particular, I learned that
> it will be a long, long, long project, even if I focus only on the
> "personal search engine" line of work.
>
> The last iteration, gotofish, was not too bad, even if it has bitrotted.
> Based on my research and practical experiments, it seems very clear
> that there is no way around using map-reduce, which might be known
> as n-par-for-each [3].
>
> [3] https://www.gnu.org/software/guile/manual/html_node/Parallel-Forms.html#index-n_002dpar_002dfor_002deach
>
> I made a prototype similar to n-par-for-each, except that it works
> with guile-fibers, is asynchronous, and uses a shared pool of
> threads instead of spawning N threads for each incoming query as
> gotofish does.
>
> Related blog post: https://hyper.dev/blog/on-the-road-to-babelia.html
>
> If you want to help or discuss those matters, do not hesitate to reply
> to this message.

I forgot to add that there are several big-ish tasks that can be
tackled in parallel (see the blog post above). In particular, a parser
for WET or WARC files; see https://en.wikipedia.org/wiki/Web_ARChive.
This is the most common format for the output of crawlers, e.g.
http://commoncrawl.org/
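
To give an idea of the size of the parser task, here is a minimal sketch of
a WARC/WET record reader. It is written in Python rather than Guile, purely
for illustration, and it is not taken from babelia. It assumes an
uncompressed stream, headers spelled with their usual capitalisation, and no
folded (multi-line) header values; the function name iter_warc_records and
the sample record are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record of an uncompressed WARC/WET stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        # Header block: "Name: value" lines, terminated by a blank line.
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        length = int(headers["Content-Length"])
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        yield headers, payload


# Hypothetical sample: one WET-style "conversion" record holding plain text.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: conversion\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"hello world"
    b"\r\n\r\n"
)

for headers, payload in iter_warc_records(io.BytesIO(SAMPLE)):
    print(headers["WARC-Target-URI"], payload.decode("utf-8"))
```

Crawl archives are usually distributed gzip-compressed, so in practice the
stream would first be wrapped with something like gzip.open(path, "rb"). A
real parser would also need case-insensitive header lookup and digest
checks, which this sketch leaves out.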