From: Catonano <catonano@gmail.com>
To: Amirouche Boubekki <amirouche@hypermove.net>
Cc: Guile User <guile-user@gnu.org>
Subject: Re: What's next with culturia search engine? (and guile-wiredtiger)
Date: Sun, 14 Jan 2018 15:12:23 +0100
Message-ID: <CAJ98PDydepHtMQGer+XfxL1jmos954qFC-sMaiSd-cKW1UaSjA@mail.gmail.com>
In-Reply-To: <89c7dc5c588f12ceae8efe078efb15fa@hypermove.net>
2018-01-14 11:05 GMT+01:00 Amirouche Boubekki <amirouche@hypermove.net>:
> On 2018-01-14 09:12, Catonano wrote:
>
>> 2017-11-26 23:33 GMT+01:00 Amirouche Boubekki
>> <amirouche@hypermove.net>:
>>
>>>
>>> The querying engine will first compute the frequency of both
>>> keywords and then look up the inverted index for the least
>>> frequent keyword.
>>>
>>
>> The least frequent keyword ?
>>
>> Not the most frequent keyword ?
>>
>
> Yes. Imagine you search for serif+font: the most common
> word, and the least discriminant, is "font" because there
> are (I think) more pages containing "font".
>
> The result of the inverted index lookup above is used as the seed
> for the rest of the algorithm, which is O(n), so I need to
> minimize 'n', i.e. the count of initial documents.
I see now. Thanks
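
If I read this correctly, the seed selection could be sketched like this
in Guile. The toy `index` alist and the names `lookup`, `frequency`,
`least-frequent` and `seed` are made up for illustration; they are not
culturia's API:

```scheme
;; Toy inverted index: keyword -> list of document ids.
;; Hypothetical data, not culturia's actual storage.
(define index
  '(("font"  . (1 2 3 4 5))
    ("serif" . (2 5))))

;; Documents listed under KEYWORD, or the empty list when it is unknown.
(define (lookup keyword)
  (let ((entry (assoc keyword index)))
    (if entry (cdr entry) '())))

;; Frequency of KEYWORD: the number of documents that contain it.
(define (frequency keyword)
  (length (lookup keyword)))

;; Return the keyword of QUERY (a non-empty list) with the lowest frequency.
(define (least-frequent query)
  (let loop ((best (car query)) (rest (cdr query)))
    (cond ((null? rest) best)
          ((< (frequency (car rest)) (frequency best))
           (loop (car rest) (cdr rest)))
          (else (loop best (cdr rest))))))

;; The seed is the document list of the least frequent keyword,
;; which keeps 'n' small for the O(n) part that follows.
(define (seed query)
  (lookup (least-frequent query)))
```

With this toy index, `(seed '("font" "serif"))` starts from the two
documents under "serif" rather than the five under "font".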
>
>
>
>>> That way, there is a 'seed' set of documents
>>> that we can filter with a small vm that will interpret the
>>> rest of the query for instance. Something like:
>>>
>>> (filter (hit? (cdr query)) seed)
>>>
>>> Sort of. I can't make it simpler right now, but you can
>>> have a look at the code. The public procedure at the bottom,
>>> called 'search' [4], is where the code starts.
>>>
>>
> This is badly explained. At this point SEED contains the unique
> identifiers of the documents that contain the least frequent word.
> We remove that word from the query, hence the (cdr query), and filter
> the SEED with the rest of the query. This is a small optimization:
> we know that the least frequent word is already in the
> documents found in the SEED, so we do not need to check its
> presence in the SEED documents. 'hit?' will return some kind
> of state machine that checks that a given document matches
> the QUERY passed as argument.
>
> That is what I mean to do; the (cdr query) to remove the most
> discriminant query term is not implemented yet.
>
Ok
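
To check my understanding, here is a minimal sketch of the filtering step.
The `documents` alist and the `search` wrapper are hypothetical, `hit?` is
only a stand-in predicate for the state machine described above, and I am
assuming the query arrives sorted by increasing frequency so that its first
term is the one that produced the seed:

```scheme
(use-modules (srfi srfi-1)) ; filter, every

;; Hypothetical toy corpus: document id -> words. Not culturia's storage.
(define documents
  '((2 . ("serif" "font" "bold"))
    (5 . ("serif" "typewriter"))))

;; Return a predicate that is true for a document id when that document
;; contains every term of TERMS. Unknown ids never match.
(define (hit? terms)
  (lambda (uid)
    (let ((entry (assv uid documents)))
      (and entry
           (every (lambda (term) (member term (cdr entry))) terms)))))

;; Assumption: (car query) is the least frequent term and SEED is its
;; document list, so only (cdr query) still needs to be checked.
(define (search query seed)
  (filter (hit? (cdr query)) seed))
```

Here `(search '("serif" "font") '(2 5))` only tests for "font", since both
seed documents are already known to contain "serif".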
>
>>> [4]
>>>
>>> https://github.com/a-guile-mind/culturia.one/blob/master/src/wiredtiger/ix.scm#L455
>>
>>> [8]
>>>
>>
>> file not found
>>
>>
> It's here: https://github.com/a-guile-mind/culturia.one/blob/master/src/ix.scm#L439
>
> I reworked the thing to use grf3 graph abstraction to store
> the documents.
>
> Also guile-wiredtiger 0.6.4 is in guix.
I know ;-)
At the moment I don't feel confident with Guile code calling C code.
There's guile-squee that needs some love too; that could be a starting
point for me.
Also, G-golf is very important. GUIs are not optional, they are fundamental;
we should absolutely have a decent integration between Guile and Gnome.
When and if I know more, I'll take a look at Culturia too :-)
>
>
>
>
>> All this looks pretty interesting but I have to say that I prefer the
>> work you're doing on GNUNet ;-)
>>
>
> Thanks for your interest!
>
No, thank you !
GNUNet is also very important, probably more than G-golf, though I'm not sure.
I can't wait to test drive it.