From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Jani Nikula <jani@nikula.org>,
notmuch@notmuchmail.org, Olly Betts <olly@survex.com>
Subject: Re: multilingual notmuch (and Content-Language)
Date: Mon, 19 Mar 2018 07:38:07 +0000
Message-ID: <87vadstt0g.fsf@fifthhorseman.net>
In-Reply-To: <87bmflrxgs.fsf@nikula.org>

On Sun 2018-03-18 21:32:35 +0200, Jani Nikula wrote:
> On Sun, 18 Mar 2018, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
>> * if we know our index expects english, and we have a message part that
>> *is not* english (e.g. Content-Language: es), we could avoid indexing
>> that part.
>
> Why would we do that? Search mostly works just fine for non-English
> languages, it's just that the *stemming* is not right.
>
>> what do you think? what ideas are missing from the brainstorm above? I'd
>> love to hear from people with multilingual mailboxes about how we might
>> be able to make notmuch work better for them.
>
> With my limited understanding of this, stemming happens both at indexing
> and searching. Basically at indexing, the term generator indexes both
> the full and the stemmed version of words. I'm wondering if we could
> look at Content-Language (and missing that, heuristics), and (if the
> user so desires) use multiple term generators with different stemmers on
> a per document basis. Or, use non-stemming indexing for unidentified or
> unsupported languages. How far would that take us? Then, perhaps, we
> could also perform language specific queries?
>
> I don't know how feasible that is, or if it would require Xapian
> changes.

thanks, this is exactly the kind of promising idea I was hoping my dumb
questions and half-baked suggestions would provoke :)

Maybe Olly or someone else with deeper knowledge of xapian can weigh in
about the feasibility of this proposal?
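
To make Jani's suggestion concrete, here is a rough sketch in Python. It is not notmuch's actual C/C++ code and not Xapian's API. It reads a part's Content-Language header, picks a stemmer name, and emits both the unstemmed term and a "Z"-prefixed stemmed term, loosely following the prefix Xapian's TermGenerator uses for stemmed terms. The language-to-stemmer table, the fallback policy, and the toy suffix-stripping stemmer are all assumptions standing in for real Xapian::Stem objects.

```python
import re
from typing import Optional

# Hypothetical table from a primary language subtag (as found in a
# Content-Language header, RFC 3282) to a stemmer name.  The names follow
# Xapian's stemmer naming; which languages to enable is an assumption.
STEMMER_FOR_LANG = {"en": "english", "es": "spanish", "fr": "french", "de": "german"}

# Toy suffix lists standing in for real Snowball stemmers (Xapian::Stem).
# These are NOT the real algorithms; they only make the sketch runnable.
TOY_SUFFIXES = {
    "english": ("ing", "ed", "es", "s"),
    "spanish": ("ando", "iendo", "es", "s"),
    "french": ("ement", "es", "s"),
    "german": ("ungen", "ung", "en", "e"),
}


def primary_languages(header: Optional[str]) -> list[str]:
    """Return the lowercased primary subtags listed in a Content-Language value."""
    if not header:
        return []
    tags = (t.strip() for t in header.split(","))
    return [t.split("-")[0].lower() for t in tags if t]


def choose_stemmer(header: Optional[str], default: Optional[str] = "english") -> Optional[str]:
    """Pick a stemmer for one message part; None means index without stemming.

    Assumed policy: no header -> the user's configured default; one supported
    language -> its stemmer; unsupported or mixed languages -> no stemming.
    """
    langs = primary_languages(header)
    if not langs:
        return default
    if len(set(langs)) == 1:
        return STEMMER_FOR_LANG.get(langs[0])
    return None


def toy_stem(word: str, stemmer: str) -> str:
    """Strip the first matching toy suffix, keeping at least three characters."""
    for suffix in TOY_SUFFIXES.get(stemmer, ()):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str, stemmer: Optional[str]) -> list[str]:
    """Emit the unstemmed term for every word, plus a "Z"-prefixed stemmed
    term when a stemmer was selected for this part."""
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        terms.append(word)
        if stemmer is not None:
            terms.append("Z" + toy_stem(word, stemmer))
    return terms


if __name__ == "__main__":
    print(choose_stemmer("es"))         # spanish
    print(choose_stemmer(None))         # english (no header: user default)
    print(choose_stemmer("fi"))         # None (unsupported: no stemming)
    print(choose_stemmer("en-GB, de"))  # None (mixed languages)
    print(index_terms("cantando canciones", "spanish"))
    # ['cantando', 'Zcant', 'canciones', 'Zcancion']
```

One open question this sketch glosses over: all the "Z" terms share a single namespace, so a query stemmed as English could coincidentally match a stem produced by the Spanish stemmer. Language-specific queries would presumably need a per-language prefix or a separate stemmer choice at query time.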
--dkg