From: Stefano Zacchiroli <zack@debian.org>
To: David Bremner <david@tethera.net>
Cc: Bruno Deremble <bruno.deremble@ens.fr>, notmuch@notmuchmail.org
Subject: Re: accented characters
Date: Mon, 13 Nov 2017 15:35:15 +0100
Message-ID: <20171113143515.5hbnsma72r24qutf@upsilon.cc>
In-Reply-To: <87efp2b9er.fsf@tethera.net>
On Mon, Nov 13, 2017 at 09:22:36AM -0400, David Bremner wrote:
> The other thing I don't know is how many people would be happy with just
> stripping all accents. That could be done in a gmime filter, as you
> suggest. That would be more likely to require changes to the query
> language. Off hand I don't know how to transparently de-accent all query
> words.
My gut feeling is that removing accents by default from both the terms
in the index and user queries would go a long way toward addressing this
problem, especially if it is a boolean option in the notmuch config (which
defaults to stripping accents).
As a random example/data point, chromium does that: when you search for
an unaccented string in a web page, it finds every variant of it with
accents. It is, by far, the best UX I have had w.r.t. accents on GNU/Linux.
Unicode has a notion of canonical (decomposed) form that rewrites each
accented character as a sequence of a non-accented character + modifiers:
https://en.wikipedia.org/wiki/Unicode_equivalence . A bunch of libraries
use that to normalize away accents in Unicode strings. I'm aware
of a few in Python, for instance, but not in C++ (which I believe is what
you'd be interested in).
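As a minimal sketch of that idea in Python, using only the standard
unicodedata module: decompose the string (NFD), then drop the combining
marks. The names strip_accents, fold, and ignore_accents are hypothetical
and only for illustration; none of this is notmuch or Xapian API, and the
boolean flag merely stands in for the config option suggested above.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose text canonically (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for accents such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(term: str, ignore_accents: bool = True) -> str:
    """Hypothetical term folding: case-fold, then optionally strip accents."""
    term = term.casefold()
    return strip_accents(term) if ignore_accents else term


# Apply the same folding to the indexed terms and to the query terms,
# so that an unaccented query matches accented text and vice versa.
indexed = {fold(word) for word in ["Café", "naïve", "Zürich"]}

print(fold("cafe") in indexed)
print(fold("ZURICH") in indexed)
```

Note that this only removes accents that Unicode represents as combining
marks after decomposition; letters such as "ø" have no canonical
decomposition and pass through unchanged.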
HTH,
--
Stefano Zacchiroli . zack@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »