unofficial mirror of notmuch@notmuchmail.org
From: Stefano Zacchiroli <zack@debian.org>
To: David Bremner <david@tethera.net>
Cc: Bruno Deremble <bruno.deremble@ens.fr>, notmuch@notmuchmail.org
Subject: Re: accented characters
Date: Mon, 13 Nov 2017 15:35:15 +0100
Message-ID: <20171113143515.5hbnsma72r24qutf@upsilon.cc>
In-Reply-To: <87efp2b9er.fsf@tethera.net>

On Mon, Nov 13, 2017 at 09:22:36AM -0400, David Bremner wrote:
> The other thing I don't know is how many people would be happy with just
> stripping all accents. That could be done in a gmime filter, as you
> suggest. That would be more likely to require changes to the query
> language. Off hand I don't know how to transparently de-accent all query
> words.

My gut feeling is that removing accents by default from both the indexed
terms and user queries would go a long way toward addressing this
problem, especially if it is controlled by a boolean option in the
notmuch config (which defaults to stripping accents).

As a random example/data point, chromium does that: when you search for
an unaccented string in a web page, it finds every accented variant of
it. It is, by far, the best experience I have had w.r.t. accents on
GNU/Linux.

Unicode has a notion of canonical decomposition that rewrites accented
characters as a sequence of non-accented characters + modifiers:
https://en.wikipedia.org/wiki/Unicode_equivalence . A bunch of libraries
use it to normalize away accents in Unicode strings. I'm aware of a few
in Python, for instance, but not in C++ (which I believe is what you'd
be interested in).
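
To make the idea concrete, here is a minimal Python sketch using only the
standard unicodedata module. It decomposes a string to NFD, drops the
combining marks, and applies the same folding to both an indexed term and
a query. The names strip_accents, matches and fold_accents are
hypothetical, made up for this illustration; none of this is notmuch or
Xapian code, and the boolean only stands in for the config option
suggested above.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed character such as "é" into "e" followed by
    # U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero combining class for
    # accents and similar modifiers; keep only the characters with class 0.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, term: str, fold_accents: bool = True) -> bool:
    """Compare a query word with an indexed term (hypothetical helper).

    fold_accents stands in for the boolean config option: when it is on,
    both sides go through the same accent stripping before comparison.
    """
    if fold_accents:
        query, term = strip_accents(query), strip_accents(term)
    return query.casefold() == term.casefold()


if __name__ == "__main__":
    print(strip_accents("déjà vu"))
    print(matches("resume", "résumé"))
    print(matches("resume", "résumé", fold_accents=False))
```

Note that this only removes what Unicode models as combining marks:
letters with no canonical decomposition, such as "ø" or "ß", pass through
unchanged, so a real implementation would have to decide how to treat
those separately.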

HTH,
-- 
Stefano Zacchiroli . zack@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »
