From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp12.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id eG+rDeoq8WLZkgAAbAwnHQ (envelope-from ) for ; Mon, 08 Aug 2022 17:25:30 +0200 Received: from aspmx1.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp12.migadu.com with LMTPS id UNaUDeoq8WL3RAAAauVa8A (envelope-from ) for ; Mon, 08 Aug 2022 17:25:30 +0200 Received: from mail.notmuchmail.org (yantan.tethera.net [IPv6:2a01:4f9:c011:7a79::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id C4D46E114 for ; Mon, 8 Aug 2022 17:25:29 +0200 (CEST) Received: from yantan.tethera.net (localhost [127.0.0.1]) by mail.notmuchmail.org (Postfix) with ESMTP id 458075F370; Mon, 8 Aug 2022 15:25:27 +0000 (UTC) Received: from mail.securebox.hu (mail.securebox.hu [185.51.190.242]) by mail.notmuchmail.org (Postfix) with ESMTPS id F25575E545 for ; Mon, 8 Aug 2022 15:25:23 +0000 (UTC) Received: from mail-lj1-f174.google.com ([209.85.208.174]) by mail.securebox.hu with esmtpsa (TLS1.3) tls TLS_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1oL4d6-0005Jq-10 for notmuch@notmuchmail.org; Mon, 08 Aug 2022 17:25:21 +0200 Received: by mail-lj1-f174.google.com with SMTP id s9so10189903ljs.6 for ; Mon, 08 Aug 2022 08:25:19 -0700 (PDT) X-Gm-Message-State: ACgBeo0dxBn6iVFUH1N+vc9h4PSuMnihA8ikxSrGlx8kzUD+qV/MI5eC Yoi8aLd3ldP/VcTIA6DaRMu7uoxKwAvRjDWGwsY= X-Google-Smtp-Source: AA6agR7xSXK6qcSgwrTx1jCCJLTD4oFfs9goBa4ACr00qm6jZ5cWXLeBpRkpqRX5KMxDc99J8IKgEYEX1cRHqq5YIbA= X-Received: by 2002:a05:651c:222c:b0:25f:e654:36e3 with SMTP id y44-20020a05651c222c00b0025fe65436e3mr1107843ljq.20.1659972319329; Mon, 08 Aug 2022 08:25:19 -0700 (PDT) MIME-Version: 1.0 References: 
<87wnbikcjs.fsf@tethera.net> In-Reply-To: <87wnbikcjs.fsf@tethera.net> From: Bence Ferdinandy Date: Mon, 8 Aug 2022 17:24:53 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: matching both accented and non-accented character for non-accented characters? To: David Bremner X-Authenticated-User: bence@ferdinandy.com Message-ID-Hash: H2WIQLDETZ45NTSGCRZKUSOG44OTDUPM X-Message-ID-Hash: H2WIQLDETZ45NTSGCRZKUSOG44OTDUPM X-MailFrom: bence@ferdinandy.com X-Mailman-Rule-Hits: nonmember-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-notmuch.notmuchmail.org-0 CC: notmuch@notmuchmail.org X-Mailman-Version: 3.3.3 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: multipart/mixed; boundary="===============8669405218818805452==" X-Migadu-Flow: FLOW_IN X-Migadu-To: larch@yhetil.org X-Migadu-Country: DE ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1659972329; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references:list-id:list-help: list-owner:list-unsubscribe:list-subscribe:list-post; bh=t1RUtm+JZQKq3yx03MvPeq365HtcjgcG3S/1zMO7QnQ=; b=IFT9wMOjJ/u1Y/Td/Jl+q7//W3OLXb8pjbkmsv7g9LuZCtt6sxA7mz9p8uhewt0KNdIEYS avFWAXiU2oGzO5JcXVAACSzVPg+wxGWDwxa4/zymVT2SmC/DAd8o0/3UGS4Map1YWRhq0P H9JKvzP8ASwLQOuOSFmOYEQpfSqD6ufBRGH409+Nqi4vzhB6FI6O2hCMnllxiqht3L5Kuq 1e6D3w702MDP9q1R5+4cnsEpii04Rjpj9udzxJ36xIIKTfsfwjY0chnDhOb+TdOYrPrrhN DwEFyhbRYKc80aKSkhvzpPKNPpgT5ESe5MfG9NEpsb4ZiS+lPqg8efzUI1/4TA== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1659972329; a=rsa-sha256; cv=none; b=dvO3xbqU+0zj1f0bjCTNUl3DvqywbXW2eXjZ6mDudHi3NQ6aPqO23hiDfv2AMbFMsiITl9 IDrLK6kVv5iV/3KbIh4r8Xqbfp6622U6oLXGyu+hYrc8mQvjyX1vi16iM5yl8Obgfh+4pe 
cWYwDDnPbRwkkmund+Dw95i2aCTi6mE3OmpwHefet8SaIP6oUIwRAgcynZocFvetpasqn0 WnhUxLYYsdpCkSMabI/1OPeejl1GTWr18nHODKghiBVocSALACZixe917yNuFh/7tImTsg p2Ee57hFxCTLF4QVABOTu4xp1rg6S/oVL/+8gruK4OFl74SK8StoFuPTNo4dVA== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 2a01:4f9:c011:7a79::1 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org X-Migadu-Spam-Score: -1.14 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 2a01:4f9:c011:7a79::1 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org X-Migadu-Queue-Id: C4D46E114 X-Spam-Score: -1.14 X-Migadu-Scanner: scn1.migadu.com X-TUID: ReX85U/UME55 --===============8669405218818805452== Content-Type: multipart/alternative; boundary="0000000000006e432305e5bc6c5e" --0000000000006e432305e5bc6c5e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Thanks! I didn't know Unicode equivalence existed, but it seems to be the feature I want, so at least now I have a name for it :) And yes, actually setting the stemmer would also be cool. I saw that Xapian has a Hungarian stemmer, but I kind of assumed all stemmers are applied somehow (although it makes sense they're not). Is stemming done during search, or would it affect the database as well? Just to have a notion of how complicated a settable stemmer feature would be.

David Bremner wrote (on Mon, 8 Aug 2022, 16:58):

> Bence Ferdinandy writes:
>
> > Hi,
> >
> > I'm in the process of trying to set up reading email in the terminal and
> > just installed notmuch, which looks like a pretty awesome tool. I currently
> > have one question nagging me:
> >
> > I have a lot of mail in my native Hungarian, which properly written is full
> > of characters like éáűúü, but if someone's writing on a non-Hungarian
> > keyboard, or just quickly writing an email from a phone, they often drop
> > the accents as it's faster and we'll likely understand anyway. Is it
> > possible to set it up that if I search for "lanc" it would also match
> > "lánc" other than going `notmuch search lanc OR lánc`?
>
> There is some previous discussion at
>
> https://nmbug.notmuchmail.org/nmweb/search/id%3A87efp2b9er.fsf%40tethera.net
>
> I don't think anyone worked on this in the meantime, so I guess the
> short answer is that there is currently no support, but people have
> tossed around some ideas.
>
> d
>

-- 
+36305425054 bence.ferdinandy.com

--0000000000006e432305e5bc6c5e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks! I didn't know Unicode equivalence existed, but it seems to be the feature I want, so at least now I have a name for it :) And yes, actually setting the stemmer would also be cool. I saw that Xapian has a Hungarian stemmer, but I kind of assumed all stemmers are applied somehow (although it makes sense they're not). Is stemming done during search, or would it affect the database as well? Just to have a notion of how complicated a settable stemmer feature would be.
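
As a rough illustration of the accent folding asked about in this thread, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `fold_accents` and `matches` are hypothetical, and this is not how notmuch or Xapian process terms; it only shows the idea of decomposing characters and dropping the combining marks.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks.

    Hypothetical helper for illustration only; not part of notmuch or Xapian.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, word: str) -> bool:
    """Compare two words while ignoring accents and letter case."""
    return fold_accents(query).casefold() == fold_accents(word).casefold()


if __name__ == "__main__":
    print(fold_accents("lánc"))
    print(matches("lanc", "lánc"))
```

For a search for "lanc" to find "lánc" this way, the same folding would presumably have to be applied both to the indexed terms and to the query, which is why it matters whether such processing happens at search time or in the database.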

David Bremner <david@tethera.net> wrote (on Mon, 8 Aug 2022, 16:58):
Bence Ferdinandy <bence@ferdinandy.com> writes:

> Hi,
>
> I'm in the process of trying to set up reading email in the terminal and
> just installed notmuch, which looks like a pretty awesome tool. I currently
> have one question nagging me:
>
> I have a lot of mail in my native Hungarian, which properly written is full
> of characters like éáűúü, but if someone's writing on a non-Hungarian
> keyboard, or just quickly writing an email from a phone, they often drop
> the accents as it's faster and we'll likely understand anyway. Is it
> possible to set it up that if I search for "lanc" it would also match
> "lánc" other than going `notmuch search lanc OR lánc`?

There is some previous discussion at

      https://nmbug.notmuchmail.org/nmweb/search/id%3A87efp2b9er.fsf%40tethera.net

I don't think anyone worked on this in the meantime, so I guess the
short answer is that there is currently no support, but people have
tossed around some ideas.

d


--
--0000000000006e432305e5bc6c5e-- --===============8669405218818805452== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline --===============8669405218818805452==--