From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Thu, 4 Feb 2016 11:25:41 -0800
Organization: UCLA Computer Science Department
Message-ID: <56B3A5B5.6040102@cs.ucla.edu>
In-Reply-To: <83bn7wwerl.fsf@gnu.org>
To: Eli Zaretskii
Cc: djcb@djcbsoftware.nl, emacs-devel@gnu.org

On 02/04/2016 09:45 AM, Eli Zaretskii wrote:
> We should instead cater to users who search text they _can_ read.

This depends on what one means by "read". I can "read" Swedish in the sense that I know where the word boundaries are and have some idea of how they're pronounced.
I can also "read" Belarusian in the sense that I know Cyrillic and a bit of Russian and can follow Belarusian better than Swedish, though I easily get lost. In both cases, I'd prefer Unicode-type case folding even though it's "wrong" to ignore diacritics in the native languages.

Conversely, I can't "read" Hebrew or Chinese or Arabic in the same sense and so don't much care how folding works for those languages. Perhaps some Hebrew-speaking experts want פּ and פ and ף to be treated the same while searching, while other experts do not; it doesn't matter to me.

To help provide context here, most of my reading of non-English text is to support other free projects such as the tz database. That database is mostly English but contains short passages from other languages. I use Emacs for primary database maintenance, but often use other programs to browse the Internet as they're more convenient. I'll cut and paste out of a Firefox browser between a page of interest and Google Translate, for example. Examples of text under Emacs control include "Bahía", "Lịch hai thế kỷ", "中国科技史料", and "Новый счет времени". Most of the searching for this sort of thing in Emacs will involve typing strings like "bahia" and "lich" where I almost always prefer diacritic- and case-folded search.
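
As a rough illustration of the kind of diacritic- and case-folded search described above, here is a minimal sketch in Python. It is not how Emacs implements character folding; the names `fold` and `folded_search` are made up for this example, and the approach (case-fold, decompose to NFD, drop combining marks) is only one simple way to approximate the behavior.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold text, then drop combining marks (hypothetical helper)."""
    # NFD splits a precomposed character such as "í" into a base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # unicodedata.combining() returns a nonzero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_search(needle: str, haystack: str) -> bool:
    """Return True if needle occurs in haystack after folding both."""
    return fold(needle) in fold(haystack)


print(folded_search("bahia", "Bahía"))
print(folded_search("lich", "Lịch hai thế kỷ"))
```

Under this sketch, פּ folds to פ because the dagesh is a combining mark, while ף stays distinct because it is a separate letter. Cyrillic й also loses its breve and becomes и, which is the sort of result that is "wrong" in the native language.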