From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Sat, 20 Feb 2016 17:31:48 +1100
Message-ID: <87egc7evu3.fsf@gnus.org>
To: Elias Mårtenson
Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson writes:

> Every example you have given so far discusses the decomposition
> equivalence. I.e. the fact that the two variants of ñ are the
> same. Section 5.16 discusses the _concept_ of allowing n and ñ match
> similarly but the mechanism to do so is locale-dependent. This is what
> Unicode says, and that is what I say.

Yes. Here are my thoughts (I was sitting on a plane today):

It seems to me that we're considering using the Unicode decomposition
rules for "variant detection" because it's what we have. But this
doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø,
and this would obviously be something that many people would find
helpful.

So the Unicode decomposition rules only get us halfway there.

On the other hand, they go too far for other users, who absolutely do
not want `C-s o' to find ø, but would be really glad if `C-s hermes'
would find "Hermés" (or is it "Hermès"? I can't even type that in on
this keyboard).

Emacs is awesome. We should aim to make this extremely useful feature
awesome.

So: How many characters are we really talking about? Unicode is big
and scary, but this only applies to alphabetical scripts, right? That
is, all the Latin-like scripts, and... possibly Greek/Hebrew/Cyrillic?
I don't know?

But if we only consider the Latin scripts for a moment, there aren't
more than a few hundred Unicode code points that we care about.
Basically all the old iso-8859-foos from around Europe.

And what we want is a way for people with normal keyboards (they have
a-z in Latin alphabet countries) to search for variants.

So: That sounds like an evening's work.

(defvar *character-variants*
  '((?a ?á ?å ?ä ...)
    (?o ?ø ?ö ?ó ...)
    ...))

Everything that somebody says "that's kinda an a, right?" goes on
there.

Then we have something like:

(define-locale-exception :no ?a ?å)

There would be few of these exceptions per locale. The Scandinavian
countries would have three each, and Denmark's and Norway's would be
the same. That bit is more than an evening, but is something that
people would enjoy submitting exceptions to, I think.

And then we just look up the locale, create the mapping when we type
`C-s', and there we are. An awesome, very useful feature that would
annoy nobody, and that should be on by default.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
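
To make the "look up the locale, create the mapping" step above a bit
more concrete, here is a minimal sketch in Emacs Lisp. Everything in it
is an assumption for illustration: the `my-' names, the shape of the two
tables, the locale strings, and the tiny sample data are all made up,
and none of it is an existing Emacs API. It only shows one way the
variant table and the per-locale exceptions could combine into a search
regexp.

```elisp
;; Sketch only: every name below is hypothetical, not part of Emacs.
(require 'cl-lib)

(defvar my-character-variants
  '((?a ?á ?å ?ä)
    (?o ?ø ?ö ?ó))
  "Alist of (BASE . VARIANTS).  A deliberately tiny sample table.")

(defvar my-locale-exceptions
  '(("no" (?a . ?å) (?o . ?ø))
    ("da" (?a . ?å) (?o . ?ø)))
  "Alist of (LOCALE . PAIRS).
Each pair is (BASE . VARIANT): typing BASE must not find VARIANT.")

(defun my-variants-for (base locale)
  "Return the variants of BASE that searching may fold under LOCALE."
  (let ((exceptions (cdr (assoc locale my-locale-exceptions))))
    ;; Drop every variant that LOCALE lists as an exception for BASE.
    (cl-remove-if (lambda (variant)
                    (member (cons base variant) exceptions))
                  (cdr (assq base my-character-variants)))))

(defun my-search-regexp (string locale)
  "Return a regexp matching STRING plus the variants allowed by LOCALE."
  (mapconcat (lambda (char)
               (let ((variants (my-variants-for char locale)))
                 (if variants
                     ;; The table holds only letters, so a plain
                     ;; bracket expression is safe here.
                     (concat "[" (apply #'string char variants) "]")
                   (regexp-quote (string char)))))
             string ""))
```

With the sample tables, (my-search-regexp "hola" "no") returns
"h[oöó]l[aáä]", while a locale without exceptions, say "es", gets
"h[oøöó]l[aáåä]". Building the regexp once per `C-s' keeps the tables
themselves locale-independent, which is the point of keeping the
exceptions in a separate list.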