From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Mon, 22 Feb 2016 20:51:49 +0200
Message-ID: <831t84lgsa.fsf@gnu.org>
References: <83pow26svf.fsf@gnu.org> <87a8n5srbp.fsf@wanadoo.es> <83d1s17npz.fsf@gnu.org> <87oablfpn3.fsf@mail.linkov.net> <834mdd6llx.fsf@gnu.org> <7fbb8bc7-9a97-4bad-a103-a6690a35241d@default> <834mdc5w6o.fsf@gnu.org> <838u2hu6aq.fsf@gnu.org> <871t899tde.fsf@gnus.org> <83y4ahru04.fsf@gnu.org> <83fuwproyf.fsf@gnu.org> <837fi0sz29.fsf@gnu.org> <83egc8qzjh.fsf@gnu.org> <87egc7evu3.fsf@gnus.org> <83io1jpt4u.fsf@gnu.org> <87povqhj25.fsf@gnus.org> <83povqm3dw.fsf@gnu.org>
Reply-To: Eli Zaretskii
Cc: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
To: rms@gnu.org
In-reply-to: (message from Richard Stallman on Mon, 22 Feb 2016 12:59:00 -0500)

> From: Richard Stallman
> CC: eliz@gnu.org, larsi@gnus.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 12:59:00 -0500
>
>   > Right. I guess I'm getting ahead of myself. As you know, I'm advocating
>   > choosing a default language based on the locale of the user.
>
> We need:
>
> * A per-buffer language preference variable.
> * A global value which becomes the default for new buffers.

That's unnecessarily restrictive; we can do better with the current
infrastructure. Some encodings provide us with charset information,
which can be used to deduce the language of the text. Some characters
belong to Unicode blocks that allow identification of the language, or
maybe a small group of languages. In some cases, the text itself comes
with metadata which describes the language. And there might be other
sources of information about the language. It would be silly to
disregard this information where it exists.
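
As a rough illustration of deducing a language hint from the characters
themselves, here is a minimal sketch in Python. It is not how Emacs does
or would do this: the function name dominant_script is hypothetical, and
it uses the first word of each character's Unicode name as a crude
stand-in for the script. A script usually narrows the choice to a small
group of languages rather than to a single one.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script label among the letters of TEXT.

    The label is the first word of the Unicode character name
    (e.g. "HEBREW", "CYRILLIC", "LATIN"), which is only an
    approximation of the real Unicode Script property.
    Returns None when TEXT contains no letters.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation say nothing about script
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A caller could prefer such a hint, where it exists, over a per-buffer or
global default, and fall back to the default when the function returns
None or an ambiguous label such as "LATIN".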
There are other aspects of this that need to be considered if we want
language-specific searching to be solid. E.g., what happens with text
copied to another buffer which might have a different per-buffer
language preference? Does it suddenly behave differently when searched?

But the most basic issue is that any significant development in these
directions requires re-implementing the feature at the C level, using
char-tables for folding, like we do with case-mapping. So until someone
steps forward for the job, all we can do is make small corrections to
the existing implementation. For example, the default state of
character-folding might depend on the locale's language -- we could
turn it off by default for languages whose users expressed
dissatisfaction with the feature. We could also adjust the regular
expressions created for folding the search string by filtering out
variants that users of a particular language don't want. If people
think these ideas will make more users happy, we can work on that.
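
The two small corrections suggested above could look roughly like the
following Python sketch. Everything in it is an assumption made for
illustration: the folding table, the language codes, the per-language
exclusions and the function names are hypothetical and do not reflect
the actual Emacs implementation or any real user survey.

```python
import re

# Hypothetical folding table: base character -> variants it may also match.
FOLD_VARIANTS = {
    "a": "áàâä",
    "n": "ñ",
    "o": "óòôö",
}

# Hypothetical per-language data; the entries are placeholders only.
FOLDING_OFF_BY_DEFAULT = {"xx"}      # languages whose users dislike folding
EXCLUDED_VARIANTS = {"es": {"ñ"}}    # variants a language's users don't want


def language_of(locale_name):
    """Extract the language code from a locale name such as 'es_ES.UTF-8'."""
    return re.split(r"[_.@]", locale_name, maxsplit=1)[0].lower()


def folding_enabled_by_default(locale_name):
    """Decide the default state of folding from the locale's language."""
    return language_of(locale_name) not in FOLDING_OFF_BY_DEFAULT


def folded_regexp(search_string, locale_name):
    """Build a folding regexp, leaving out variants excluded for the locale."""
    excluded = EXCLUDED_VARIANTS.get(language_of(locale_name), set())
    parts = []
    for ch in search_string:
        variants = [v for v in FOLD_VARIANTS.get(ch, "") if v not in excluded]
        if variants:
            # Character class matching the base character or any kept variant.
            alternatives = [ch] + variants
            parts.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)
```

With this table, searching for "ano" under an English locale would
match "añó", while under a Spanish locale "n" would match only itself
and the vowels would still fold.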