From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: =?UTF-8?Q?Per_Starb=C3=A4ck?= <per@starback.se>
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 21:46:49 +0100
Message-ID: <CADkQgvtg=vfPM0Xk9CGmc3TFAVasAU4qKtqxjz=tf4ss5u+oXA@mail.gmail.com>
References: <83lh9lx6oi.fsf@gnu.org> <87egfdant7.fsf@gmx.us>
	<upzctwo9alup.fsf@dod.no> <E1a1xoH-0004xa-TP@fencepost.gnu.org>
	<83h9k8vig7.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1448570827 7596 80.91.229.3 (26 Nov 2015 20:47:07 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 26 Nov 2015 20:47:07 +0000 (UTC)
Cc: Eli Zaretskii <eliz@gnu.org>, sb@dod.no, rms@gnu.org
To: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
In-Reply-To: <83h9k8vig7.fsf@gnu.org>
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:195316
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/195316>

> IMO, it is more important to have language-independent matching in
> Emacs.  Language-specific rules are also needed in some situations,
> but they are secondary for Emacs.
>
>> It seems to me that we want to introduce a concept of current language

Yes! A per-buffer language is something I have wished for for a long
time, probably implemented with minor modes. My main reasons have been
getting the correct ispell dictionary and having different abbrevs
depending on the language.

With the new search folding it is needed much more.

> It's a problematic concept for Emacs, which is a multi-lingual
> environment.  For example, what is the "current language" of the
> buffer showing this message?

It's in English.

>  It cannot be US English, since it
> includes characters not in that language, and can easily include
> Turkish words.  Or consider the etc/HELLO file.

I don't understand what you are saying here. Yes, of course Turkish
words (and any character) can appear in an English text. That doesn't
make it false that the text is in English. Do you just mean that it
can be hard to determine the language of a text automatically?

> We could probably have a text property which will specify the
> language, but we don't have good means to set such a property.  IOW,
> where that information would come from?

I don't envision a text property, just a single value for the buffer,
because that is much easier and good enough for most things. Yes, there
are situations where you might want finer granularity than that, but
the same goes for other things we have in modes as well. (It would
sometimes be nice to get JavaScript mode for part of an HTML file,
etc.)

So where do we get it from? Normally from the user. Many users mostly
write in a few languages, like Swedish and English to take myself as
an example. What I want is an indicator such as "en" or "sv" somewhere
in the mode line, and commands to switch between my favourite
languages.

Sometimes it can be determined automatically. For example, when opening
an HTML file Emacs could look at the "lang" attribute, and in a LaTeX
file it could see how you use packages like Babel or Polyglossia. And
in any text file, various methods (like n-gram frequencies) can be used
to try to identify the language automatically.
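
As a rough illustration of the HTML case, here is a small sketch. It is
Python rather than Emacs Lisp, only to pin down the behaviour; the
function name guess_html_language is made up for this example and
nothing like it exists in Emacs. It reads the "lang" attribute of the
<html> element and keeps only the primary language subtag.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    # "sv-SE" -> "sv": keep only the primary subtag.
                    self.lang = value.split("-")[0].lower()


def guess_html_language(text):
    """Return a language code such as "sv", or None if none is declared.

    Hypothetical helper: it only inspects <html lang="...">.
    """
    finder = _LangFinder()
    finder.feed(text)
    return finder.lang
```

A file without a declared language simply yields None, in which case
the user's default would apply.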

I think the focus should be on buffers being able to have a (natural)
language, and commands to change that. It would be quite sufficient
with:
 * a setting listing what languages I normally want to use (the first
one being the default)
 * a cycling command that sets the language to the next in that list
(that is, a toggle when the list has two entries)
 * a command to explicitly set any valid value
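
The three pieces above could look roughly like this. It is a Python
sketch of the behaviour only, not Emacs Lisp; the names Buffer,
preferred, cycle_language, set_language and VALID_LANGUAGES are all
invented for the example, and the set of valid codes is a tiny
placeholder.

```python
from dataclasses import dataclass, field

# Assumption: a placeholder set of valid codes; a real one would be larger.
VALID_LANGUAGES = {"en", "sv", "tr", "de", "fr"}


@dataclass
class Buffer:
    """Toy stand-in for a buffer that carries one language value."""

    # The user's usual languages; the first one is the default.
    preferred: list = field(default_factory=lambda: ["sv", "en"])
    language: str = ""

    def __post_init__(self):
        if not self.language:
            self.language = self.preferred[0]

    def cycle_language(self):
        """Switch to the next preferred language (a toggle for two)."""
        if self.language in self.preferred:
            i = self.preferred.index(self.language)
            self.language = self.preferred[(i + 1) % len(self.preferred)]
        else:
            # Current value was set explicitly; go back to the default.
            self.language = self.preferred[0]
        return self.language

    def set_language(self, code):
        """Explicitly set any valid language, preferred or not."""
        if code not in VALID_LANGUAGES:
            raise ValueError("unknown language: " + code)
        self.language = code
        return code
```

With the default list of "sv" and "en", calling cycle_language
repeatedly just alternates between the two.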

Anything else can be done a lot later, and as experiments outside of
the core. Automatic detection is neat, but not really needed. And
exactly what each language needs to change will be worked out piece by
piece, over time, in the different language communities. The
important thing is that there is some hook to hang your code on.

* Why it is so important, now with the new search folding *

For Scandinavians it is really important, because (with Swedish as
example) åäö are really totally their own letters in the Swedish
alphabet, regardless of their historic origin. To have a search for
"varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
It would give a strong impression of this being an American program
not meant to be used for Swedish.

An analogue would be finding "jamb" when looking for "iamb" in
English, where I and J are totally different letters, even though they
originally (in Latin) were the same. Or you start an isearch for
"valid" and after the first four letters you are inside "dualism". (U
and V were also the same letter originally.) Confusing and irritating,
and something that would make people turn off this search folding,
which would be sad, because it's a nice thing to have.
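
To make the Swedish case concrete, here is a sketch of folding that
depends on the buffer's language. It is Python, not Emacs Lisp, and it
is not how the new search folding is implemented; the table
DISTINCT_LETTERS and the functions fold and matches are made up for
this example. The idea is simply to strip diacritics everywhere except
on letters that the language counts as letters of their own.

```python
import unicodedata

# Assumption: per-language letters that must never be folded.
# "\u00e5\u00e4\u00f6" are the three extra Swedish letters.
DISTINCT_LETTERS = {
    "sv": "\u00e5\u00e4\u00f6",
    "en": "",
}


def fold(text, language):
    """Lowercase TEXT and strip diacritics, except on distinct letters."""
    keep = DISTINCT_LETTERS.get(language, "")
    out = []
    # NFC first, so each accented letter is one precomposed character.
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in keep:
            out.append(ch)
        else:
            # Decompose and drop the combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


def matches(needle, haystack, language):
    """True if NEEDLE occurs in HAYSTACK after language-aware folding."""
    return fold(needle, language) in fold(haystack, language)
```

In a buffer marked "sv", a search for "varpa" would then no longer
match the two Swedish words above, while accents that are not Swedish
letters (as in a French loanword) would still be folded.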