From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 23:02:25 +0200
Message-ID: <83oaegtqxq.fsf@gnu.org>
References: <83lh9lx6oi.fsf@gnu.org> <87egfdant7.fsf@gmx.us>
	<upzctwo9alup.fsf@dod.no> <E1a1xoH-0004xa-TP@fencepost.gnu.org>
	<83h9k8vig7.fsf@gnu.org>
	<CADkQgvtg=vfPM0Xk9CGmc3TFAVasAU4qKtqxjz=tf4ss5u+oXA@mail.gmail.com>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-Trace: ger.gmane.org 1448571778 21852 80.91.229.3 (26 Nov 2015 21:02:58 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 26 Nov 2015 21:02:58 +0000 (UTC)
Cc: sb@dod.no, rms@gnu.org, emacs-devel@gnu.org
To: Per =?utf-8?Q?Starb=C3=A4ck?= <per@starback.se>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Nov 26 22:02:47 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a23gl-0003bJ-83
	for ged-emacs-devel@m.gmane.org; Thu, 26 Nov 2015 22:02:47 +0100
Original-Received: from localhost ([::1]:53149 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a23gn-00013S-Cf
	for ged-emacs-devel@m.gmane.org; Thu, 26 Nov 2015 16:02:49 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:45097)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1a23gk-00013N-AE
	for emacs-devel@gnu.org; Thu, 26 Nov 2015 16:02:47 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1a23gh-0000ao-3G
	for emacs-devel@gnu.org; Thu, 26 Nov 2015 16:02:46 -0500
Original-Received: from mtaout20.012.net.il ([80.179.55.166]:50923)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1a23gg-0000ae-Rz; Thu, 26 Nov 2015 16:02:43 -0500
Original-Received: from conversion-daemon.a-mtaout20.012.net.il by
	a-mtaout20.012.net.il (HyperSendmail v2007.08) id
	<0NYF00A00X4HBJ00@a-mtaout20.012.net.il>;
	Thu, 26 Nov 2015 23:02:41 +0200 (IST)
Original-Received: from HOME-C4E4A596F7 ([84.94.185.246]) by a-mtaout20.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0NYF00AQOX4G3V60@a-mtaout20.012.net.il>;
	Thu, 26 Nov 2015 23:02:41 +0200 (IST)
In-reply-to: <CADkQgvtg=vfPM0Xk9CGmc3TFAVasAU4qKtqxjz=tf4ss5u+oXA@mail.gmail.com>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: Solaris 10
X-Received-From: 80.179.55.166
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:195317
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/195317>

> Date: Thu, 26 Nov 2015 21:46:49 +0100
> From: Per Starbäck <per@starback.se>
> Cc: rms@gnu.org, Eli Zaretskii <eliz@gnu.org>, sb@dod.no
>
> >  It cannot be US English, since it
> > includes characters not in that language, and can easily include
> > Turkish words.  Or consider the etc/HELLO file.
>
> I don't understand at all what you are saying here. Yes, of course
> Turkish words (and any character) can be in an English text. That
> doesn't make it false that it is in English. Do you just mean that it
> can be hard to determine the language of a text automatically?

So you will sort Turkish words in an otherwise English text according
to English rules?  And spell-check them using an English dictionary?
I don't think so.

A language attribute is something that should control how certain
linguistic operations are tailored.  You cannot use one language's
rules with words from another language.

So saying that an email message that is mostly in English, but
includes words and phrases from another language, is in English is not
useful, at least for handling the non-English parts of that message.

And what about etc/HELLO?  What language is it in?  There are more
non-English words there than English words, and no language in
particular can claim a majority of the words, or even enough of them
to count as "many".  How do we treat such buffers?  What rules of
character folding do we apply there?

> > We could probably have a text property which will specify the
> > language, but we don't have good means to set such a property.  IOW,
> > where would that information come from?
>
> I don't envision a text property, but just a value for the buffer,
> because it is much easier and good enough for most things. Yes, there
> are situations where you might want to differentiate it like that, but
> that goes for other things we have in modes as well. (It would
> sometimes be nice to get Javascript mode for part of an HTML file
> etc.)

Having Javascript in HTML just makes it highlighted wrongly.  That's
aesthetically bad (and there's a todo item to solve that problem), but
that's not fatal.  Trying to treat a word in Japanese according to
Latin rules is much worse.

So I think a per-buffer language attribute is the wrong way to go.  We
need a finer granularity.
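
To make the granularity point concrete, here is a minimal sketch in
Python rather than Emacs Lisp.  Nothing in it is existing Emacs code:
the Span type, its "lang" tag, and the fold helpers are all
hypothetical names, and the only language-specific rule it models is
the Turkish dotted/dotless i.  It compares folding a mixed text with
one per-buffer language against folding each tagged span with its own
rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A run of text with a hypothetical language tag, e.g. "en" or "tr"."""
    text: str
    lang: str


def fold(text: str, lang: str) -> str:
    """Case-fold TEXT using the rules of LANG (only Turkish is special-cased)."""
    if lang == "tr":
        # Turkish: capital I folds to dotless ı, capital İ folds to i.
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()


def fold_spans(spans):
    """Finer granularity: fold each span with its own language's rules."""
    return "".join(fold(s.text, s.lang) for s in spans)


def fold_buffer(spans, lang: str) -> str:
    """Per-buffer attribute: fold everything with a single language's rules."""
    return "".join(fold(s.text, lang) for s in spans)


# A mostly English text that contains one Turkish word.
spans = [Span("INDEX: ", "en"), Span("ISPARTA", "tr")]

print(fold_spans(spans))         # index: ısparta  (both parts folded correctly)
print(fold_buffer(spans, "en"))  # index: isparta  (Turkish word folded wrongly)
print(fold_buffer(spans, "tr"))  # ındex: ısparta  (English word folded wrongly)
```

Whichever single language is chosen for the whole buffer, one of the
two parts comes out wrong; only the per-span version gets both right.
Where the span tags would come from is the same open question raised
above about the text property.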