From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 23:02:25 +0200
Message-ID: <83oaegtqxq.fsf@gnu.org>
References: <83lh9lx6oi.fsf@gnu.org> <87egfdant7.fsf@gmx.us> <83h9k8vig7.fsf@gnu.org>
Reply-To: Eli Zaretskii
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: sb@dod.no, rms@gnu.org, emacs-devel@gnu.org
To: Per Starbäck

> Date: Thu, 26 Nov 2015 21:46:49 +0100
> From: Per Starbäck
> Cc: rms@gnu.org, Eli Zaretskii, sb@dod.no
>
> > It cannot be US English, since it
> > includes characters not in that language, and can easily include
> > Turkish words. Or consider the etc/HELLO file.
>
> I don't understand at all what you are saying here. Yes, of course
> Turkish words (and any character) can be in an English text. That
> doesn't make it false that it is in English. Do you just mean that it
> can be hard do determine the language of a text automatically?

So you will sort Turkish words in an otherwise English text according
to English rules? And spell-check them using an English dictionary?
I don't think so.

A language attribute is something that should control how certain
linguistic operations are tailored. You cannot use one language's
rules with words from another language. So saying that an email
message that is mostly in English, but includes words and phrases from
another language, is in English is not useful, at least for handling
the non-English parts of that message.

And what about etc/HELLO? What language is it in?
There are more non-English words there than English words, and no
language in particular can claim it has the majority of the words, or
even enough of them to count as "many". How do we treat such buffers?
What rules of character folding do we apply there?

> > We could probably have a text property which will specify the
> > language, but we don't have good means to set such a property. IOW,
> > where that information would come from?
>
> I don't envision a text property, but just a value for the buffer,
> because it is much easier and good enough for most things. Yes, there
> are situations where you might want to differentiate it like that, but
> that goes for other things we have in modes as well. (It would
> sometimes be nice to get Javascript mode for part of an HTML file
> etc.)

Having Javascript in HTML just makes it highlighted wrongly. That's
aesthetically bad (and there's a todo item to solve that problem), but
that's not fatal. Trying to treat a word in Japanese according to
Latin rules is much worse.

So I think a per-buffer language attribute is the wrong way to go. We
need a finer granularity.
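
To make the granularity argument concrete, here is a rough sketch in
Python (not Emacs Lisp, and not how Emacs implements anything). It
assumes text arrives as (text, language) spans, roughly what a
per-span language property could provide. The names fold_case,
fold_per_buffer and fold_per_span are hypothetical, and only the
Turkish dotted/dotless I rule is modelled.

```python
# Hypothetical sketch: all names are invented for illustration; this is
# not Emacs code and covers only one language-specific rule.

def fold_case(text: str, lang: str) -> str:
    """Lower-case TEXT using rules tailored to LANG (an ISO 639-1 code)."""
    if lang == "tr":
        # Turkish pairs I with dotless ı, and dotted İ with i.
        # Handle both before the generic lower(), which would get them wrong.
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()


def fold_per_buffer(spans, buffer_lang: str) -> str:
    """Apply one language's rules to the whole text, ignoring span tags."""
    return "".join(fold_case(text, buffer_lang) for text, _ in spans)


def fold_per_span(spans) -> str:
    """Apply each span's own language rules."""
    return "".join(fold_case(text, lang) for text, lang in spans)


# A mostly English sentence containing one Turkish place name.
spans = [("I visited ", "en"), ("IĞDIR", "tr"), (" in May.", "en")]

as_english = fold_per_buffer(spans, "en")  # Turkish word folded by English rules
as_turkish = fold_per_buffer(spans, "tr")  # English word folded by Turkish rules
per_span = fold_per_span(spans)            # each span folded by its own rules
```

With a single per-buffer value, one of the two languages is always
folded by the wrong rules: "en" turns the Turkish name into "iğdir",
while "tr" turns the English pronoun into "ı". Only the per-span
version yields "i visited ığdır in may.".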