From: Per Starbäck
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about isearch
Date: Thu, 26 Nov 2015 21:46:49 +0100
To: "emacs-devel@gnu.org"
Cc: Eli Zaretskii, sb@dod.no, rms@gnu.org
In-Reply-To: <83h9k8vig7.fsf@gnu.org>
References: <83lh9lx6oi.fsf@gnu.org> <87egfdant7.fsf@gmx.us> <83h9k8vig7.fsf@gnu.org>

> IMO, it is more important to have language-independent matching in
> Emacs.  Language-specific rules are also needed in some situations,
> but they are secondary for Emacs.
>
>> It seems to me that we want to introduce a concept of current language

Yes! The language of a buffer is something I have wished for a long
long time, probably using minor modes. It has primarily been to have
the correct ispell dictionary and to have different abbrevs depending
on language. With the new search folding it is much more needed.

> It's a problematic concept for Emacs, which is a multi-lingual
> environment.
> For example, what is the "current language" of the
> buffer showing this message?

It's in English.

> It cannot be US English, since it
> includes characters not in that language, and can easily include
> Turkish words.  Or consider the etc/HELLO file.

I don't understand at all what you are saying here. Yes, of course
Turkish words (and any character) can be in an English text. That
doesn't make it false that it is in English. Do you just mean that it
can be hard to determine the language of a text automatically?

> We could probably have a text property which will specify the
> language, but we don't have good means to set such a property.  IOW,
> where that information would come from?

I don't envision a text property, but just a value for the buffer,
because it is much easier and good enough for most things. Yes, there
are situations where you might want to differentiate it like that, but
that goes for other things we have in modes as well. (It would
sometimes be nice to get Javascript mode for part of an HTML file
etc.)

So from where do we get it? Normally from the user. Many users mostly
write in a few languages, like Swedish and English to take myself as
an example. What I want is an indication "en" or "sv" somewhere in the
information line and commands to toggle between my favourite
languages.

Sometimes it can be determined automatically. For example, when
opening an HTML file Emacs could look at the "lang" attribute, and in
a LaTeX file it could see how you use packages like Babel or
Polyglossia. And in any text file various methods (like n-gram
frequencies) can be used to try to identify the language
automatically.

I think the focus should be on buffers being able to have a (natural)
language, and commands to change that.
It would be quite sufficient with:

* a setting listing what languages I normally want to use (the first
  one being the default)
* a cycling command that sets the language to the next in that list
  (that is a toggle when the list has two entries)
* a command to explicitly set any valid value

Anything else can be done a lot later, and as experiments outside of
the core. Automatic detection is neat, but not really needed. And
exactly what changes the different languages need will be worked out
bit by bit over time in the different language communities. The
important thing is that there is some hook to hang your code on.

* Why it is so important, now with the new search folding *

For Scandinavians it is really important, because (with Swedish as
example) åäö are really totally their own letters in the Swedish
alphabet, regardless of their historic origin. To have a search for
"varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
It would give a strong impression of this being an American program
not meant to be used for Swedish.

An analogue would be finding "jamb" when looking for "iamb" in
English, where I and J are totally different letters, even though they
originally (in Latin) were the same. Or you start an isearch for
"valid" and after the first four letters you are inside "dualism". (U
and V also were the same letter originally.) Confusing and irritating,
and something to make people turn off this search folding, which would
be sad, because it's a nice thing to have.
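
To make the proposal a bit more concrete, here is a rough sketch of
the three pieces listed above (a preferred-language list, a cycling
command, an explicit setter) together with a search folding that
respects the buffer language. It is written in Python only for
illustration and is not how Emacs implements anything. The names
Buffer, UNFOLDABLE, fold and matches are all made up for this example,
and only the Swedish entry in the table comes from this message.

```python
import unicodedata

# Hypothetical table: letters that are letters in their own right in a
# language and so must not be folded onto their base letter.
UNFOLDABLE = {
    "sv": set("åäöÅÄÖ"),
    "en": set(),
}


class Buffer:
    """Stand-in for a buffer that carries one natural language."""

    def __init__(self, text, preferred=("en", "sv")):
        self.text = text
        self.preferred = list(preferred)   # first entry is the default
        self.language = self.preferred[0]

    def cycle_language(self):
        """Move to the next preferred language (a toggle for two entries)."""
        if self.language in self.preferred:
            i = self.preferred.index(self.language)
            self.language = self.preferred[(i + 1) % len(self.preferred)]
        else:
            self.language = self.preferred[0]
        return self.language

    def set_language(self, lang):
        """Explicitly set any language code, preferred or not."""
        self.language = lang


def fold(text, lang):
    """Lower-case and strip diacritics, except on letters protected in lang."""
    keep = UNFOLDABLE.get(lang, set())
    out = []
    # Compose first so that "a" + combining diaeresis is seen as "ä".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch.lower())
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            out.append(base.lower())
    return "".join(out)


def matches(buf, needle):
    """True if needle occurs in the buffer under language-aware folding."""
    return fold(needle, buf.language) in fold(buf.text, buf.language)
```

With the buffer language set to "en" a search for "varpa" would find
"värpa", and after cycling to "sv" it would not, while "värpa" itself
still matches. A language with no entry in the table simply gets the
language-independent folding.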