From mboxrd@z Thu Jan 1 00:00:00 1970
From: Titus von der Malsburg
Newsgroups: gmane.emacs.devel
Subject: Re: Changing dictionary while flyspell-buffer is running
Date: Fri, 22 Feb 2019 10:57:29 +0100
Message-ID: <87a7iocfhy.fsf@posteo.de>
References: <874l8ztmgk.fsf@posteo.de> <838sy9hkgk.fsf@gnu.org>
 <87k1htsfpa.fsf@posteo.de> <83zhqpfb0t.fsf@gnu.org>
 <87ftsh3p42.fsf@fastmail.fm> <83mumogaza.fsf@gnu.org>
 <87ef80dekm.fsf@posteo.de> <83lg28fgdg.fsf@gnu.org>
In-reply-to: <83lg28fgdg.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: mu4e 1.1.0; emacs 26.1.91
To: Eli Zaretskii
Cc: joostkremers@fastmail.fm, rms@gnu.org, emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:233530

On 2019-02-22 Fri 08:10, Eli Zaretskii wrote:

>> From: Titus von der Malsburg
>> Cc: Joost Kremers, emacs-devel@gnu.org, rms@gnu.org
>> Date: Thu, 21 Feb 2019 22:19:53 +0100
>>
>> > It needs at least 30 letters to guess right, which is quite a few.
>>
>> The number of letters depends on the configured languages, it could be
>> less than 30 when the scripts are different but for English, Dutch,
>> and German 30 works well in my experience and languages don’t get much
>> more similar than that (except if you want to distinguish between US
>> English and UK English).
>
> The minimum number also depends on the expected reliability of
> language detection, of course.

Of course. I should say that I didn’t come up with the algorithm. It’s
a standard approach to language detection used in many contexts. Its
selling points are high accuracy, low computational complexity, and that
only a small amount of language data is required. For most languages, we
need only 1.2Kb of data. [More below.]

>> I just tried it and noticed one downside: Flyspell offers possible
>> corrections for unknown words and when multiple languages are
>> configured, these suggestions come from all configured dictionaries.
>
> Of course, but what would you expect?

I would expect to get only suggestions from the language that I’m
currently typing in.

> And how is that a downside?

If I have to pick the correct word from a list that contains many
irrelevant words, it will take more time. The suggestions are just less
relevant on average.

> Hunspell doesn't try to guess the language at all, it just looks in
> all loaded dictionaries one by one.

That’s the problem. :)

>> Many of them are of course not relevant because they are not in the
>> language of the paragraph.
>
> There's no "language of the paragraph" in this method, you can freely
> mix words from different languages in the same paragraph. There are
> important use cases for that, like editing a message translation
> catalog or text that that explains in-line the meaning of words in
> another language.

If the use case is working with paragraphs that mix languages, the user
is free to use Hunspell.
However, there is the other use case, the one that I’m interested in,
where the document contains whole paragraphs each in its own language.
Plus the use case where the document is in just one language and I
don’t want the spell-checker to suggest words in some other language.

Note that automatic language detection has other applications beyond
changing dictionaries for spell checkers. For instance, it makes it
possible to automatically switch the voice used by the Festival speech
synthesizer, which is useful for blind people working with text in
multiple languages. It can also switch the typographical conventions
used by type-mode (e.g., use quote symbols that are appropriate for the
current language). It could also switch the language of dictionaries,
thesauri, and text completion packages such as company-ngrams.

>> Flyspell also has an autocorrection feature (which I’m not using)
>> and this feature would also largely stop being useful with multiple
>> dictionaries.
>
> It will only become less useful if the first correction is off in a
> significant number of cases. Which is not at all expected, certainly
> not when each language uses a different script.
>
>> I think that this makes the Hunspell solution less appealing.
>
> I think you are slightly biased ;-). As am I, most probably. Both
> solutions have their advantages and disadvantages, and the user should
> choose which one better suits his/her needs in each case.

Exactly, and that’s why I never said that people should be prevented
from using Hunspell with multiple dictionaries if that’s the best
solution for them. :)

> I mentioned Hunspell because I think few people even know about this
> feature, which is quite unique among spellers supported by Emacs.

That is true. I certainly didn’t know about that feature. Hunspell is
fairly impressive, especially for languages like German that can freely
compose new words. Following this conversation, I might actually switch.
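
For readers unfamiliar with this kind of detection, here is a minimal
sketch of one common approach: rank the character trigrams of a text by
frequency and compare that ranking against per-language profiles. That
this is the scheme meant above is an assumption, since the message does
not name the method. The function names, the penalty value, and the toy
training sentences are made up for illustration; real profiles would be
built from much larger corpora. It is in Python rather than Emacs Lisp
only to keep it self-contained.

```python
from collections import Counter


def trigram_ranks(text, top=300):
    """Map the most frequent character trigrams of `text` to their rank."""
    # Keep letters only, lower-cased; everything else becomes a space.
    letters = "".join(c.lower() if c.isalpha() else " " for c in text)
    padded = " " + " ".join(letters.split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    # Most frequent first; ties are broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ordered[:top])}


def out_of_place(sample, profile, penalty=300):
    """Sum of rank differences; trigrams absent from `profile` cost `penalty`."""
    return sum(abs(rank - profile[gram]) if gram in profile else penalty
               for gram, rank in sample.items())


def guess_language(text, profiles, min_letters=30):
    """Return the closest language, or None when the text is too short."""
    # The 30-letter threshold is the figure quoted in the discussion above.
    if sum(c.isalpha() for c in text) < min_letters:
        return None
    sample = trigram_ranks(text)
    return min(profiles, key=lambda lang: out_of_place(sample, profiles[lang]))


# Toy training data (an assumption, far too small for real use).
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and then "
          "the dog runs after the fox through the woods",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "dann läuft der hund dem fuchs durch den wald nach",
}
PROFILES = {lang: trigram_ranks(text) for lang, text in TRAINING.items()}
```

A profile is just a short ranked list of trigrams, which is why so
little data per language is needed, and comparing two rankings costs
one pass over the sample.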

In sum, I don’t want to push my package on anyone. I said I would be
happy to contribute it to Emacs/Elpa /if/ there is interest. But I’m
perfectly happy with keeping it in Melpa where it currently lives.

Regarding my initial question: I had a closer look at how
flyspell-buffer works internally and I’m afraid there is no easy way to
make it switch languages half-way through the document. The hook for
incorrect words is called only when the spell-checker has already
finished its work. It will be necessary to write a new function that
processes the document paragraph by paragraph.

Thanks for all the suggestions.

  Titus

-- 
Dr. Titus von der Malsburg
Department of Linguistics
University of Potsdam, Germany
https://tmalsburg.github.io
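
P.S. To make the last point concrete, here is a minimal sketch of what
paragraph-by-paragraph checking could look like. It is plain Python and
does not use flyspell or any Emacs function; `detect` and `check` are
hypothetical callbacks standing in for a language detector and a
spell-checker, and the blank-line paragraph rule is an assumption.

```python
import re


def check_by_paragraph(text, detect, check, default="en"):
    """Spell-check `text` one paragraph at a time, switching language.

    detect(paragraph) -> language code, or None if undecided (hypothetical)
    check(word, language) -> True if the word is spelled correctly
                             (hypothetical)
    Returns a list of (paragraph_index, language, misspelled_word).
    """
    problems = []
    language = default
    # Assumption: paragraphs are separated by one or more blank lines.
    for index, paragraph in enumerate(re.split(r"\n\s*\n", text.strip())):
        # Keep the previous language when the detector cannot decide,
        # for example because the paragraph is too short.
        language = detect(paragraph) or language
        # Words are runs of letters (Unicode-aware, no digits or "_").
        for word in re.findall(r"[^\W\d_]+", paragraph):
            if not check(word, language):
                problems.append((index, language, word))
    return problems
```

The point of the design is that the language is chosen before a
paragraph is checked, not in a hook that runs after the checker has
finished.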