From: Emanuel Berg
Newsgroups: gmane.emacs.devel
Subject: Re: auto-detect multiple languages -- ispell-detect.el
Date: Sat, 03 Aug 2024 15:25:58 +0200
Message-ID: <878qxdivq1.fsf@dataswamp.org>
References: <87y15h7ppj.fsf@dataswamp.org> <87wmkzl56a.fsf@no.lan>
To: emacs-devel@gnu.org

Gregor Zattler wrote:

> I don't use [guess-language] it, because I did not manage to
> configure it to integrate wcheck-mode.

Now you got me motivated ... What is the optimal solution to
this problem?

1. Detection

   - is done _once_ for the entire material

   - processes _everything_, so does not rely on incomplete
     and error-prone polling/probing

2. Spelling

   - is done _once_ for each detected language

   - works for all Emacs spellcheckers

So for (1), we need a detect-languages to return a list like
this - if we for example envision a buffer with mixed English,
Latvian, and Swedish:

(("en" ((en-1-beg en-1-end) ... (en-n-beg en-n-end)))
 ("lv" ((lv-1-beg lv-1-end) ... (lv-m-beg lv-m-end)))
 ("sv" ((sv-1-beg sv-1-end) ... (sv-o-beg sv-o-end)))) ; [*]

For (2) we need the equivalent not of `narrow-to-region' but of
narrow-to-regions (or virtual-buffer-from-regions or something).

Then it is a matter of:

(cl-loop for (lang regs) in (detect-languages) do
  (narrow-to-regions regs)
  (funcall #'spellchecker-change-dictionary lang)
  (funcall #'spellchecker-spell))

Tada, works for everything!
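As a rough sketch of the loop above, here is one way to get by
without a real narrow-to-regions: narrow to each region in turn
with the ordinary `narrow-to-region'. All the my-* names are made
up for illustration, the dictionary switcher and the speller are
passed in as plain functions, and the region list is hand-written
in place of a real detect-languages. Note that this only
approximates the idea, since the speller is called once per region
rather than once per language.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical helper, not an existing Emacs function: call FN once
;; per region in REGS, with the buffer narrowed to that region.
(defun my-call-in-regions (regs fn)
  "Call FN with no arguments once for each (BEG END) pair in REGS."
  (cl-loop for (beg end) in regs do
           (save-restriction
             (narrow-to-region beg end)
             (funcall fn))))

;; Hypothetical driver.  DETECTED has the shape proposed above,
;; ((LANG ((BEG END) ...)) ...).  SET-DICT takes a language code and
;; SPELL takes no arguments; both stand in for a real spellchecker.
(defun my-spell-by-language (detected set-dict spell)
  "Run SPELL in every region of DETECTED after calling SET-DICT."
  (cl-loop for (lang regs) in detected do
           (funcall set-dict lang)
           (my-call-in-regions regs spell)))

;; Toy demo: instead of spelling, record which text each language
;; gets to see.  The region list is an assumption, written by hand.
(defun my-spell-demo ()
  "Return a list of (LANG TEXT) pairs visited in a toy buffer."
  (let ((log '())
        (current nil))
    (with-temp-buffer
      (insert "hello hej world")
      (my-spell-by-language
       '(("en" ((1 6) (11 16)))
         ("sv" ((7 10))))
       (lambda (l) (setq current l))
       (lambda () (push (list current (buffer-string)) log))))
    (nreverse log)))
```

Evaluating (my-spell-demo) should give
(("en" "hello") ("en" "world") ("sv" "hej")). A speller that edits
the text would shift the later positions, so a real version would
probably want markers instead of plain integers.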
\o/

So do we have detect-languages (maybe that is in
`guess-language'?) and narrow-to-regions?

[*] Language codes from ISO 639 set 1,
    https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes

-- 
underground experts united
https://dataswamp.org/~incal