From: Emanuel Berg
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: auto-detect multiple languages -- ispell-detect.el
Date: Sun, 04 Aug 2024 08:14:10 +0200
Message-ID: <87h6c0stl9.fsf@dataswamp.org>

Eli Zaretskii wrote:

>> (setq ispell-silently-savep t)
>
> That just sweeps the problem under the carpet, because users
> shouldn't need to save the personal dictionary more than
> once in a session.

Here, "the problem" is spelling multiple languages in the same buffer.

>> How does Hunspell know vad som är vilket språk [which is which
>> language] if they are intermixed?
>
> It checks the words in all the dictionaries.

Yes, but isn't that the only way to do it?

You can do that with ispell; it is what I did in ispell-detect.el [1]. However, the function used there, `ispell-lookup-words', doesn't use the regular ispell "i-" dictionaries (iamerican-insane and so on) but the "w-" word lists typically installed in /usr/share/dict/, which are uncompressed plain files without metadata.
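For illustration, here is a minimal shell sketch of the "check the words in all the dictionaries" idea, run against plain word lists of the /usr/share/dict kind. It is not the code of ispell-detect.el or of Hunspell. The functions strip_flags and detect_word, the list files and the language labels are all hypothetical; each list is assumed to hold one word per line, optionally in word/FLAGS form, and matching is exact and case-sensitive.

```shell
#!/bin/sh
# Sketch only: strip_flags, detect_word and the sample lists are
# hypothetical, not part of ispell, Hunspell or ispell-detect.el.

# strip_flags FILE: reduce assumed "word/FLAGS" entries to "word".
# The affix flags are simply dropped, so the derived forms they
# would generate are NOT added to the list.
strip_flags() {
    sed -e 's|/.*$||' "$1"
}

# detect_word WORD LABEL=FILE...: print the labels of every list that
# contains WORD as a whole line (exact, case-sensitive match), or
# "unknown" if none does.
detect_word() {
    word=$1
    shift
    found=
    for pair in "$@"; do
        label=${pair%%=*}   # text before the first "="
        file=${pair#*=}     # text after the first "="
        if grep -Fxq -- "$word" "$file"; then
            found="$found $label"
        fi
    done
    if [ -n "$found" ]; then
        echo "${found# }"   # drop the leading space
    else
        echo unknown
    fi
}

# Demo with two tiny made-up lists (Swedish "is" means "ice").
tmp=$(mktemp -d)
printf 'length\nword/S\nis\n' > "$tmp/en.raw"
printf 'vad\nsom\när\nis\nord/EN\n' > "$tmp/sv.raw"
strip_flags "$tmp/en.raw" > "$tmp/en"
strip_flags "$tmp/sv.raw" > "$tmp/sv"
for w in length vad ord is lenght; do
    printf '%s: %s\n' "$w" "$(detect_word "$w" en="$tmp/en" sv="$tmp/sv")"
done
rm -r "$tmp"
# Prints:
# length: en
# vad: sv
# ord: sv
# is: en sv
# lenght: unknown
```

Two limits of the sketch: after the flags are stripped only the stems remain, so inflected forms are missed; and every lookup rereads each list from disk, which is fine for a demo but not for checking a whole buffer.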
You could of course uncompress the ispell files and strip the metadata with some sed, maybe

    sed -e 's/\/.*$//' "$src" > "$dst"

but nah, I think ispell should be more than capable of doing all this by itself, using its own dictionaries.

Ispell's code is a complete jungle and I'm not digging into it; I can see at a glance that it is written in a style that is, well, not mine.

I can spell-check individual words and get the result back with the code below, from another file of mine [2]; it solved a cute little problem [3]. That code is obviously a joke: here we see Emacs's harmful "do everything in a buffer" policy taken so far that a big program cannot even do its own most basic task programmatically. The word first has to be inserted into a buffer! (Perhaps this has changed since.) If there were a better function for this, one could perhaps consider the Hunspell approach for ispell; but checking every word against every dictionary this way would mean reloading a dictionary for each word, which is out of the question.

(require 'ispell)

(defun spell-word (word)
  "Return t if ispell accepts WORD as correctly spelled, else nil."
  (with-temp-buffer
    ;; `ispell-word' checks the word at point, so insert WORD
    ;; and leave point at its beginning.
    (save-excursion
      (insert word))
    ;; `ispell-word' returns nil when the word is accepted;
    ;; any other result, or an error, makes this return nil.
    (condition-case nil
        (not (ispell-word))
      (error nil))))

;; (spell-word "length") ; t
;; (spell-word "lenght") ; nil

Anyway, maybe this should be dropped, and we should recommend Hunspell to people who want this?

[1] https://dataswamp.org/~incal/emacs-init/ispell-detect.el
[2] https://dataswamp.org/~incal/emacs-init/spell.el
[3] https://dataswamp.org/~incal/emacs-init/perm.el

-- 
underground experts united
https://dataswamp.org/~incal