From mboxrd@z Thu Jan 1 00:00:00 1970
From: Emanuel Berg
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: auto-detect multiple languages -- ispell-detect.el
Date: Sun, 04 Aug 2024 13:03:45 +0200
Message-ID: <87zfpsr1m6.fsf@dataswamp.org>
References: <87y15h7ppj.fsf@dataswamp.org> <87wmkzl56a.fsf@no.lan>
 <878qxdivq1.fsf@dataswamp.org> <87zfpth6l6.fsf@dataswamp.org>
 <87v80hgyzy.fsf@dataswamp.org> <86r0b4ub5z.fsf@gnu.org>
 <87mslssvks.fsf@dataswamp.org> <86plqou98v.fsf@gnu.org>
 <87h6c0stl9.fsf@dataswamp.org>

> Ispell's code is a complete jungle, I'm not digging into it.
> I can see in one second it is written in a style that is,
> well, not mine.

I was curious: if Hunspell is much better and more advanced,
maybe that would show itself in modern code as well?

So I looked in ispell.el and searched for "hunspell". I yank
the function I found here.

Notes:

- ispell.el is 4 323 lines long. (The function yanked below
  is lines 1081-1205.)

- (how-many "\\Wcl-" (point-min)) gives 12 hits: one for
  cl-lib, 11 for `cl-pushnew'.

- (how-many "\\Wpcase" (point-min)) gives a single hit, for
  `pcase'.

This experiment was carried out on:

GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, cairo version
1.16.0) of 2024-04-01 [commit
a5fbb652ed3614d6735015551564f32b80e42c53]

(defun ispell-find-hunspell-dictionaries (&optional dictionary)
  "Look for installed Hunspell dictionaries.
Will initialize `ispell-hunspell-dictionary-alist' according
to dictionaries found, and will remove aliases from the list
in `ispell-dicts-name2locale-equivs-alist' if an explicit
dictionary from that list was found.

If DICTIONARY, check for that dictionary explicitly."
  (let ((hunspell-found-dicts
         (seq-filter
          (lambda (str)
            (when (string-match
                   ;; Hunspell gives this error when there is some
                   ;; installation problem, for example if $LANG is unset.
                   (concat "^Can't open affix or dictionary files "
                           "for dictionary named \"default\".$")
                   str)
              (user-error "Hunspell error (is $LANG unset?): %s" str))
            (file-name-absolute-p str))
          (split-string
           (with-temp-buffer
             (apply #'ispell-call-process
                    ispell-program-name
                    nil
                    t
                    nil
                    `("-D"
                      ,@(and dictionary (list "-d" dictionary))
                      ;; Use -a to prevent Hunspell from trying to
                      ;; initialize its curses/termcap UI, which
                      ;; causes it to crash or fail to start in some
                      ;; MS-Windows ports.
                      "-a"
                      ;; Hunspell 1.7.0 (and later?) won't show LOADED
                      ;; DICTIONARY unless there's at least one file
                      ;; argument on the command line. So we feed it
                      ;; with the null device.
                      ,null-device))
             (buffer-string))
           "[\n\r]+"
           t)))
        hunspell-default-dict
        hunspell-default-dict-entry
        hunspell-multi-dict)
    (dolist (dict hunspell-found-dicts)
      (let* ((full-name (file-name-nondirectory dict))
             (basename (file-name-sans-extension full-name))
             (affix-file (concat dict ".aff")))
        (if (string-match "\\.aff$" dict)
            ;; Found default dictionary
            (progn
              (if hunspell-default-dict
                  (setq hunspell-multi-dict
                        (concat (or hunspell-multi-dict
                                    (car hunspell-default-dict))
                                "," basename))
                (setq affix-file dict)
                ;; FIXME: The cdr of the list we cons below is never
                ;; used. Why do we need a list?
                (setq hunspell-default-dict (list basename affix-file)))
              (ispell-print-if-debug
               "++ ispell-fhd: default dict-entry:%s name:%s basename:%s\n"
               dict full-name basename))
          (if (and (not (assoc basename ispell-hunspell-dict-paths-alist))
                   (file-exists-p affix-file))
              ;; Entry has an associated .aff file and no previous value.
              (let ((affix-file (expand-file-name affix-file)))
                (ispell-print-if-debug
                 "++ ispell-fhd: dict-entry:%s name:%s basename:%s affix-file:%s\n"
                 dict full-name basename affix-file)
                (cl-pushnew (list basename affix-file)
                            ispell-hunspell-dict-paths-alist :test #'equal))
            (ispell-print-if-debug
             "-- ispell-fhd: Skipping entry: %s\n" dict)))))
    ;; Remove entry from aliases alist if explicit dict was found.
    (let (newlist)
      (dolist (dict ispell-dicts-name2locale-equivs-alist)
        (if (assoc (car dict) ispell-hunspell-dict-paths-alist)
            (ispell-print-if-debug
             "-- ispell-fhd: Excluding %s alias. Standalone dict found.\n"
             (car dict))
          (cl-pushnew dict newlist :test #'equal)))
      (setq ispell-dicts-name2locale-equivs-alist newlist))
    ;; Add known hunspell aliases
    (dolist (dict-equiv ispell-dicts-name2locale-equivs-alist)
      (let ((dict-equiv-key (car dict-equiv))
            (dict-equiv-value (cadr dict-equiv))
            (exclude-aliases (list ;; Exclude TeX aliases
                              "esperanto-tex"
                              "francais7"
                              "francais-tex"
                              "norsk7-tex")))
        (if (and (assoc dict-equiv-value ispell-hunspell-dict-paths-alist)
                 (not (assoc dict-equiv-key ispell-hunspell-dict-paths-alist))
                 (not (member dict-equiv-key exclude-aliases)))
            (let ((affix-file (cadr (assoc dict-equiv-value
                                           ispell-hunspell-dict-paths-alist))))
              (ispell-print-if-debug "++ ispell-fhd: Adding alias %s -> %s.\n"
                                     dict-equiv-key affix-file)
              (cl-pushnew (list dict-equiv-key affix-file)
                          ispell-hunspell-dict-paths-alist :test #'equal)))))
    ;; Parse and set values for default dictionary.
    (setq hunspell-default-dict (or hunspell-multi-dict
                                    (car hunspell-default-dict)))
    ;; If we didn't find a dictionary based on the environment (i.e.,
    ;; the locale and the DICTIONARY variable), try again if
    ;; `ispell-dictionary' is set.
    (when (and (not hunspell-default-dict)
               (not dictionary)
               ispell-dictionary)
      (setq hunspell-default-dict
            (ispell-find-hunspell-dictionaries ispell-dictionary)))
    ;; If hunspell-default-dict is nil, ispell-parse-hunspell-affix-file
    ;; will barf with an error message that doesn't help users figure
    ;; out what is wrong. Produce an error message that points to the
    ;; root cause of the problem.
    (unless hunspell-default-dict
      (error "Can't find Hunspell dictionary with a .aff affix file"))
    (setq hunspell-default-dict-entry
          (ispell-parse-hunspell-affix-file hunspell-default-dict))
    ;; Create an alist of found dicts with only names, except for default dict.
    (setq ispell-hunspell-dictionary-alist
          (list (cons nil (cdr hunspell-default-dict-entry))))
    (dolist (dict (mapcar #'car ispell-hunspell-dict-paths-alist))
      (cl-pushnew (if (string= dict hunspell-default-dict)
                      hunspell-default-dict-entry
                    (list dict))
                  ispell-hunspell-dictionary-alist :test #'equal))
    hunspell-default-dict))

-- 
underground experts united
https://dataswamp.org/~incal