From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Agustin Martin
Newsgroups: gmane.emacs.bugs
Subject: bug#13639: [emacs] ispell.el: hunspell dicts autodetection under Emacs.
Date: Wed, 20 Feb 2013 18:50:45 +0100
Message-ID: <20130220175045.GA20958@agmartin.aq.upm.es>
References: <20130116122509.GA2209@omega.in.herr-schmitt.de>
In-Reply-To: <20130116122509.GA2209@omega.in.herr-schmitt.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="ew6BAiZeqk4r7MaW"
To: 13639@debbugs.gnu.org
User-Agent: Mutt/1.5.21 (2010-09-15)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:71563

--ew6BAiZeqk4r7MaW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Jan 17, 2013 at 09:36:09PM +0200, Eli Zaretskii wrote:
> > On Thu, Jan 17, 2013 at 08:42:58PM +0200, Eli Zaretskii wrote:
> > > > Date: Thu, 17 Jan 2013 19:12:34 +0100
> > > > From: Agustin Martin
> > > >
> > > > Sorry, I should have written WORDCHARS.
> > >
> > > Why do we need that?
> >
> > This is what ispell.el calls otherchars. Parsing WORDCHARS ensures that
> > both hunspell and ispell.el think about the same characters in that
> > category.
>
> I think you are mistaken, that's not my reading of hunspell(4).

Sorry for the late reply.

(I am opening a new thread specifically about hunspell dicts autodetection,
using the newly cloned bug report #13639, which is specific to this.)

The WORDCHARS description in hunspell(4),

  WORDCHARS characters
         WORDCHARS extends tokenizer of Hunspell command line interface
         with additional word character. For example, dot, dash, n-dash,
         numbers, percent sign are word character in Hungarian.

is too biased towards Hungarian and does not mention the usual apostrophe.
Still, AFAIK it mostly refers to the same thing as 'otherchars', although
hunspell may also accept those characters in positions other than the
middle of a word.

The good news is that I have started working on hunspell dicts
autodetection. For those curious, I am attaching my initial test suite. I
am currently integrating this into ispell.el (unfortunately slowly, due to
time constraints).

-- 
Agustin

--ew6BAiZeqk4r7MaW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="hunspell-autodetect.el"

(require 'ispell)

(setq ispell-debug t)
(setq ispell-program-name "hunspell")
(setq ispell-hunspell-dict-paths-alist nil)
(setq ispell-hunspell-dictionary-alist nil)

(defun ispell-print-if-debug (string)
  "Print STRING with `message' if `ispell-debug' is non-nil."
  (if ispell-debug
      (message "%s" string)))

(defun ispell-replace-dictionary-entry (dicts-alist new-entry)
  "Replace old entry in `DICTS-ALIST' with `NEW-ENTRY'.
Mostly intended to play with `ispell-dictionary-alist' and friends."
  ;; Build the result with `mapcar' rather than `add-to-list', which is
  ;; not meant for let-bound variables and would reverse the entry order.
  (mapcar (lambda (entry)
            (if (string= (car new-entry) (car entry))
                new-entry
              entry))
          dicts-alist))

(defun ispell-parse-hunspell-affix-file (dict-name)
  "Parse hunspell affix file for `dict-name'.
Return a list in `ispell-dictionary-alist' format."
  (let* ((path (cadr (assoc dict-name ispell-hunspell-dict-paths-alist)))
         (affix-file (concat path dict-name ".aff")))
    (unless path
      (error "No matching entry for %s" dict-name))
    (if (file-exists-p affix-file)
        (with-temp-buffer
          (insert-file-contents affix-file)
          (let (otherchars-string otherchars-list)
            (setq otherchars-string
                  (save-excursion
                    ;; `beginning-of-buffer' is for interactive use only.
                    (goto-char (point-min))
                    (if (search-forward-regexp "^WORDCHARS +" nil t)
                        (buffer-substring (point)
                                          (progn (end-of-line) (point))))))
            ;; Remove trailing whitespace and extra stuff.  Make list if non-nil.
            (setq otherchars-list
                  (if otherchars-string
                      (split-string
                       (if (string-match " +.*$" otherchars-string)
                           (replace-match "" nil nil otherchars-string)
                         otherchars-string)
                       "" t)))
            ;; Fill dict entry
            (list dict-name
                  "[[:alpha:]]"
                  "[^[:alpha:]]"
                  (if otherchars-list
                      (regexp-opt otherchars-list)
                    "")
                  t                     ;; many-otherchars-p: We can't tell, set to t
                  (list "-d" dict-name)
                  nil                   ;; extended-char-mode: not supported by hunspell
                  'utf-8)))
      (error "File \"%s\" not found" affix-file))))

(defun ispell-find-hunspell-dictionaries ()
  "Parse installed hunspell dictionaries."
  (let ((hunspell-found-dicts
         (split-string
          (with-temp-buffer
            (ispell-call-process ispell-program-name
                                 null-device
                                 t
                                 nil
                                 "-D")
            (buffer-string))
          "[\n\r]+"
          t))
        hunspell-default-dict
        hunspell-default-dict-entry)
    (dolist (dict hunspell-found-dicts)
      (let* ((full-name (file-name-nondirectory dict))
             (path      (file-name-directory dict))
             (basename  (file-name-sans-extension full-name)))
        (if (string-match "\\.aff$" dict)
            ;; Found default dictionary
            (if hunspell-default-dict
                (error "Default dict already defined as %s. Not using %s."
                       hunspell-default-dict dict)
              (setq hunspell-default-dict (list basename path)))
          (if (and (not (assoc basename ispell-hunspell-dict-paths-alist))
                   (file-exists-p (concat dict ".aff")))
              ;; Entry has an associated .aff file and no previous value.
              (progn
                (ispell-print-if-debug
                 (format "++ dict-entry:%s name:%s basename:%s path:%s aff:%s"
                         dict full-name basename path (concat dict ".aff")))
                (add-to-list 'ispell-hunspell-dict-paths-alist
                             (list basename path)))
            (ispell-print-if-debug
             (format "-- Skipping %s" dict))))))
    ;; Parse values for default dictionary.
    (setq hunspell-default-dict (car hunspell-default-dict))
    (setq hunspell-default-dict-entry
          (ispell-parse-hunspell-affix-file hunspell-default-dict))
    ;; Create an alist of found dicts with only names, except for default dict.
    (setq ispell-hunspell-dictionary-alist
          (list (append (list nil) (cdr hunspell-default-dict-entry))))
    (dolist (dict (mapcar 'car ispell-hunspell-dict-paths-alist))
      (if (string= dict hunspell-default-dict)
          (add-to-list 'ispell-hunspell-dictionary-alist
                       hunspell-default-dict-entry)
        (add-to-list 'ispell-hunspell-dictionary-alist
                     (list dict))))))

(ispell-find-hunspell-dictionaries)

(setq mylang "en_US")

(message "-- For selected language \"%s\" before: %s"
         mylang
         (assoc mylang ispell-hunspell-dictionary-alist))

(or (cadr (assoc mylang ispell-hunspell-dictionary-alist))
    (let ((dict-entry (ispell-parse-hunspell-affix-file mylang)))
      (setq ispell-hunspell-dictionary-alist
            (ispell-replace-dictionary-entry
             ispell-hunspell-dictionary-alist dict-entry))))

(message "-- For selected language \"%s\" after: %s"
         mylang
         (assoc mylang ispell-hunspell-dictionary-alist))

--ew6BAiZeqk4r7MaW--