all messages for Emacs-related lists mirrored at yhetil.org
* auto-detect multiple languages -- ispell-detect.el
@ 2024-07-31 11:42 Emanuel Berg
  2024-08-02  2:53 ` Richard Stallman
  0 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-07-31 11:42 UTC (permalink / raw)
  To: emacs-devel

Automatic on-the-fly language detection and spelling with the
appropriate dictionary.

It works on arbitrary texts with no markup or
anything required.

Also a future release for ELPA, I hope.

;;; -*- lexical-binding: t -*-
;;
;; this file:
;;   https://dataswamp.org/~incal/emacs-init/ispell-detect.el
;;
;; Installation on Debian:
;;
;; 1. For detection, install the /usr/share/dict files from
;;    the 'w-' packages, for example 'wamerican-insane',
;;    'wfrench' and 'wswedish'.
;;
;; 2. For correction, install the ispell files from the 'i-'
;;    packages, for example 'iamerican-insane', 'ifrench-gut'
;;    and 'iswedish'.
;;
;; 3. Set `ispell-detect--langs' as below.
;;
;; Test detection:
;;
;;   (ispell-detect (point) (pos-eol)) ; l'oiseau aimait le beau
;;   (ispell-detect (point) (pos-eol)) ; detta är en mening på svenska
;;   (ispell-detect (point) (pos-eol)) ; this isn't just another program
;;
;; Test multiple language spelling:
;;
;;   https://dataswamp.org/~incal/test-spell/3lang.txt

(require 'cl-lib)
(require 'ispell)

(defvar ispell-detect--langs
  '(("/usr/share/dict/american-english-insane" "american-insane")
    ("/usr/share/dict/french"                  "francais")
    ("/usr/share/dict/swedish"                 "svenska")))

(defun ids--region ()
  (if (use-region-p)
      (list (region-beginning) (region-end))
    (list nil nil)))

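(defun ispell-detect-spell (&optional beg end probe-forward spell-forward)
  "Spell from BEG to END [whole buffer], detecting the language as it goes.
PROBE-FORWARD [`forward-sentence'] moves over the text used for detection.
SPELL-FORWARD [`forward-paragraph'] moves over the text that is then
spelled with the detected dictionary."
  (interactive (ids--region))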
  (or beg (setq beg (point-min)))
  (or end (setq end (point-max)))
  (or probe-forward (setq probe-forward #'forward-sentence))
  (or spell-forward (setq spell-forward #'forward-paragraph))
  (goto-char beg)
  (cl-loop
    for beg = (point)
    for probe-end = (progn (funcall probe-forward) (point))
    for spell-end = (progn (goto-char beg) (funcall spell-forward) (min (point) end))
    while (< beg spell-end)
    for lang = (ispell-detect beg probe-end)
    do (unless (string= lang ispell-current-dictionary)
         (ispell-change-dictionary lang))
       (ispell-region beg spell-end)))

(defun ispell-detect (&optional beg end)
  "Detect the language used in the current buffer,
from the alternatives in `ispell-detect--langs'.
\nDetect on the region from BEG to END [paragraph].
\nMethod: `ispell-count'"
  (interactive (ids--region))
  (save-mark-and-excursion
    (cl-loop
      with beg = (or beg (progn (start-of-paragraph-text) (point)))
      with end = (or end (progn   (end-of-paragraph-text) (point)))
      for (wl d) in ispell-detect--langs
      collect (list d (ispell-count beg end wl)) into res
      finally return (caar (cl-sort res #'< :key #'cadr)))))

(defun ispell-count (&optional beg end wordlist)
  "Spell the region from BEG to END [whole buffer] with WORDLIST.
Return the ratio of incorrectly spelled words."
  (interactive (ids--region))
  (or beg (setq beg (point-min)))
  (or end (setq end (point-max)))
  (save-mark-and-excursion
    (goto-char beg)
    (forward-word)
    (backward-word)
    (cl-loop
      with words  = 0
      with errors = 0
      while (< (point) end)
      do (let ((word (thing-at-point 'word t)))
           (unless (ispell-lookup-words word wordlist)
             (cl-incf errors))
           (cl-incf words)
           (forward-to-word))
      ;; guard against a region without words (0/0 would give NaN)
      finally return (if (zerop words) 0.0 (/ errors words 1.0)))))

(provide 'ispell-detect)
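
The detection idea above (count the misspelled words per word list and
pick the language with the lowest ratio) can be tried without any
external files. The following is only a minimal sketch: the `demo-'
names are hypothetical and not part of ispell-detect.el, and the word
lists are small in-memory hash tables rather than the /usr/share/dict
files.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun demo-word-table (words)
  "Return a hash table whose keys are the downcased WORDS."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (w words table)
      (puthash (downcase w) t table))))

(defun demo-error-ratio (words table)
  "Return the fraction of WORDS that are not keys of TABLE.
An empty WORDS list gives 0.0."
  (if (null words)
      0.0
    (/ (cl-count-if-not (lambda (w) (gethash (downcase w) table)) words)
       (float (length words)))))

(defun demo-detect (words langs)
  "Return the name in LANGS whose word table fits WORDS best.
LANGS is an alist of (NAME . TABLE); the lowest error ratio wins,
and the first entry wins a tie."
  (let (best best-ratio)
    (dolist (lang langs best)
      (let ((ratio (demo-error-ratio words (cdr lang))))
        (when (or (null best-ratio) (< ratio best-ratio))
          (setq best (car lang)
                best-ratio ratio))))))
```

A single typo does not change the verdict as long as the other
candidates fit the text even worse.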

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-07-31 11:42 auto-detect multiple languages -- ispell-detect.el Emanuel Berg
@ 2024-08-02  2:53 ` Richard Stallman
  2024-08-02  8:06   ` Gregor Zattler
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2024-08-02  2:53 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > It works on arbitrary texts with no markup or
  > anything required.

  > Also a future release for ELPA, I hope.

It is a good idea, if it works reliably enough.
In order to be convenient, it needs to choose heuristically how
much text to test.

I suggest having a custom variable `ispell-language-list'
which could look like ("french" "spanish" "english").
Maybe there should be a special short file property to specify
a given file's spell check language.

Let's not complicate usage by putting small pieces of code
in separate packages.  It would be simpler for users
to add this code to ispell.el.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-02  2:53 ` Richard Stallman
@ 2024-08-02  8:06   ` Gregor Zattler
  2024-08-02 22:32     ` Emanuel Berg
                       ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Gregor Zattler @ 2024-08-02  8:06 UTC (permalink / raw)
  To: rms, Emanuel Berg; +Cc: emacs-devel

Hi Richard, Emanuel,
* Richard Stallman <rms@gnu.org> [2024-08-01; 22:53 -04]:
>   > It works on arbitrary texts with no markup or
>   > anything required.
>
>   > Also a future release for ELPA, I hope.
>
> It is a good idea, if it works reliably enough.
> In order to be convenient, it needs to choose heuristically how
> much text to test.
>
> I suggest having a custom variable `ispell-language-list'
> which could look like ("french" "spanish" "english").
> Maybe there should be a special short file property to specify
> a given file's spell check language.

there is already in GNU ELPA the package

    * guess-language: Emacs minor mode
      for robust automatic language
      detection

    Emacs minor mode that detects the
    language of what you're typing.
    Automatically switches the spell
    checker and typo-mode (if present).

    *Key features:*
    - Detection algorithm is robust,
      efficient, and dead simple.  Based
      on character trigrams.
    - Support for many languages.  More
      can be easily added.
    - Stays out of your way.  Set up
      once, then forget it exists.
    - Works with documents written in
      multiple languages.  [...]

It features a variable
`guess-language-languages'.  I don't use
it, because I did not manage to
configure it to integrate wcheck-mode.


Regards, Gregor




* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-02  8:06   ` Gregor Zattler
@ 2024-08-02 22:32     ` Emanuel Berg
  2024-08-03  2:02     ` Emanuel Berg
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-02 22:32 UTC (permalink / raw)
  To: emacs-devel

Gregor Zattler wrote:

>> I suggest having a custom variable `ispell-language-list'
>> which could look like ("french" "spanish" "english").
>> Maybe there should be a special short file property to
>> specify a given file's spell check language.
>
> there is already in GNU ELPA the package
>
>     * guess-language: Emacs minor mode for robust automatic
>       language detection

Okay, I should check that out before I do something with my
code, then.

Thank you!

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-02  8:06   ` Gregor Zattler
  2024-08-02 22:32     ` Emanuel Berg
@ 2024-08-03  2:02     ` Emanuel Berg
  2024-08-04  6:52       ` Uwe Brauer via Emacs development discussions.
  2024-08-03 13:25     ` Emanuel Berg
  2024-08-04  2:05     ` Richard Stallman
  3 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-03  2:02 UTC (permalink / raw)
  To: emacs-devel

Gregor Zattler wrote:

> there is already in GNU ELPA the package
>
>     * guess-language: Emacs minor mode
>       for robust automatic language
>       detection
>
>     Emacs minor mode that detects the
>     language of what you're typing.
>     Automatically switches the spell
>     checker and typo-mode (if present).
>
>     *Key features:*
>     - Detection algorithm is robust,
>       efficient, and dead simple.  Based
>       on character trigrams.
>     - Support for many languages.  More
>       can be easily added.
>     - Stays out of your way.  Set up
>       once, then forget it exists.
>     - Works with documents written in
>       multiple languages.  [...]
>
> It features a variable
> `guess-language-languages'.  I don't use
> it, because I did not manage to
> configure it to integrate wcheck-mode.

I think I'll drop this project if there is already one.
From that description it sounds better, to be honest :)

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-02  8:06   ` Gregor Zattler
  2024-08-02 22:32     ` Emanuel Berg
  2024-08-03  2:02     ` Emanuel Berg
@ 2024-08-03 13:25     ` Emanuel Berg
  2024-08-03 17:14       ` Emanuel Berg
  2024-08-04  2:05     ` Richard Stallman
  3 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-03 13:25 UTC (permalink / raw)
  To: emacs-devel

Gregor Zattler wrote:

> I don't use it [guess-language], because I did not manage to
> configure it to integrate wcheck-mode.

Now you got me motivated ...

What is the optimal solution to this problem?

1. Detection

- is done _once_ for the entire material

- processes _everything_, so does not rely on incomplete and
  error-prone polling/probing

2. Spelling

- is done _once_ for each detected language

- works for all Emacs spellcheckers

So for (1), we need a detect-languages to return a list like
this - if we for example envision a buffer with mixed English,
Latvian, and Swedish:

(("en" ((en-1-beg en-1-end) ... (en-n-beg en-n-end)))
 ("lv" ((lv-1-beg lv-1-end) ... (lv-m-beg lv-m-end)))
 ("sv" ((sv-1-beg sv-1-end) ... (sv-o-beg sv-o-end)))) ; [*]

For (2) we need the equivalent not of `narrow-to-region' but
of narrow-to-regions (or virtual-buffer-from-regions or
something).

Then it is a matter of:

(cl-loop for (lang regs) in (detect-languages) do
  (narrow-to-regions regs)
  (funcall #'spellchecker-change-dictionary lang)
  (funcall #'spellchecker-spell))

Tada, works for everything! \o/

So do we have detect-languages (maybe that is in
`guess-language'?) and narrow-to-regions?

[*] Language codes from ISO 639 set 1,
    https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
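
As far as I know there is no narrow-to-regions in Emacs. A rough
approximation is to narrow to each region in turn and call the
checker there. The sketch below assumes exactly that; the name
`demo-call-in-regions' is hypothetical, and note that it calls the
function once per region, not once per language as wished for above.

```elisp
;;; -*- lexical-binding: t -*-

(defun demo-call-in-regions (regions fn)
  "Call FN once for every (BEG END) pair in REGIONS.
During each call the buffer is narrowed to that pair and point is at
its beginning.  Markers are used so that edits made by FN do not
invalidate the regions that follow."
  (let ((marked (mapcar (lambda (reg)
                          (list (copy-marker (car reg))
                                (copy-marker (cadr reg) t)))
                        regions)))
    (save-excursion
      (dolist (reg marked)
        (save-restriction
          (narrow-to-region (car reg) (cadr reg))
          (goto-char (point-min))
          (funcall fn))
        ;; detach the markers so they do not slow down later edits
        (set-marker (car reg) nil)
        (set-marker (cadr reg) nil)))))
```

With this, FN could be something like a lambda that changes the
dictionary and then spells from (point-min) to (point-max).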

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-03 13:25     ` Emanuel Berg
@ 2024-08-03 17:14       ` Emanuel Berg
  2024-08-03 19:58         ` Emanuel Berg
  0 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-03 17:14 UTC (permalink / raw)
  To: emacs-devel

>> I don't use it [guess-language], because I did not manage
>> to configure it to integrate wcheck-mode.
>
> Now you got me motivated [...]

If you get detection to get good data and do good annotations,
it is very easy with ispell.

Insert annotations, spell as below for every language,
remove them.

;; (setq ispell-skip-region-alist nil)
;; (cl-pushnew (list "-->{[^e][^n]}" "-->{en}") ispell-skip-region-alist :test #'equal)
;; (cl-pushnew (list                 "-->{en}") ispell-skip-region-alist :test #'equal)
;; (ispell-region (point) (point-max))
;; (while (re-search-forward "-->{.\\{2\\}}" nil t) (replace-match ""))

-->{en}Gnus is an Emacs package prymarily designed for reading
and posting Usenet news.

-->{sv}Aftonbladet är en svensk kvälls- och nättidning grundad
den 6 december 1830 av Lars Johan Hierta, vars ledarsida sedan
början av 1960-talet har den politiska beteckningen "oberoende
socialdemokratisk". Sedan den 25 augusti 1994 finns
Aftonbladet också på webben.

-->{sv}Inte mig.

-->{en}It can also be used to read and respond to messages
from a munber of other sources — email, remote directories,
digests, and so on.

-->{fr}L’oiseau aimait le beau sourire de la baleine.
Il aimait la façon gracieuse qu’elle avait de nager dans
l’eau. La baleine aimait les belles plumes blanches de
l’oiseau. Elle aimait le regarder planer dans le ciel.

-->{en}Sspell me.
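
Annotations of this kind also carry enough information to build a
list of languages with their regions. The following is a sketch
only, assuming two-letter `-->{xx}' markers as in the text above; the
function name `demo-annotated-regions' is hypothetical.

```elisp
;;; -*- lexical-binding: t -*-

(defun demo-annotated-regions ()
  "Return a list of (LANG ((BEG END) ...)) for the current buffer.
A region starts right after a `-->{xx}' annotation and ends where
the next annotation starts, or at the end of the buffer."
  (let ((anno "-->{\\([[:alpha:]]\\{2\\}\\)}")
        result)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward anno nil t)
        (let* ((lang (match-string-no-properties 1))
               (beg  (point))
               ;; stop in front of the next annotation so that the
               ;; outer loop finds it again
               (end  (if (re-search-forward anno nil t)
                         (goto-char (match-beginning 0))
                       (point-max)))
               (cell (assoc lang result)))
          (if cell
              (setcar (cdr cell)
                      (append (cadr cell) (list (list beg end))))
            (push (list lang (list (list beg end))) result)))))
    (nreverse result)))
```

Languages come out in the order in which they first appear in
the buffer.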

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-03 17:14       ` Emanuel Berg
@ 2024-08-03 19:58         ` Emanuel Berg
  2024-08-04  5:09           ` Eli Zaretskii
  2024-08-04  7:17           ` Uwe Brauer via Emacs development discussions.
  0 siblings, 2 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-03 19:58 UTC (permalink / raw)
  To: emacs-devel

>>> I don't use it [guess-language], because I did not manage
>>> to configure it to integrate wcheck-mode.
>>
>> Now you got me motivated [...]
>
> If you get detection to get good data and do good
> annotations, it is very easy with ispell.
>
> Insert annotations, spell as below for every language [...]

See annotations last in the commented-out text/test area.
I don't remove them, which should happen when spell is done,
because they are inserted manually :)

Other than that - works.

;;; -*- lexical-binding: t -*-
;;
;; this file:
;;   https://dataswamp.org/~incal/emacs-init/ispell-all.el

(require 'cl-lib)
(require 'ispell)

(defvar ispell-langs '(("en" "american-insane")
                       ("fr" "francais")
                       ("sv" "svenska")))

(defun ispell-spell-all (&optional beg end)
  (interactive
    (when (use-region-p)
      (list (region-beginning)
            (region-end))))
  (or beg (setq beg (point)))
  (or end (setq end (point-max)))
  (cl-loop for (lang dict) in ispell-langs do
    (let* ((this (format "-->{%s}" lang))
           (all  "-->{[[:alpha:]]\\{2\\}}")
           (ispell-skip-region-alist
            `(,(list this)
              ,(list all (lambda ()
                           (unless (re-search-forward this nil t)
                             (goto-char end))))
              ,@ispell-skip-region-alist)))
      (unless (string= dict ispell-current-dictionary)
        (ispell-change-dictionary dict))
      (ispell-region beg end))))

;; -->{en}Gnus is an Emacs package prxmarily designed for
;; reading and posting Usenet news.
;;
;; -->{sv}Aftonbladet är en svensk kvälls- och nättidning
;; grundad den 6 december 1830 av Lars Johan Hierta, vars
;; ledarsida sedan början av 1960-talet har den politiska
;; beteckningen "oberoende socialdemokratisk". Sedan den 25
;; augusti 1994 finns Aftonbladet också på webben.
;;
;; -->{sv}Inte mig.
;;
;; -->{en}It can also be used to read and respond to messages
;; from a nhhunber of other sources — email, remote
;; directories, digests, and so on.
;;
;; -->{fr}Il aimait la façxn gracieuze qu’elle avait de nager
;; dans l’eau. La baleine aimait les belles plumes blanches de
;; l’oiseau. Elle aimait le regarder planer dans le ciel.
;;
;; -->{en}Sszpell me.

(provide 'ispell-all)

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-02  8:06   ` Gregor Zattler
                       ` (2 preceding siblings ...)
  2024-08-03 13:25     ` Emanuel Berg
@ 2024-08-04  2:05     ` Richard Stallman
  2024-08-07 15:36       ` Björn Bidar
  3 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2024-08-04  2:05 UTC (permalink / raw)
  To: Gregor Zattler; +Cc: incal, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > there is already in GNU ELPA the package

  >     * guess-language: Emacs minor mode
  >       for robust automatic language
  >       detection

It sounds quite convenient.  Perhaps we should make more
packages make use of it -- and move it into Emacs core.

But first we should look it over and see if there are any
cleanups or improvements we ought to make before that.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-03 19:58         ` Emanuel Berg
@ 2024-08-04  5:09           ` Eli Zaretskii
  2024-08-04  5:31             ` Emanuel Berg
  2024-08-04  5:42             ` Emanuel Berg
  2024-08-04  7:17           ` Uwe Brauer via Emacs development discussions.
  1 sibling, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2024-08-04  5:09 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

> From: Emanuel Berg <incal@dataswamp.org>
> Date: Sat, 03 Aug 2024 21:58:09 +0200
> 
> (defun ispell-spell-all (&optional beg end)
>   (interactive
>     (when (use-region-p)
>       (list (region-beginning)
>             (region-end))))
>   (or beg (setq beg (point)))
>   (or end (setq end (point-max)))
>   (cl-loop for (lang dict) in ispell-langs do
>     (let* ((this (format "-->{%s}" lang))
>            (all  "-->{[[:alpha:]]\\{2\\}}")
>            (ispell-skip-region-alist
>             `(,(list this)
>               ,(list all (lambda ()
>                            (unless (re-search-forward this nil t)
>                              (goto-char end))))
>               ,@ispell-skip-region-alist)))
>       (unless (string= dict ispell-current-dictionary)
>         (ispell-change-dictionary dict))
>       (ispell-region beg end))))

Looping through several languages and restarting the speller each time
has the disadvantage that it is quite expensive, and will ask whether
to save the local dictionary each time, which is a minor nuisance,
IMO.

And if one uses Hunspell, one can set up the speller so that it
spell-checks several languages in parallel, in which case the above is
not needed.  This has an advantage that different languages can be
mixed in the same region, instead of separating them.  (If you don't
separate, then spelling in one language will mark text in another
language as misspellings.)




* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  5:09           ` Eli Zaretskii
@ 2024-08-04  5:31             ` Emanuel Berg
  2024-08-04  5:50               ` Eli Zaretskii
  2024-08-04  5:42             ` Emanuel Berg
  1 sibling, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  5:31 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii wrote:

>> (defun ispell-spell-all (&optional beg end)
>>   (interactive
>>     (when (use-region-p)
>>       (list (region-beginning)
>>             (region-end))))
>>   (or beg (setq beg (point)))
>>   (or end (setq end (point-max)))
>>   (cl-loop for (lang dict) in ispell-langs do
>>     (let* ((this (format "-->{%s}" lang))
>>            (all  "-->{[[:alpha:]]\\{2\\}}")
>>            (ispell-skip-region-alist
>>             `(,(list this)
>>               ,(list all (lambda ()
>>                            (unless (re-search-forward this nil t)
>>                              (goto-char end))))
>>               ,@ispell-skip-region-alist)))
>>       (unless (string= dict ispell-current-dictionary)
>>         (ispell-change-dictionary dict))
>>       (ispell-region beg end))))
>
> Looping through several languages and restarting the speller
> each time has the disadvantage that it is quite expensive,

3 languages, 3 spellchecks - so linear, that is, quite good.

> and will ask whether to save the local dictionary each time,
> which is a minor nuisance

(setq ispell-silently-savep t)

> And if one uses Hunspell one can set up the speller so that
> it spell-checks several languages in parallel

Okay, how?

> This has an advantage that different languages can be mixed
> in the same region, instead of separating them.

How does Hunspell know vad som är vilket språk if they
are intermixed?

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  5:09           ` Eli Zaretskii
  2024-08-04  5:31             ` Emanuel Berg
@ 2024-08-04  5:42             ` Emanuel Berg
  2024-08-04  5:52               ` Eli Zaretskii
  1 sibling, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  5:42 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii wrote:

> spell-checks several languages in parallel, in which case
> the above is not needed. This has an advantage that
> different languages can be mixed in the same region, instead
> of separating them.

The languages are mixed here as well. It just has to be
annotated what is what. This maybe `guess-language' can
automate, I don't know.

That's the whole idea we are doing here BTW - mix languages.

If Hunspell can do it without restarting, all languages in
parallel and without sorting out the mix first in some way -
yeah, sounds too good to be true?

The only way I can think of is to do it word-by-word.
If language 1 fails, try language 2 and so on.

But with that method, how do you know that you didn't misspell
a language 1 word into a word in some other language?

Maybe that actually isn't a big problem in practice?

How does Hunspell do it in some other way?

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  5:31             ` Emanuel Berg
@ 2024-08-04  5:50               ` Eli Zaretskii
  2024-08-04  6:14                 ` Emanuel Berg
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-08-04  5:50 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

> From: Emanuel Berg <incal@dataswamp.org>
> Date: Sun, 04 Aug 2024 07:31:15 +0200
> 
> Eli Zaretskii wrote:
> 
> > and will ask whether to save the local dictionary each time,
> > which is a minor nuisance
> 
> (setq ispell-silently-savep t)

That just sweeps the problem under the carpet, because users shouldn't
need to save the personal dictionary more than once in a session.

> > And if one uses Hunspell one can set up the speller so that
> > it spell-checks several languages in parallel
> 
> Okay, how?

See the doc string of ispell-hunspell-add-multi-dic.

> > This has an advantage that different languages can be mixed
> > in the same region, instead of separating them.
> 
> How does Hunspell know vad som är vilket språk if they
> are intermixed?

It checks the words in all the dictionaries.
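
For illustration, a commonly used setup around
ispell-hunspell-add-multi-dic looks roughly like this. It is a sketch
under assumptions: Hunspell is installed, and "en_US" and "sv_SE" are
placeholders for whatever Hunspell dictionaries exist on the system.

```elisp
(with-eval-after-load 'ispell
  ;; use Hunspell as the back-end of ispell.el
  (setq ispell-program-name "hunspell")
  ;; assumed dictionary names, replace with the installed ones
  (setq ispell-dictionary "en_US,sv_SE")
  ;; let ispell.el find the installed dictionaries first ...
  (ispell-set-spellchecker-params)
  ;; ... then register the combined dictionary
  (ispell-hunspell-add-multi-dic "en_US,sv_SE"))
```

After that, the usual commands such as `ispell-region' check each
word against both dictionaries.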





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  5:42             ` Emanuel Berg
@ 2024-08-04  5:52               ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2024-08-04  5:52 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

> From: Emanuel Berg <incal@dataswamp.org>
> Date: Sun, 04 Aug 2024 07:42:05 +0200
> 
> The only way I can think of is to do it word-by-word.
> If language 1 fails, try language 2 and so on.
> 
> But with that method, how do you know that you didn't misspell
> a language 1 word into a word in some other language?

You don't.  But it works very well in practice, try it.  Especially if
the languages use different scripts, like English and Russian.
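
The trade-off can be shown with a toy model. This is not how
Hunspell is implemented, only a sketch of the "accept a word if any
dictionary has it" rule; the `demo-' names and the tiny word sets are
hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun demo-word-set (words)
  "Return a hash table whose keys are the downcased WORDS."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (w words table)
      (puthash (downcase w) t table))))

(defun demo-misspelled (words dictionaries)
  "Return those WORDS that are found in none of DICTIONARIES.
DICTIONARIES is a list of hash tables as made by `demo-word-set'."
  (cl-remove-if (lambda (w)
                  (cl-some (lambda (dict) (gethash (downcase w) dict))
                           dictionaries))
                words))
```

A typo that happens to be a word in one of the other languages
passes unnoticed, which is the price of not detecting the
language first.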




* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  5:50               ` Eli Zaretskii
@ 2024-08-04  6:14                 ` Emanuel Berg
  2024-08-04  6:27                   ` Eli Zaretskii
  2024-08-04 11:03                   ` Emanuel Berg
  0 siblings, 2 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  6:14 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii wrote:

>> (setq ispell-silently-savep t)
>
> That just sweeps the problem under the carpet, because users
> shouldn't need to save the personal dictionary more than
> once in a session.

Here, "the problem" is spelling multiple languages in the
same buffer.

>> How does Hunspell know vad som är vilket språk if they
>> are intermixed?
>
> It checks the words in all the dictionaries.

Yes, that's the only way to do it?

You can do that with ispell, it is what I did in
ispell-detect.el [1] however that, `ispell-lookup-words',
doesn't use the regular ispell 'i-' dictionaries
"iamerican-insane" and so on, but the 'w-' wordlists
typically installed in /usr/share/dict/ - files that are
uncompressed and without metadata.

You can of course uncompress the ispell files and run some
sed, maybe

  sed -e 's/\/.*$//g' $src > $dst

to remove the metadata but naah, I think ispell should be more
than capable of doing this all by itself using its
own dictionaries.

Ispell's code is a complete jungle, I'm not digging into it.
I can see in one second it is written in a style that is,
well, not mine.

I can spell individual words and get the result back, with
this in another file I have [2], this solved a cute little
problem [3].

That code is obviously a joke; here we see Emacs's harmful "do
everything in a buffer" policy taken to a degree where a big
program cannot even do its own most basic task
programmatically. The word first has to be inserted! (Perhaps
this has changed since.)

If there were a better such function one could maybe consider
the Hunspell approach for ispell, but if one did this for every
word it is out of the question to reload the dictionary all
the time.

(defun spell-word (word)
  (with-temp-buffer
    (save-excursion
      (insert word))
    (condition-case nil
        (not (ispell-word))
      (error nil))))
      
;; (spell-word "length") ; t
;; (spell-word "lenght") ; nil

Anyway, maybe this should be dropped then and we should
recommend people who want this to use Hunspell?

[1] https://dataswamp.org/~incal/emacs-init/ispell-detect.el
[2] https://dataswamp.org/~incal/emacs-init/spell.el
[3] https://dataswamp.org/~incal/emacs-init/perm.el

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:14                 ` Emanuel Berg
@ 2024-08-04  6:27                   ` Eli Zaretskii
  2024-08-04  6:40                     ` Emanuel Berg
                                       ` (3 more replies)
  2024-08-04 11:03                   ` Emanuel Berg
  1 sibling, 4 replies; 34+ messages in thread
From: Eli Zaretskii @ 2024-08-04  6:27 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

> From: Emanuel Berg <incal@dataswamp.org>
> Date: Sun, 04 Aug 2024 08:14:10 +0200
> 
> Eli Zaretskii wrote:
> 
> >> (setq ispell-silently-savep t)
> >
> > That just sweeps the problem under the carpet, because users
> > shouldn't need to save the personal dictionary more than
> > once in a session.
> 
> Here, "the problem" is spelling multiple languages in the
> same buffer.

The "problem" I meant was asking the user multiple times.

> >> How does Hunspell know vad som är vilket språk if they
> >> are intermixed?
> >
> > It checks the words in all the dictionaries.
> 
> Yes, that's the only way to do it?
> 
> You can do that with ispell

Not without restarting the speller, which is not the best solution,
IMO.  And it doesn't work with Flyspell, for example.




* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:27                   ` Eli Zaretskii
@ 2024-08-04  6:40                     ` Emanuel Berg
  2024-08-04  7:08                       ` Eli Zaretskii
  2024-08-04  7:13                     ` Emanuel Berg
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  6:40 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii wrote:

>> You can do that with ispell
>
> Not without restarting the speller, which is not the best
> solution, IMO. And it doesn't work with Flyspell,
> for example.

You cannot even _spell a word_ programmatically in ispell.

If Hunspell does it, let's forget all about the IMO pretty
lousy software that is ispell and recommend Hunspell to
whoever asks for multiple languages.

-- 
underground experts united
https://dataswamp.org/~incal





* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-03  2:02     ` Emanuel Berg
@ 2024-08-04  6:52       ` Uwe Brauer via Emacs development discussions.
  0 siblings, 0 replies; 34+ messages in thread
From: Uwe Brauer via Emacs development discussions. @ 2024-08-04  6:52 UTC (permalink / raw)
  To: emacs-devel



> Gregor Zattler wrote:

> I think I'll drop this project if there is already one.
> From that description it sounds better, to be honest :)


Not so hasty. I did not know about guess-language and tried it out.
(Till now, I switch my ispell-dictionaries manually, and have bound 4
different keys, for the 4 languages I use most)

Maybe I am missing something elementary, but it seems not to work for me.

    1. I open a new file (say in org mode.)

    2. I turn on guess-language-mode

    3. I type a paragraph in English, everything is fine, flyspell
       shows me the errors, and if I mark the paragraph and run
       ispell-region, ispell uses the British dictionary as it should.

    4. I repeat the experiment with German, but when I run
       ispell-region, it still uses the British dictionary, not the
       German one.

So I think I will open an issue in github.


BTW, there is also auto-dictionary but I have not tried that one out.

Uwe Brauer 
-- 
I strongly condemn Hamas heinous despicable pogroms/atrocities on Israel
I strongly condemn Putin's war of aggression against Ukraine.
I support to deliver weapons to Ukraine's military. 
I support the EU and NATO membership of Ukraine. 




* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:40                     ` Emanuel Berg
@ 2024-08-04  7:08                       ` Eli Zaretskii
  2024-08-04 10:30                         ` Stefan Kangas
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-08-04  7:08 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

> From: Emanuel Berg <incal@dataswamp.org>
> Date: Sun, 04 Aug 2024 08:40:32 +0200
> 
> You cannot even _spell a word_ programmatically in ispell.

Of course you can: just invoke ispell-word from your program.

> If Hunspell does it, let's forget all about the IMO pretty
> lousy software that is ispell and recommend Hunspell to
> whoever asks for multiple languages.

Hunspell is the back-end, ispell.el is its Emacs front-end and UI.  So
you cannot forget about ispell and use Hunspell.

And yes, I recommend using Hunspell via ispell.el to anyone who needs
to spell-check text that mixes several languages.
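
For reference, a minimal sketch of such a setup. It assumes that
Hunspell and the "en_US" and "de_DE" Hunspell dictionaries are
installed; those dictionary names are only an example and do not come
from this thread.

```elisp
(require 'ispell)

;; Use Hunspell as the back-end of ispell.el.
(setq ispell-program-name "hunspell")

;; Initialize the dictionary alist for the chosen back-end; this has
;; to happen before `ispell-hunspell-add-multi-dic' is called.
(ispell-set-spellchecker-params)

;; Register a combined dictionary (names are assumptions, see above)
;; and make it the default.
(ispell-hunspell-add-multi-dic "en_US,de_DE")
(setq ispell-dictionary "en_US,de_DE")
```

With this in place, a word should be accepted if either of the two
dictionaries knows it.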



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:27                   ` Eli Zaretskii
  2024-08-04  6:40                     ` Emanuel Berg
@ 2024-08-04  7:13                     ` Emanuel Berg
  2024-08-07 15:41                     ` Björn Bidar
       [not found]                     ` <87wmkspcgn.fsf@>
  3 siblings, 0 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  7:13 UTC (permalink / raw)
  To: emacs-devel

And `wcheck-mode' - no hope for anything to work for that?

Gregor, but try Hunspell, if you feel like it!

(I don't even have that so must be a package?)

BTW so Hunspell merges the detection/spelling process, or
rather, they spell so much they don't have to think
about detecting. I can see why it works for Russian!

This is what I did as well, in the initial post. In effect:
what language is it? Find out by spelling it. If it isn't
spelled right, chances are you have made a mistake - but it
may also be that the language is actually some other one.
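
A toy sketch of that idea, detection by counting dictionary hits.
It is not the code from the initial post: the word lists and the
names `my-toy-wordlists' and `my-guess-dictionary' are made up for
illustration, and a real version would read the /usr/share/dict
files instead.

```elisp
(require 'cl-lib)

(defvar my-toy-wordlists
  '(("english" "this" "is" "just" "another" "program")
    ("svenska" "detta" "är" "en" "mening" "på"))
  "Hypothetical alist of (DICTIONARY WORD...), for illustration only.")

(defun my-guess-dictionary (text)
  "Return the entry name in `my-toy-wordlists' accepting most words of TEXT."
  (let ((words (split-string (downcase text) "[^[:alpha:]]+" t))
        best best-hits)
    (dolist (entry my-toy-wordlists best)
      ;; Count how many words of TEXT this word list knows.
      (let ((hits (cl-count-if (lambda (w) (member w (cdr entry)))
                               words)))
        (when (or (null best-hits) (> hits best-hits))
          (setq best (car entry)
                best-hits hits))))))
```

On a tie the first entry wins, so very short texts are guessed
poorly, which is one reason to probe a whole sentence or paragraph.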

That leaves only one stone unturned, then: what is the
detection algorithm they boast so much about in
`guess-language'?

If it really works well, that would be fun to see!

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-03 19:58         ` Emanuel Berg
  2024-08-04  5:09           ` Eli Zaretskii
@ 2024-08-04  7:17           ` Uwe Brauer via Emacs development discussions.
  2024-08-04  9:30             ` Emanuel Berg
  1 sibling, 1 reply; 34+ messages in thread
From: Uwe Brauer via Emacs development discussions. @ 2024-08-04  7:17 UTC (permalink / raw)
  To: emacs-devel




> See annotations last in the commented-out text/test area.
> I don't remove them, which should happen when spell is done,
> because they are inserted manually :)

> Other than that - works.

Not for me, or I am missing something elementary.

I used your code

> (require 'cl-lib)
> (require 'ispell)

> (defvar ispell-langs '(("en" "american-insane")
>                        ("fr" "francais")
>                        ("sv" "svenska")))

but with

(defvar ispell-langs '(("en" "american")
                       ("de" "odeutsch")))

american.aff -> english.aff
american.hash -> /var/lib/ispell/american.hash
deutsch.aff -> /usr/lib/ispell/odeutsch.aff
deutsch.hash -> /usr/lib/ispell/odeutsch.hash
odeutsch.aff -> ogerman.aff
odeutsch.hash -> ogerman.hash
ogerman.aff
ogerman.hash -> /var/lib/ispell/ogerman.hash

I tried the ispell-spell-all using

Damit sind die Voraussetzungen des lokalen Existenzsatzes für
Anfangsdaten $U(T)$ gegeben und man erhält die Existenz einer Lösung
auf dem Intervall $[T,T+\epsilon]$. 


Now let us look whether.


The German paragraph was checked using the American dictionary.

But I thought the language of each paragraph was detected automatically.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  7:17           ` Uwe Brauer via Emacs development discussions.
@ 2024-08-04  9:30             ` Emanuel Berg
  0 siblings, 0 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04  9:30 UTC (permalink / raw)
  To: emacs-devel

Uwe Brauer via "Emacs development discussions." wrote:

>> See annotations last in the commented-out text/test area.
>> I don't remove them, which should happen when spell is
>> done, because they are inserted manually :)
>
> But I thought the language for each paragraph is
> automatically detected

Nope, that's the only thing that doesn't happen.

> (defvar ispell-langs '(("en" "american")
>                        ("de" "odeutsch")))

For those, you need "-->{en}" and "-->{de}" to indicate where
one thing stops and the other begins.
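
Roughly, and only as a sketch (this is not the code posted earlier
in the thread): assuming a "-->{LANG}" marker opens a segment that
runs until the next marker or the end of the buffer, the segments
could be collected like this. The name `my-marked-segments' is
hypothetical.

```elisp
(defun my-marked-segments ()
  "Return a list of (LANG BEG END) for each \"-->{LANG}\" marker.
Each segment starts right after its marker and ends at the next
marker, or at the end of the buffer."
  (save-excursion
    (goto-char (point-min))
    (let (segments)
      (while (re-search-forward "-->{\\([a-z]+\\)}" nil t)
        (let ((lang (match-string 1))
              (beg (point))
              ;; Look ahead for the next marker without moving point.
              (end (save-excursion
                     (if (re-search-forward "-->{[a-z]+}" nil t)
                         (match-beginning 0)
                       (point-max)))))
          (push (list lang beg end) segments)))
      (nreverse segments))))
```

Each LANG can then be looked up in the `ispell-langs' alist to pick
the dictionary for its region.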

So no, without automatic detection it is not so useful :)

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  7:08                       ` Eli Zaretskii
@ 2024-08-04 10:30                         ` Stefan Kangas
  2024-08-04 15:25                           ` Uwe Brauer via Emacs development discussions.
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Kangas @ 2024-08-04 10:30 UTC (permalink / raw)
  To: Eli Zaretskii, Emanuel Berg; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> And yes, I recommend using Hunspell via ispell.el to anyone who needs
> to spell-check text that mixes several languages.

Perhaps we should add something about this to (info "(emacs) Spelling").



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:14                 ` Emanuel Berg
  2024-08-04  6:27                   ` Eli Zaretskii
@ 2024-08-04 11:03                   ` Emanuel Berg
  1 sibling, 0 replies; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04 11:03 UTC (permalink / raw)
  To: emacs-devel

> Ispell's code is a complete jungle, I'm not digging into it.
> I can see in one second it is written in a style that is,
> well, not mine.

I was curious to see, if Hunspell is much better and more
advanced, maybe that would show itself in modern code as well?

So I looked in ispell.el and made a search for "hunspell".

I yank it here.

Notes:

- ispell.el is 4 323 lines long. (Those are lines 1081-1205.)

- (how-many "\\Wcl-" (point-min))

  gives 12 hits: one for cl-lib, 11 for `cl-pushnew'.

- (how-many "\\Wpcase" (point-min))

  gives a single hit, for `pcase'.

This experiment was carried out on:

  GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, cairo version
  1.16.0) of 2024-04-01 [commit
  a5fbb652ed3614d6735015551564f32b80e42c53]

(defun ispell-find-hunspell-dictionaries (&optional dictionary)
  "Look for installed Hunspell dictionaries.
Will initialize `ispell-hunspell-dictionary-alist' according
to dictionaries found, and will remove aliases from the list
in `ispell-dicts-name2locale-equivs-alist' if an explicit
dictionary from that list was found.

If DICTIONARY, check for that dictionary explicitly."
  (let ((hunspell-found-dicts
         (seq-filter
          (lambda (str)
            (when (string-match
                   ;; Hunspell gives this error when there is some
                   ;; installation problem, for example if $LANG is unset.
                   (concat "^Can't open affix or dictionary files "
                           "for dictionary named \"default\".$")
                   str)
              (user-error "Hunspell error (is $LANG unset?): %s" str))
            (file-name-absolute-p str))
          (split-string
           (with-temp-buffer
             (apply #'ispell-call-process
                    ispell-program-name nil t nil
                    `("-D"
                      ,@(and dictionary (list "-d" dictionary))
                      ;; Use -a to prevent Hunspell from trying to
                      ;; initialize its curses/termcap UI, which
                      ;; causes it to crash or fail to start in some
                      ;; MS-Windows ports.
                      "-a"
                      ;; Hunspell 1.7.0 (and later?) won't show LOADED
                      ;; DICTIONARY unless there's at least one file
                      ;; argument on the command line.  So we feed it
                      ;; with the null device.
                      ,null-device))
             (buffer-string))
           "[\n\r]+"
           t)))
	hunspell-default-dict
	hunspell-default-dict-entry
	hunspell-multi-dict)
    (dolist (dict hunspell-found-dicts)
      (let* ((full-name (file-name-nondirectory dict))
	     (basename  (file-name-sans-extension full-name))
	     (affix-file (concat dict ".aff")))
	(if (string-match "\\.aff$" dict)
	    ;; Found default dictionary
	    (progn
	      (if hunspell-default-dict
		  (setq hunspell-multi-dict
			(concat (or hunspell-multi-dict
				    (car hunspell-default-dict))
				"," basename))
		(setq affix-file dict)
		;; FIXME: The cdr of the list we cons below is never
		;; used.  Why do we need a list?
		(setq hunspell-default-dict (list basename affix-file)))
	      (ispell-print-if-debug
	       "++ ispell-fhd: default dict-entry:%s name:%s basename:%s\n"
	       dict full-name basename))
	  (if (and (not (assoc basename ispell-hunspell-dict-paths-alist))
		   (file-exists-p affix-file))
	      ;; Entry has an associated .aff file and no previous value.
	      (let ((affix-file (expand-file-name affix-file)))
		(ispell-print-if-debug
                 "++ ispell-fhd: dict-entry:%s name:%s basename:%s affix-file:%s\n"
                 dict full-name basename affix-file)
		(cl-pushnew (list basename affix-file)
                            ispell-hunspell-dict-paths-alist :test #'equal))
	    (ispell-print-if-debug
             "-- ispell-fhd: Skipping entry: %s\n" dict)))))
    ;; Remove entry from aliases alist if explicit dict was found.
    (let (newlist)
      (dolist (dict ispell-dicts-name2locale-equivs-alist)
	(if (assoc (car dict) ispell-hunspell-dict-paths-alist)
	    (ispell-print-if-debug
             "-- ispell-fhd: Excluding %s alias.  Standalone dict found.\n"
             (car dict))
	  (cl-pushnew dict newlist :test #'equal)))
      (setq ispell-dicts-name2locale-equivs-alist newlist))
    ;; Add known hunspell aliases
    (dolist (dict-equiv ispell-dicts-name2locale-equivs-alist)
      (let ((dict-equiv-key (car dict-equiv))
	    (dict-equiv-value (cadr dict-equiv))
	    (exclude-aliases (list   ;; Exclude TeX aliases
			      "esperanto-tex"
			      "francais7"
			      "francais-tex"
			      "norsk7-tex")))
	(if (and (assoc dict-equiv-value ispell-hunspell-dict-paths-alist)
		 (not (assoc dict-equiv-key ispell-hunspell-dict-paths-alist))
		 (not (member dict-equiv-key exclude-aliases)))
	    (let ((affix-file (cadr (assoc dict-equiv-value
                                           ispell-hunspell-dict-paths-alist))))
	      (ispell-print-if-debug "++ ispell-fhd: Adding alias %s -> %s.\n"
                                     dict-equiv-key affix-file)
	      (cl-pushnew (list dict-equiv-key affix-file)
                          ispell-hunspell-dict-paths-alist :test #'equal)))))
    ;; Parse and set values for default dictionary.
    (setq hunspell-default-dict (or hunspell-multi-dict
				    (car hunspell-default-dict)))
    ;; If we didn't find a dictionary based on the environment (i.e.,
    ;; the locale and the DICTIONARY variable), try again if
    ;; `ispell-dictionary' is set.
    (when (and (not hunspell-default-dict)
               (not dictionary)
               ispell-dictionary)
      (setq hunspell-default-dict
            (ispell-find-hunspell-dictionaries ispell-dictionary)))
    ;; If hunspell-default-dict is nil, ispell-parse-hunspell-affix-file
    ;; will barf with an error message that doesn't help users figure
    ;; out what is wrong.  Produce an error message that points to the
    ;; root cause of the problem.
    (unless hunspell-default-dict
      (error "Can't find Hunspell dictionary with a .aff affix file"))
    (setq hunspell-default-dict-entry
	  (ispell-parse-hunspell-affix-file hunspell-default-dict))
    ;; Create an alist of found dicts with only names, except for default dict.
    (setq ispell-hunspell-dictionary-alist
	  (list (cons nil (cdr hunspell-default-dict-entry))))
    (dolist (dict (mapcar #'car ispell-hunspell-dict-paths-alist))
      (cl-pushnew (if (string= dict hunspell-default-dict)
                      hunspell-default-dict-entry
                    (list dict))
                  ispell-hunspell-dictionary-alist :test #'equal))
    hunspell-default-dict))

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04 10:30                         ` Stefan Kangas
@ 2024-08-04 15:25                           ` Uwe Brauer via Emacs development discussions.
  2024-08-04 15:56                             ` Emanuel Berg
  0 siblings, 1 reply; 34+ messages in thread
From: Uwe Brauer via Emacs development discussions. @ 2024-08-04 15:25 UTC (permalink / raw)
  To: emacs-devel


>>> "SK" == Stefan Kangas <stefankangas@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>> And yes, I recommend using Hunspell via ispell.el to anyone who needs
>> to spell-check text that mixes several languages.

> Perhaps we should add something about this to (info "(emacs) Spelling").

That would be extremely useful, especially indicating what that mixing
of languages means and to what extent different languages are
automatically detected.

-- 
I strongly condemn Hamas heinous despicable pogroms/atrocities on Israel
I strongly condemn Putin's war of aggression against Ukraine.
I support to deliver weapons to Ukraine's military. 
I support the EU and NATO membership of Ukraine. 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04 15:25                           ` Uwe Brauer via Emacs development discussions.
@ 2024-08-04 15:56                             ` Emanuel Berg
  2024-08-04 17:45                               ` tomas
  0 siblings, 1 reply; 34+ messages in thread
From: Emanuel Berg @ 2024-08-04 15:56 UTC (permalink / raw)
  To: emacs-devel

Uwe Brauer via "Emacs development discussions." wrote:

> That would be extremely useful, especially indicating what
> that language mixes means and

It means, two or more languages in the same buffer.

Goal: being able to spell and not care about that, the
spellchecker deals with that for you, for one, two,
... languages.

> to what extent different languages are
> automatically detected.

They are not - with Hunspell every word is checked with every
dictionary. So it is in a way the opposite of detection.

But it should work for you anyway. Try it if you want multiple
dictionaries.

`guess-language' says they can do it, my code could do it but
was more of a proof-of-concept - well, at the state where
I left it, maybe one could have taken it somewhere. But if
`guess-language' does it, and especially, if Hunspell has
a many times more robust solution, I'm out. Yes, try it if you
want this!

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04 15:56                             ` Emanuel Berg
@ 2024-08-04 17:45                               ` tomas
  0 siblings, 0 replies; 34+ messages in thread
From: tomas @ 2024-08-04 17:45 UTC (permalink / raw)
  To: emacs-devel


On Sun, Aug 04, 2024 at 05:56:36PM +0200, Emanuel Berg wrote:
> Uwe Brauer via "Emacs development discussions." wrote:
> 
> > That would be extremely useful, especially indicating what
> > that language mixes means and
> 
> It means, two or more languages in the same buffer.
> 
> Goal: being able to spell and not care about that, the
> spellchecker deals with that for you, for one, two,
> ... languages.

Instead of --uh-- guessing, you folks might like to take a
look at what is "out there" [1]. Turns out that Some Smart
Folks (TM) have been thinking about this problem for a while.

cheers
[1] https://en.wikipedia.org/wiki/Language_identification

-- 
t


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  2:05     ` Richard Stallman
@ 2024-08-07 15:36       ` Björn Bidar
  0 siblings, 0 replies; 34+ messages in thread
From: Björn Bidar @ 2024-08-07 15:36 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Gregor Zattler, incal, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>   > there is already in GNU ELPA the package
>
>   >     * guess-language: Emacs minor mode
>   >       for robust automatic language
>   >       detection
>
> It sounds quite convenient.  Perhaps we should make more
> packages make use of it -- and move it into Emacs core.
>
> But first we should look it over and see if there are any
> cleanups or improvements we ought to make before that.

Is the developer of the package in this mailing list? Can someone CC
them?

IMHO we should try to interact with the project directly rather than to
use this ML to conduct improvements.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
  2024-08-04  6:27                   ` Eli Zaretskii
  2024-08-04  6:40                     ` Emanuel Berg
  2024-08-04  7:13                     ` Emanuel Berg
@ 2024-08-07 15:41                     ` Björn Bidar
       [not found]                     ` <87wmkspcgn.fsf@>
  3 siblings, 0 replies; 34+ messages in thread
From: Björn Bidar @ 2024-08-07 15:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emanuel Berg, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Emanuel Berg <incal@dataswamp.org>
>> Date: Sun, 04 Aug 2024 08:14:10 +0200
>> 
>> Eli Zaretskii wrote:
>> 
>> >> (setq ispell-silently-savep t)
>> >
>> > That just sweeps the problem under the carpet, because users
>> > shouldn't need to save the personal dictionary more than
>> > once in a session.
>> 
>> Here, "the problem" is spelling multiple languages in the
>> same buffer.
>
> The "problem" I meant was asking the user multiple times.
>
>> >> How does Hunspell know vad som är vilket språk if they
>> >> are intermixed?
>> >
>> > It checks the words in all the dictionaries.
>> 
>> Yes, that's the only way to do it?
>> 
>> You can do that with ispell
>
> Not without restarting the speller, which is not the best solution,
> IMO.  And it doesn't work with Flyspell, for example.

Jinx actually allows that, e.g. I can type German, Finnish and
English in the same buffer without switching the dictionary. Being able
to write like that is especially handy when you have English loan words
in your language.
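
A minimal sketch of such a setup, assuming the jinx package is
installed together with Enchant dictionaries for these languages;
the exact language codes depend on the installed dictionaries and
are an assumption here.

```elisp
;; Language codes separated by spaces; a word is accepted if any of
;; the listed dictionaries knows it.
(setq jinx-languages "de_DE fi en_US")

;; Enable Jinx in text buffers.
(add-hook 'text-mode-hook #'jinx-mode)
```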

Guess-language only helps if you switch between languages but then stick
only to one language.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* guess-language package
       [not found]                     ` <87wmkspcgn.fsf@>
@ 2024-08-08  2:01                       ` Richard Stallman
  2024-08-08  6:42                         ` Joost Kremers
  2024-08-08  6:55                       ` auto-detect multiple languages -- ispell-detect.el Joost Kremers
  2024-08-12  1:47                       ` Richard Stallman
  2 siblings, 1 reply; 34+ messages in thread
From: Richard Stallman @ 2024-08-08  2:01 UTC (permalink / raw)
  To: Björn Bidar; +Cc: eliz, incal, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >   >     * guess-language: Emacs minor mode
  > >   >       for robust automatic language
  > >   >       detection

  > Is the developer of the package in this mailing list? Can someone CC
  > them?

Can anyone tell us per name and email address?

  > IMHO we should try to interact with the project directly rather than to
  > use this ML to conduct improvements.

The package is already part of Emacs.  But wherever the discussion
happens, let's actively help include its developer.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: guess-language package
  2024-08-08  2:01                       ` guess-language package Richard Stallman
@ 2024-08-08  6:42                         ` Joost Kremers
  2024-08-12  1:47                           ` Richard Stallman
  0 siblings, 1 reply; 34+ messages in thread
From: Joost Kremers @ 2024-08-08  6:42 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Björn Bidar, eliz, incal, emacs-devel

On Wed, Aug 07 2024, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > >   >     * guess-language: Emacs minor mode
>   > >   >       for robust automatic language
>   > >   >       detection
>
>   > Is the developer of the package in this mailing list? Can someone CC
>   > them?
>
> Can anyone tell us per name and email address?

The guess-language package is developed by Titus von der Malsburg
<malsburg@posteo.de>. The package is on MELPA and the code on Github.


-- 
Joost Kremers
Life has its moments



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
       [not found]                     ` <87wmkspcgn.fsf@>
  2024-08-08  2:01                       ` guess-language package Richard Stallman
@ 2024-08-08  6:55                       ` Joost Kremers
  2024-08-12  1:47                       ` Richard Stallman
  2 siblings, 0 replies; 34+ messages in thread
From: Joost Kremers @ 2024-08-08  6:55 UTC (permalink / raw)
  To: Björn Bidar; +Cc: Eli Zaretskii, Emanuel Berg, emacs-devel

On Wed, Aug 07 2024, Björn Bidar wrote:
> Guess-language only helps if you switch between languages but then stick
> only to one language.

For the paragraph, at least. You can have separate paragraphs in different
languages and when you move point to a particular paragraph, guess-language
detects the language of the paragraph and spell-checks it using that language.
Mixing languages within one paragraph is not something guess-language can
handle, though.
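
A minimal configuration sketch for that per-paragraph behaviour,
assuming the guess-language package is installed, Flyspell is in use,
and ispell dictionaries for the listed languages exist. The language
list and the minimum length are example values only.

```elisp
(require 'guess-language)

;; Languages to choose from when guessing (example values).
(setq guess-language-languages '(en de))

;; Paragraphs shorter than this many characters are not guessed.
(setq guess-language-min-paragraph-length 35)

;; Turn the minor mode on in text buffers.
(add-hook 'text-mode-hook (lambda () (guess-language-mode 1)))
```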

-- 
Joost Kremers
Life has its moments



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: guess-language package
  2024-08-08  6:42                         ` Joost Kremers
@ 2024-08-12  1:47                           ` Richard Stallman
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2024-08-12  1:47 UTC (permalink / raw)
  To: Joost Kremers; +Cc: bjorn.bidar, eliz, incal, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The guess-language package is developed by Titus von der Malsburg
  > <malsburg@posteo.de>.

Thanks.

  > The package is on MELPA and the code on Github.

That is not a good place -- but if we move it into Emacs core,
the source code will presumably be in Emacs, so people won't
be asked to look in GitHub for it.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: auto-detect multiple languages -- ispell-detect.el
       [not found]                     ` <87wmkspcgn.fsf@>
  2024-08-08  2:01                       ` guess-language package Richard Stallman
  2024-08-08  6:55                       ` auto-detect multiple languages -- ispell-detect.el Joost Kremers
@ 2024-08-12  1:47                       ` Richard Stallman
  2 siblings, 0 replies; 34+ messages in thread
From: Richard Stallman @ 2024-08-12  1:47 UTC (permalink / raw)
  To: Björn Bidar; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Jinx actually allows to do that e.g. I can type German, Finnish and
  > English in the same buffer without switching the dictionary.

grep doesn't find any other recent mention of Jinx.
Is it a program we could reasonably install in Emacs?
Or was it mentioned to show that such a feature is possible?

  > For the paragraph, at least. You can have separate paragraphs in
  > different languages and when you move point to a particular
  > paragraph, guess-language detects the language of the paragraph
  > and spell-checks it using that language.  Mixing languages within
  > one paragraph is not something guess-language can handle, though.

This is not ideal, but it is pretty good.

If Hunspell can already handle mixing languages in one paragraph,
is the best option to add guess-language to Emacs core
and urge those who need mixing within paragraphs to use Hunspell?

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-08-12  1:47 UTC | newest]

Thread overview: 34+ messages
2024-07-31 11:42 auto-detect multiple languages -- ispell-detect.el Emanuel Berg
2024-08-02  2:53 ` Richard Stallman
2024-08-02  8:06   ` Gregor Zattler
2024-08-02 22:32     ` Emanuel Berg
2024-08-03  2:02     ` Emanuel Berg
2024-08-04  6:52       ` Uwe Brauer via Emacs development discussions.
2024-08-03 13:25     ` Emanuel Berg
2024-08-03 17:14       ` Emanuel Berg
2024-08-03 19:58         ` Emanuel Berg
2024-08-04  5:09           ` Eli Zaretskii
2024-08-04  5:31             ` Emanuel Berg
2024-08-04  5:50               ` Eli Zaretskii
2024-08-04  6:14                 ` Emanuel Berg
2024-08-04  6:27                   ` Eli Zaretskii
2024-08-04  6:40                     ` Emanuel Berg
2024-08-04  7:08                       ` Eli Zaretskii
2024-08-04 10:30                         ` Stefan Kangas
2024-08-04 15:25                           ` Uwe Brauer via Emacs development discussions.
2024-08-04 15:56                             ` Emanuel Berg
2024-08-04 17:45                               ` tomas
2024-08-04  7:13                     ` Emanuel Berg
2024-08-07 15:41                     ` Björn Bidar
     [not found]                     ` <87wmkspcgn.fsf@>
2024-08-08  2:01                       ` guess-language package Richard Stallman
2024-08-08  6:42                         ` Joost Kremers
2024-08-12  1:47                           ` Richard Stallman
2024-08-08  6:55                       ` auto-detect multiple languages -- ispell-detect.el Joost Kremers
2024-08-12  1:47                       ` Richard Stallman
2024-08-04 11:03                   ` Emanuel Berg
2024-08-04  5:42             ` Emanuel Berg
2024-08-04  5:52               ` Eli Zaretskii
2024-08-04  7:17           ` Uwe Brauer via Emacs development discussions.
2024-08-04  9:30             ` Emanuel Berg
2024-08-04  2:05     ` Richard Stallman
2024-08-07 15:36       ` Björn Bidar
