all messages for Emacs-related lists mirrored at yhetil.org
From: Hongyi Zhao <hongyi.zhao@gmail.com>
To: Emanuel Berg <moasenwood@zoho.eu>,
	help-gnu-emacs <help-gnu-emacs@gnu.org>
Subject: Re: Let ispell use use multiple dictionaries stored in different files.
Date: Sun, 8 Aug 2021 12:16:38 +0800
Message-ID: <CAGP6POL_pDezOvypj73CT75xxqS_EAsSQbKpxk259aQ=xr51QQ@mail.gmail.com>
In-Reply-To: <87o8a8fuzv.fsf@zoho.eu>

On Sun, Aug 8, 2021 at 11:58 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > combine the wamerican-insane and Webster_s_Unabridged_3 into
> > one even bigger word list file, which includes 790592
> > entries at the moment:
> >
> > $ awk '!a[$0]++' /usr/share/dict/american-english-insane \
> >     Webster_s_Unabridged_3.txt |
> >    tee ~/american-english-insane-Webster_s_Unabridged | wc
> >  790592  892589 8649982
>
> Hm, is awk '!a[$0]++' faster than sort -u?

The awk command above keeps only the first occurrence of each word and
preserves the order in which the words appear in the original word list
files.
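
As a minimal sketch of how that idiom behaves, here it is on a few
made-up words instead of the real word lists. The function name
dedup_keep_order is only an illustrative wrapper, not something from
the original commands:

```shell
# Hypothetical helper: print each distinct line once, in first-seen order.
dedup_keep_order() {
    # a[$0]++ evaluates to 0 (false) the first time a line is seen, so
    # !a[$0]++ is true only for the first occurrence and awk prints it.
    awk '!a[$0]++' "$@"
}

# Made-up sample input; duplicates of "zebra" and "apple" are dropped.
printf '%s\n' zebra apple zebra mango apple | dedup_keep_order
```

This prints zebra, apple, mango, one per line: the duplicates are gone
and nothing has been reordered.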

>   $ sort -u A B > AB
>   $ wc -l AB

But the sort command reorders the lines. Keep in mind that ispell and
autocompletion tools/frameworks need an already well-sorted word list
file to achieve acceptable performance when the file is huge; at least
this is the case for the initialization stage of Emacs's ispell. For
this reason, I don't want to change the order in which the words occur
in the original word list files.
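
To make the difference concrete, here is a small sketch on made-up
words. The temporary directory and file names are assumptions for the
demo, and LC_ALL=C is set only so that the result does not depend on
the local collation rules:

```shell
# Made-up sample words; the real lists are far larger.
tmp=$(mktemp -d)
printf '%s\n' pear apple pear fig > "$tmp/words"

# Same deduplication done two ways. LC_ALL=C pins sort to plain byte
# order so the output is the same on every machine.
awk '!a[$0]++' "$tmp/words" > "$tmp/by_awk"
LC_ALL=C sort -u "$tmp/words" > "$tmp/by_sort"

# sort -c only checks: it exits non-zero if the input is not sorted.
if LC_ALL=C sort -c "$tmp/by_awk" 2>/dev/null; then
    awk_sorted=yes
else
    awk_sorted=no
fi

echo "awk order:  $(tr '\n' ' ' < "$tmp/by_awk")"
echo "sort order: $(tr '\n' ' ' < "$tmp/by_sort")"
echo "awk output already sorted: $awk_sorted"
```

The awk result keeps "pear apple fig" as given, while sort -u yields
"apple fig pear". The same sort -c check can be run against a real
word list to see whether it is already in the order a tool expects.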

> Where did you find Webster_s_Unabridged_3.txt ?

I built this file myself, and I have given the specific steps to
create it [1]:

$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
$ awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' \
    Webster_s_Unabridged_3.csv > Webster_s_Unabridged_3.txt
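
For readers who want to see what the last awk step does, here is a
rough sketch on a made-up CSV sample. The sample rows and the helper
name first_column_words are assumptions; the real file produced by
pyglossary has its own header lines, which is why the command above
skips 8 lines:

```shell
# Hypothetical helper: print the first comma-separated field of each
# line after the first $1 lines, with one surrounding pair of double
# quotes removed.
first_column_words() {
    awk -F, -v skip="$1" 'NR > skip {
        sub(/^"/, "", $1)   # strip a leading double quote
        sub(/"$/, "", $1)   # strip a trailing double quote
        print $1
    }'
}

# Made-up input: one header line, then two quoted entries.
printf '%s\n' '#header' '"apple","a fruit"' '"fig","another fruit"' |
    first_column_words 1
```

This prints apple and fig. Note that -F, splits on every comma and
knows nothing about CSV quoting, so a headword that itself contains a
comma would be cut at that comma.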

[1] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208

Best regards
-- 
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province


