From: Hongyi Zhao <hongyi.zhao@gmail.com>
To: Emanuel Berg <moasenwood@zoho.eu>,
help-gnu-emacs <help-gnu-emacs@gnu.org>
Subject: Re: Let ispell use use multiple dictionaries stored in different files.
Date: Sun, 8 Aug 2021 12:16:38 +0800
Message-ID: <CAGP6POL_pDezOvypj73CT75xxqS_EAsSQbKpxk259aQ=xr51QQ@mail.gmail.com>
In-Reply-To: <87o8a8fuzv.fsf@zoho.eu>
On Sun, Aug 8, 2021 at 11:58 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > combine the wamerican-insane and Webster_s_Unabridged_3 into
> > one even bigger word list file, which includes 790592
> > entries at the moment:
> >
> > $ awk '!a[$0]++' /usr/share/dict/american-english-insane \
> >     Webster_s_Unabridged_3.txt |
> >     tee ~/american-english-insane-Webster_s_Unabridged | wc
> > 790592 892589 8649982
>
> Hm, is awk '!a[$0]++' faster than sort -u?
The awk code above keeps the words in the order in which they first
appear in the original word list files.
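As a small illustration of the difference (a sketch with made-up
sample words, not the real word lists): the awk idiom prints a line
only the first time it is seen, so the first-occurrence order
survives, whereas sort -u also removes duplicates but reorders the
output. The variable names here are only for illustration.

```shell
# Hypothetical sample input; the real word lists are far larger.
words=$(printf '%s\n' pear apple pear fig apple)

# awk: a[$0]++ is 0 (false) the first time a line is seen, so
# !a[$0]++ is true only for first occurrences; input order is kept.
awk_out=$(printf '%s\n' "$words" | awk '!a[$0]++')

# sort -u: also drops duplicates, but sorts the lines.
# LC_ALL=C is an assumption made here to get a predictable collation.
sort_out=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

printf 'awk:\n%s\n\nsort -u:\n%s\n' "$awk_out" "$sort_out"
```

The awk variant keeps every distinct line in memory, so its memory use
grows with the number of unique words.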
> $ sort -u A B > AB
> $ wc -l AB
The sort command, by contrast, reorders them. Keep in mind that
ispell and any autocompletion tool or framework need an already
well-sorted word list file to achieve acceptable performance when the
word list file is huge; at least this is the case for the
initialization stage of Emacs's ispell. For this reason, I don't want
to change the order in which the words occur in the original word
list files.
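Whether a given word list is already sorted can be checked without
modifying it. The following is a minimal sketch, assuming a POSIX
sort; the scratch files and the is_sorted helper are hypothetical,
and the C locale is an assumption (a dictionary may have been sorted
under a different collation).

```shell
# Hypothetical scratch files; substitute the real word list paths.
tmpdir=$(mktemp -d)
printf '%s\n' apple fig pear > "$tmpdir/sorted.txt"
printf '%s\n' pear apple fig > "$tmpdir/unsorted.txt"

# sort -c only checks the order: it exits 0 if the file is sorted and
# non-zero otherwise, and it never rewrites the file.
is_sorted() {
    if LC_ALL=C sort -c "$1" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

sorted_status=$(is_sorted "$tmpdir/sorted.txt")
unsorted_status=$(is_sorted "$tmpdir/unsorted.txt")
echo "sorted.txt: $sorted_status, unsorted.txt: $unsorted_status"
rm -rf "$tmpdir"
```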
> Where did you find Webster_s_Unabridged_3.txt ?
I built this file myself; the specific steps for creating it are
given in [1]:
$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
$ awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' \
    Webster_s_Unabridged_3.csv > Webster_s_Unabridged_3.txt
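To show what the final awk step does, here is a sketch on a made-up
stand-in for the CSV file. The sample rows and the 8 placeholder
header lines are assumptions for illustration only; the real layout
is whatever pyglossary writes.

```shell
# Hypothetical stand-in for Webster_s_Unabridged_3.csv: 8 leading
# lines to skip, then one quoted headword per row plus a definition.
sample_csv=$(
    for i in 1 2 3 4 5 6 7 8; do echo "# header line $i"; done
    printf '%s\n' '"abacus","a counting frame"' '"zebra","a striped animal"'
)

# Same awk program as in the steps above: skip the first 8 lines,
# split on commas, strip the surrounding double quotes from field 1,
# and print that field.
headwords=$(printf '%s\n' "$sample_csv" |
    awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}')

printf '%s\n' "$headwords"
```

Note that splitting on commas does not honor CSV quoting, so a
headword that itself contains a comma would be cut at that comma.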
[1] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208
Best regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province