* Let ispell use multiple dictionaries stored in different files.
@ 2021-08-07 12:20 Hongyi Zhao
2021-08-07 12:41 ` Eli Zaretskii
2021-08-07 15:54 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 2 replies; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-07 12:20 UTC (permalink / raw)
To: help-gnu-emacs
On Ubuntu 20.04, I noticed that ispell uses the following file as its
completion dictionary: "/usr/share/dict/words", which comes from the
wamerican package:
$ apt-file search /usr/share/dict/words
wamerican: /usr/share/dict/words
But I want to know if ispell can use multiple dictionaries stored in
different files for this purpose.
Regards,
Hongyi
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-07 12:20 Let ispell use multiple dictionaries stored in different files Hongyi Zhao
@ 2021-08-07 12:41 ` Eli Zaretskii
2021-08-07 13:46 ` Hongyi Zhao
2021-08-07 15:54 ` Emanuel Berg via Users list for the GNU Emacs text editor
1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2021-08-07 12:41 UTC (permalink / raw)
To: help-gnu-emacs
> From: Hongyi Zhao <hongyi.zhao@gmail.com>
> Date: Sat, 7 Aug 2021 20:20:02 +0800
>
> On Ubuntu 20.04, I noticed that ispell uses the following file as its
> completion dictionary: "/usr/share/dict/words", which comes from the
> wamerican package:
>
> $ apt-file search /usr/share/dict/words
> wamerican: /usr/share/dict/words
>
> But I want to know if ispell can use multiple dictionaries stored in
> different files for this purpose.
ispell-alternate-dictionary is a defcustom, so you can customize the
value to point to any other file. But it must be a single file.
Why do you need to use more than one file at once? This file is not
really used as a dictionary, it is used to complete words you type as
replacements, or when you are looking for a word you cannot remember.
It isn't a dictionary used by the speller for finding spelling
mistakes.
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-07 12:41 ` Eli Zaretskii
@ 2021-08-07 13:46 ` Hongyi Zhao
2021-08-07 14:03 ` Eli Zaretskii
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-07 13:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
On Sat, Aug 7, 2021 at 8:41 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Hongyi Zhao <hongyi.zhao@gmail.com>
> > Date: Sat, 7 Aug 2021 20:20:02 +0800
> >
> > On Ubuntu 20.04, I noticed that ispell uses the following file as its
> > completion dictionary: "/usr/share/dict/words", which comes from the
> > wamerican package:
> >
> > $ apt-file search /usr/share/dict/words
> > wamerican: /usr/share/dict/words
> >
> > But I want to know if ispell can use multiple dictionaries stored in
> > different files for this purpose.
>
> ispell-alternate-dictionary is a defcustom, so you can customize the
> value to point to any other file. But it must be a single file.
Thank you for letting me know about this restriction.
> Why do you need to use more than one file at once?
For example, to keep the vocabulary of different disciplines in separate files.
> This file is not really used as a dictionary, it is used to complete words you type as
> replacements, or when you are looking for a word you cannot remember.
> It isn't a dictionary used by the speller for finding spelling
> mistakes.
Thank you for your explanation.
Regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-07 13:46 ` Hongyi Zhao
@ 2021-08-07 14:03 ` Eli Zaretskii
0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2021-08-07 14:03 UTC (permalink / raw)
To: help-gnu-emacs
> From: Hongyi Zhao <hongyi.zhao@gmail.com>
> Date: Sat, 7 Aug 2021 21:46:20 +0800
> Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
>
> > Why do you need to use more than one file at once?
>
> Say, use different files to store vocabulary of different disciplines.
You can switch dictionaries if and when needed. But my recommendation
is to find the largest and most extensive word list you can put your
hands on, and use that: it should have words from every discipline out
there.
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-07 12:20 Let ispell use multiple dictionaries stored in different files Hongyi Zhao
2021-08-07 12:41 ` Eli Zaretskii
@ 2021-08-07 15:54 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 3:10 ` Hongyi Zhao
1 sibling, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-07 15:54 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> On Ubuntu 20.04, I noticed that ispell uses the following
> file as its completion dictionary: "/usr/share/dict/words",
> which comes from the wamerican package:
>
> $ apt-file search /usr/share/dict/words
> wamerican: /usr/share/dict/words
>
> But I want to know if ispell can use multiple dictionaries
> stored in different files for this purpose.
Install wamerican-insane, which should be the biggest. Then
just start typing and spell checking, and insert every word that
is correct through the interface. It should be more than
enough pretty soon.
As for using multiple files, you can write a little Elisp loop
that iterates over the dictionaries.
Another alternative is a virtual file; see this discussion of
virtual files on Linux:
https://unix.stackexchange.com/questions/94041/a-virtual-file-containing-the-concatenation-of-other-files
But I think that is overkill for this purpose (could be cool to
try, though). If I wanted to use several files, I'd just have
a simple script or Makefile that merged them into a new,
regular file, and then use that.
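A minimal sketch of such a merge script, assuming two hypothetical word
lists (general.txt and physics.txt, created here only so the example is
self-contained) and a hypothetical scratch directory. It concatenates the
lists, drops duplicates, and writes one regular file:

```shell
#!/bin/sh
# Sketch only: every file and directory name here is a hypothetical placeholder.
set -eu

workdir="${TMPDIR:-/tmp}/wordlist-merge-demo"
mkdir -p "$workdir"

# Two small stand-in word lists, one per discipline.
printf 'alpha\nbeta\n' > "$workdir/general.txt"
printf 'beta\nphonon\n' > "$workdir/physics.txt"

# Concatenate the lists, sort them bytewise, and drop duplicate lines.
cat "$workdir/general.txt" "$workdir/physics.txt" |
    LC_ALL=C sort -u > "$workdir/merged-words"

cat "$workdir/merged-words"
```

The resulting single file is the kind of file that
ispell-alternate-dictionary could then be customized to point to.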
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-07 15:54 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 3:10 ` Hongyi Zhao
2021-08-08 3:35 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 3:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 2 replies; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 3:10 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sat, Aug 7, 2021 at 11:54 PM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > On Ubuntu 20.04, I noticed that ispell uses the following
> > file as its completion dictionary: "/usr/share/dict/words",
> > which comes from the wamerican package:
> >
> > $ apt-file search /usr/share/dict/words
> > wamerican: /usr/share/dict/words
> >
> > But I want to know if ispell can use multiple dictionaries
> > stored in different files for this purpose.
>
> Install wamerican-insane - that should be the biggest ... then
> just start type and spell, insert everything that is correct,
> using the interface. It should be more than enough
> pretty soon.
In fact, I have already installed this package some time ago. Here are
some additional comments:
There are three similar packages available in Ubuntu, and they all come
from the wordlist project [1], as shown below:
$ apt-cache pkgnames | grep -- '^w.*-insane'
wbritish-insane
wamerican-insane
wcanadian-insane
But I still haven't figured out a convenient way to build the
latest insane version of the English word lists from the
upstream repo, so I've filed an issue there [2].
In my case, I first created a word list based on
Webster_s_Unabridged_3; see [3] for the detailed steps. Then I
combined wamerican-insane and Webster_s_Unabridged_3 into one even
bigger word list file, which contains 790592 entries at the moment:
$ awk '!a[$0]++' /usr/share/dict/american-english-insane \
    Webster_s_Unabridged_3.txt |
    tee ~/american-english-insane-Webster_s_Unabridged | wc
790592 892589 8649982
[1] https://github.com/en-wl/wordlist
[2] https://github.com/en-wl/wordlist/issues/329
[3] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208
> As for using multiple files, you can do a little Elisp loop
> that iterates the dictionaries.
Thank you for your instruction.
> Another alternative is this, see this discussion of virtual
> files on Linux:
>
> https://unix.stackexchange.com/questions/94041/a-virtual-file-containing-the-concatenation-of-other-files
>
> But I think it is overkill for this purpose (could be cool to
> try tho), if I wanted to use several files, I'd just have
> a simple script or Makefile that merged them into a new,
> regular file, and then that's what I would use.
I agree, and thank you again for sharing your views.
Hongyi
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 3:10 ` Hongyi Zhao
@ 2021-08-08 3:35 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 4:03 ` Hongyi Zhao
2021-08-08 3:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
1 sibling, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 3:35 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> $ apt-cache pkgnames | grep -- '^w.*-insane'
> wbritish-insane
> wamerican-insane
> wcanadian-insane
Yeah, in Debian the same:
$ aptitude search w.\*-insane
i wamerican-insane - American English dictionary words for /us
p wbritish-insane - British English dictionary words for /usr
p wcanadian-insane - Canadian English dictionary words for /us
> But I still haven't figured out the convenient way to
> build/create the latest insane version of English dictionary
> words lists from its upstream repo, for which I've filed an
> issue there [2].
You can get the source with
$ git clone https://github.com/en-wl/wordlist
but 'make' (the README.md says "To build simply type [...] make")
gives me nothing but errors.
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 3:10 ` Hongyi Zhao
2021-08-08 3:35 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 3:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 4:16 ` Hongyi Zhao
1 sibling, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 3:58 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> combine the wamerican-insane and Webster_s_Unabridged_3 into
> one even bigger word list file, which includes 790592
> entries at the moment:
>
> $ awk '!a[$0]++' /usr/share/dict/american-english-insane \
>     Webster_s_Unabridged_3.txt |
>     tee ~/american-english-insane-Webster_s_Unabridged | wc
> 790592 892589 8649982
Hm, is awk '!a[$0]++' faster than sort -u?
$ sort -u A B > AB
$ wc -l AB
Where did you find Webster_s_Unabridged_3.txt?
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 3:35 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 4:03 ` Hongyi Zhao
2021-08-08 4:09 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 4:03 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 11:35 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > $ apt-cache pkgnames | grep -- '^w.*-insane'
> > wbritish-insane
> > wamerican-insane
> > wcanadian-insane
>
> Yeah, in Debian the same:
>
> $ aptitude search w.\*-insane
> i wamerican-insane - American English dictionary words for /us
> p wbritish-insane - British English dictionary words for /usr
> p wcanadian-insane - Canadian English dictionary words for /us
Same as yours:
$ aptitude search w.\*-insane
i A wamerican-insane
- American English dictionary words for /usr/share/dict
p wbritish-insane
- British English dictionary words for /usr/share/dict
p wcanadian-insane
- Canadian English dictionary words for /usr/share/dict
> > But I still haven't figured out the convenient way to
> > build/create the latest insane version of English dictionary
> > words lists from its upstream repo, for which I've filed an
> > issue there [2].
>
> you can get the source with
>
> $ git clone https://github.com/en-wl/wordlist
>
> but 'make' (as it says "To build simply type [...] make" in
> the README.md) gives me just errors.
I can run make without any errors, but the insane word list file is
created by a Debian package rule, which is not available directly from
the GitHub repo.
Hongyi
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 4:03 ` Hongyi Zhao
@ 2021-08-08 4:09 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 4:22 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 4:09 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
>> you can get the source with
>>
>> $ git clone https://github.com/en-wl/wordlist
>>
>> but 'make' (as it says "To build simply type [...] make" in
>> the README.md) gives me just errors.
>
> I can run make without any errors, but the insane word list
> file is created by a Debian package rule, which is not
> available directly from the GitHub repo.
This one?
#!/usr/bin/make -f
# -*- Makefile -*- $Id: rules,v 1.18 2005/10/15 03:03:48 david Exp $
# Sample debian/rules that uses debhelper.
# GNU copyright 1997 to 1999 by Joey Hess.
# Customized 27 Oct 1999 by David Coe for wenglish, later moved to scowl
# This version is for packages that are architecture independent.
# Uncomment this to turn on verbose mode.
export DH_VERBOSE=1
# This has to be exported to make some magic below work.
export DH_OPTIONS
# In addition to the scowl binary package, we create wamerican-small, wamerican, wamerican-large, wamerican-huge, and the
# corresponding packages for wbritish and wcanadian.
# The medium size packages have no -size part in their names
# These are the scowl extensions (complexity numbers?) that contribute to each word list (i.e. each size);
# the -size parts "-small", "", "-large", and "-huge" correspond to the end of the binary package name:
empty:=
SIZES=small "" large huge insane
SIZE_EXTENSIONS_small:=10 20 35
SIZE_EXTENSIONS:=$(SIZE_EXTENSIONS_small) 40 50
SIZE_EXTENSIONS_large:=$(SIZE_EXTENSIONS) 55 60 70
SIZE_EXTENSIONS_huge:=$(SIZE_EXTENSIONS_large) 80
SIZE_EXTENSIONS_insane:=$(SIZE_EXTENSIONS_huge) 95
export SIZE_EXTENSIONS_small
export SIZE_EXTENSIONS
export SIZE_EXTENSIONS_large
export SIZE_EXTENSIONS_huge
export SIZE_EXTENSIONS_insane
SPELLINGS:= american british canadian
# These are the scowl word list classes we use:
CLASSES:=words proper-names upper contractions
VARIANTS:=1 2
include /usr/share/dpkg/pkg-info.mk
%:
dh $@
override_dh_auto_build:
set -e; \
mkdir -p final_utf8;\
for file in final/*.[0-9][0-9]; do\
iconv -f 'iso8859-1' -t 'utf-8' < $${file} > final_utf8/$$(basename $${file}); \
done;
set -e;\
for SPELLING in $(SPELLINGS); do\
for SIZE in $(SIZES); do\
if [ -n "$$SIZE" ]; then SIZE_NAME="_$$SIZE"; SIZE="-$$SIZE"; else SIZE_NAME=""; SIZE=""; fi; \
echo "The following SCOWL word lists were concatenated and sorted (with duplicates" > w$$SPELLING$$SIZE.scowl-word-lists-used;\
echo "removed) to create this word list (see README.Debian for more details):" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
for CLASS in $(CLASSES); do\
for EXT in $$(eval echo "\$$""SIZE_EXTENSIONS$$SIZE_NAME"); do\
echo "class $$CLASS ext $$EXT size name $$SIZE_NAME"; \
if [ -f final/english-$$CLASS.$$EXT ]; then\
echo "cat final/english-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted";\
cat final/english-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted;\
echo " english-$$CLASS.$$EXT" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
fi;\
for VARIANT in $(VARIANTS); do\
VARIANT_FILE="$${SPELLING}_"; \
if [ "$$VARIANT_FILE" = "american_" ]; then \
VARIANT_FILE=""; \
fi; \
if [ -f final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT ]; then\
echo "cat final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted";\
cat final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted;\
echo " $${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
fi;\
done;\
if [ "$$SIZE" = "insane" ]; then\
for VARIANT in $(VARIANTS); do\
for VARIANT_FILE in $(SPELLINGS); do \
VARIANT_FILE="$${VARIANT_FILE}_"; \
if [ "$$VARIANT_FILE" = "american_" ]; then \
VARIANT_FILE=""; \
fi; \
if [ -f final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT ]; then\
echo "cat final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted";\
cat final/$${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted;\
echo " $${VARIANT_FILE}variant_$$VARIANT-$$CLASS.$$EXT" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
fi;\
done; \
done; \
for special in final/special_*.$$CLASS; do \
echo "cat $$special >> $$SPELLING-english$$SIZE.unsorted";\
cat $$special >> $$SPELLING-english$$SIZE.unsorted;\
echo " $$special" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
done;\
fi;\
if [ -f final/$$SPELLING-$$CLASS.$$EXT ]; then\
echo "cat final/$$SPELLING-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted";\
cat final/$$SPELLING-$$CLASS.$$EXT >> $$SPELLING-english$$SIZE.unsorted;\
echo " $$SPELLING-$$CLASS.$$EXT" >> w$$SPELLING$$SIZE.scowl-word-lists-used;\
fi;\
done;\
done;\
echo "cat $$SPELLING-english$$SIZE.unsorted | LC_ALL=C sort -u | iconv -t 'utf-8' > $$SPELLING-english$$SIZE; rm $$SPELLING-english$$SIZE.unsorted";\
cat $$SPELLING-english$$SIZE.unsorted | LC_ALL=C sort -u | iconv -f 'iso8859-1' -t 'utf-8' > $$SPELLING-english$$SIZE; rm $$SPELLING-english$$SIZE.unsorted;\
done;\
done
cd speller && $(MAKE) hunspell
override_dh_auto_clean:
rm -rf final_utf8;
set -e;\
for SIZE in $(SIZES); do\
if [ -n "$$SIZE" ]; then SIZE="-$$SIZE"; fi; \
for SPELLING in $(SPELLINGS); do\
rm -f $$SPELLING-english$$SIZE.unsorted $$SPELLING-english$$SIZE $$SPELLING-english$$SIZE.5 w$$SPELLING$$SIZE.scowl-word-lists-used;\
done;\
done
cd speller && $(MAKE) clean
INSTALL_WORDLISTS=$(patsubst %-"",%,$(foreach spelling,$(SPELLINGS),$(foreach size,$(SIZES),install-w$(spelling)-$(size))))
override_dh_auto_install: install-scowl install-hunspell $(INSTALL_WORDLISTS)
installdeb-wordlist -pwamerican --noscripts
installdeb-wordlist --no-package=wamerican --no-package=scowl --no-package=hunspell-en-us --no-package=hunspell-en-au --no-package=hunspell-en-ca
installdeb-hunspell -phunspell-en-ca -phunspell-en-au -phunspell-en-us
install-scowl:
dh_installdirs --package=scowl
dh_install --package=scowl final_utf8/*.[0-9][0-9] usr/share/dict/scowl
dh_installdocs --package=scowl README debian/README.Debian
install-hunspell:
dh_install --package=hunspell-en-us
dh_install --package=hunspell-en-au
dh_install --package=hunspell-en-ca
override_dh_auto_test:
echo "doing nothing";
override_dh_gencontrol:
dh_gencontrol -Nhunspell-en-us -Nhunspell-en-au -Nhunspell-en-ca
dh_gencontrol -phunspell-en-us -phunspell-en-au -phunspell-en-ca -- -v1:$(DEB_VERSION)
WORDLIST=$(shell echo $(*)|sed -re 's/^w([a-z]*)(-*[a-z]*)/\1-english\2/')
$(INSTALL_WORDLISTS): install-%: install-scowl
dh_testdir
dh_testroot
dh_installdirs --package=$(*) usr/share/dict
dh_install --package=$(*) $(shell echo $(*)|sed -re \
's/^w([a-z]*)(-*[a-z]*)/\1-english\2/') usr/share/dict
dh_installdocs --package=$(*) $(*).scowl-word-lists-used debian/README.Debian
sed "s/WORDLIST/$(WORDLIST)/g" < debian/wordlist_manpage_template > $(WORDLIST).5
dh_installman --package=$(*) $(WORDLIST).5
# this is the install-w$(SPELLING)-$(VARIANT) rule
.PHONY: $(foreach spelling,$(SPELLINGS),$(foreach size,$(SIZES),install-w$(spelling)-$(size)))
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 3:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 4:16 ` Hongyi Zhao
2021-08-08 4:40 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 4:16 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 11:58 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > combine the wamerican-insane and Webster_s_Unabridged_3 into
> > one even bigger word list file, which includes 790592
> > entries at the moment:
> >
> > $ awk '!a[$0]++' /usr/share/dict/american-english-insane \
> >     Webster_s_Unabridged_3.txt |
> >     tee ~/american-english-insane-Webster_s_Unabridged | wc
> > 790592 892589 8649982
>
> Hm, is awk '!a[$0]++' faster than sort -u?
The awk code above keeps the words in the order in which they first
occur in the original word list files.
> $ sort -u A B > AB
> $ wc -l AB
But the sort command reorders them. Keep in mind that ispell and
any autocompletion tool or framework need an already well-sorted
word list file to achieve acceptable performance when the word list
is huge; at least this is the case for the initialization stage of
Emacs's ispell. For this reason, I don't want to change the order
in which the words occur in the original word list files.
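To make the difference concrete, here is a small sketch that runs both
commands on two tiny stand-in files (A and B, hypothetical, as is the
scratch directory). Both remove the duplicate line, but only awk keeps
the order of first occurrence:

```shell
#!/bin/sh
# Sketch only: A, B and the scratch directory are hypothetical placeholders.
set -eu

workdir="${TMPDIR:-/tmp}/wordlist-dedupe-demo"
mkdir -p "$workdir"

printf 'zebra\napple\n' > "$workdir/A"
printf 'apple\nmango\n' > "$workdir/B"

# awk prints a line only the first time it is seen, so input order is kept.
awk '!seen[$0]++' "$workdir/A" "$workdir/B" > "$workdir/awk-out"

# sort -u also drops the duplicate, but emits the lines in sorted order.
LC_ALL=C sort -u "$workdir/A" "$workdir/B" > "$workdir/sort-out"

echo "awk:";     cat "$workdir/awk-out"
echo "sort -u:"; cat "$workdir/sort-out"
```

Note that the awk approach holds every distinct line in memory, which is
a trade-off to keep in mind for very large lists.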
> Where did you find Webster_s_Unabridged_3.txt ?
I built this file myself, and I have given the specific steps to
create it [1]:
$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
$ awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' \
    Webster_s_Unabridged_3.csv > Webster_s_Unabridged_3.txt
[1] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208
Best regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use multiple dictionaries stored in different files.
2021-08-08 4:09 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 4:22 ` Hongyi Zhao
0 siblings, 0 replies; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 4:22 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 12:10 PM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> >> you can get the source with
> >>
> >> $ git clone https://github.com/en-wl/wordlist
> >>
> >> but 'make' (as it says "To build simply type [...] make" in
> >> the README.md) gives me just errors.
> >
> > I can run make without any errors, but the insane word list
> > file is created by a Debian package rule, which is not
> > available directly from the GitHub repo.
>
> This one?
Similar, but I'm using the following method:
$ apt source wamerican-insane
$ cd scowl-2018.04.16/
$ rg -uu -l american-insane .
./debian/control
./debian/wamerican-insane.info-wordlist
$ rg -uu -l _insane .
./.pc/applied-patches
./debian/patches/series
./debian/rules
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 4:16 ` Hongyi Zhao
@ 2021-08-08 4:40 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 4:52 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 4:40 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> But the sort command will sort them accordingly. Keep in
> mind that the ispell and any autocompletion tools/frameworks
> needs the already well sorted word list file for achieving
> affordable performance when the word list file is huge, at
> least this is the case for Emacs's ispell initialization
> stage. Out of this consideration, I don't want to change the
> occurrences order of the words given in the original word
> list files.
They are sorted; here are the first 12 words of
/usr/share/dict/american-english-insane
A
A'asia
A's
AATech
AATech's
AAeE
AAeE's
AAgr
AAgr's
AAvTech
AAvTech's
ABEd
so one just needs to find the correct sort(1) options to do
that :) (And if you can't, just sort and see what happens...)
"awk '!a[$0]++' A B" doesn't sort, but that's what you want as
you say.
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 4:40 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 4:52 ` Hongyi Zhao
2021-08-08 4:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 4:52 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 12:40 PM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > But the sort command will sort them accordingly. Keep in
> > mind that the ispell and any autocompletion tools/frameworks
> > needs the already well sorted word list file for achieving
> > affordable performance when the word list file is huge, at
> > least this is the case for Emacs's ispell initialization
> > stage. Out of this consideration, I don't want to change the
> > occurrences order of the words given in the original word
> > list files.
>
> They are sorted, here are the first 10 words of
> /usr/share/dict/american-english-insane
>
> A
> A'asia
> A's
> AATech
> AATech's
> AAeE
> AAeE's
> AAgr
> AAgr's
> AAvTech
> AAvTech's
> ABEd
>
> so one just needs to find the correct sort(1) options to do
> that :) (And if you can't, just sort and see what happens...)
>
> "awk '!a[$0]++' A B" doesn't sort, but that's what you want as
> you say.
Yes. Since the input word list files are already well sorted in
lexicographic order, I just want to delete duplicates and leave
the rest in their original order.
Best regards,
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 4:52 ` Hongyi Zhao
@ 2021-08-08 4:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 5:08 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 4:58 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> Yes, considering that the input word list files have already
> well sorted in lexicographically order, I just want to
> delete duplicates and leave the rest in their
> original order.
Not with two files: the end result doesn't come out sorted just by
dropping the duplicates,
$ printf '1\n2\n3\na\nd\n' > one.txt
$ printf 'a\nb\nc\n1\n4\n' > abc.txt
$ awk '!a[$0]++' one.txt abc.txt
1
2
3
a
d
b
c
4
However ...
$ sort -u one.txt abc.txt
1
2
3
4
a
b
c
d
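When every input file is already sorted, a merge can replace the full
sort. A minimal sketch, assuming all inputs are sorted in the C locale
(abc.txt above is not, so it would not qualify); merge_sorted_unique is
a hypothetical helper name, not something from this thread:

```shell
# Merge already-sorted word lists and drop duplicates without re-sorting.
# Assumption: every input file is already sorted in the C (byte-order) locale.
merge_sorted_unique() {
    # -m merges pre-sorted inputs; -u keeps one copy of each duplicated line.
    LC_ALL=C sort -m -u "$@"
}
```

If an input turns out not to be sorted, the output of a merge is not
guaranteed to be sorted either, so plain "sort -u" remains the safe choice.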
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 4:58 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 5:08 ` Hongyi Zhao
2021-08-08 5:19 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 5:08 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 12:59 PM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > Yes, considering that the input word list files have already
> > well sorted in lexicographically order, I just want to
> > delete duplicates and leave the rest in their
> > original order.
>
> Not with two files the end result doesn't get sorted just by
> dropping the duplicates,
>
> $ echo '1\n2\n3\na\nd' > one.txt
> $ echo 'a\nb\nc\n1\n4' > abc.txt
> $ awk '!a[$0]++' one.txt abc.txt
> 1
> 2
> 3
> a
> d
> b
> c
> 4
>
> However ...
>
> $ sort -u one.txt abc.txt
> 1
> 2
> 3
> 4
> a
> b
> c
> d
Thank you for pointing this out. I should switch to your sort method:
$ sort -u /usr/share/dict/american-english-insane \
    Webster_s_Unabridged_3.txt \
    > american-english-insane-Webster_s_Unabridged_3
Best regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 5:08 ` Hongyi Zhao
@ 2021-08-08 5:19 ` Hongyi Zhao
2021-08-08 6:53 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 5:19 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 1:08 PM Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 12:59 PM Emanuel Berg via Users list for the
> GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
> >
> > Hongyi Zhao wrote:
> >
> > > Yes, considering that the input word list files have already
> > > well sorted in lexicographically order, I just want to
> > > delete duplicates and leave the rest in their
> > > original order.
> >
> > Not with two files the end result doesn't get sorted just by
> > dropping the duplicates,
> >
> > $ echo '1\n2\n3\na\nd' > one.txt
> > $ echo 'a\nb\nc\n1\n4' > abc.txt
> > $ awk '!a[$0]++' one.txt abc.txt
> > 1
> > 2
> > 3
> > a
> > d
> > b
> > c
> > 4
> >
> > However ...
> >
> > $ sort -u one.txt abc.txt
> > 1
> > 2
> > 3
> > 4
> > a
> > b
> > c
> > d
>
> Thank you for pointing this out. I should switch to your sort method:
>
> $ sort -u /usr/share/dict/american-english-insane
> Webster_s_Unabridged_3.txt >
> american-english-insane-Webster_s_Unabridged_3
According to the discussion here [1], maybe the following is better:
$ LC_ALL=C sort -u /usr/share/dict/american-english-insane \
    Webster_s_Unabridged_3.txt \
    > american-english-insane-Webster_s_Unabridged_3
[1] https://superuser.com/questions/631402/sort-lexicographically-in-bash
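The effect of LC_ALL=C can be seen on a tiny input: in the C locale,
sort compares raw bytes, so all uppercase ASCII letters come before the
lowercase ones. A small sketch with made-up sample words (c_sort is a
hypothetical helper name):

```shell
# In the C locale, sort compares bytes: 'A'-'Z' (65-90) precede 'a'-'z' (97-122).
c_sort() {
    LC_ALL=C sort -u "$@"
}

printf 'apple\nZebra\nApple\nzebra\napple\n' | c_sort
# prints:
# Apple
# Zebra
# apple
# zebra
```

Other locales usually order these words differently, so the locale has
to be the same one that the tool reading the list expects. Which locale
that is for Emacs is not established in this thread.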
Best regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 5:19 ` Hongyi Zhao
@ 2021-08-08 6:53 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 7:14 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 6:53 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> According to the discussion here [1], maybe the following is
> better:
>
> $ LC_ALL=C sort -u /usr/share/dict/american-english-insane
> Webster_s_Unabridged_3.txt >
> american-english-insane-Webster_s_Unabridged_3
You can always sort a consecutive subset and compare the result
with the original file to see if it sorts the same way.
I'm not sure it has to be sorted exactly the same way. I mean,
why? And how would that work? Just sort it :)
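A rough sketch of such a check, taking the subset from the start of the
file for simplicity. The helper name subset_sorts_same and its defaults
(1000 lines, C locale) are assumptions made for illustration:

```shell
# Check whether the first N lines of a word list are already in the order
# that sort(1) would produce in a given locale (default: C).
# Usage: subset_sorts_same FILE [N] [LOCALE]
subset_sorts_same() {
    file=$1
    n=${2:-1000}
    loc=${3:-C}
    tmp=$(mktemp) || return 2
    head -n "$n" "$file" > "$tmp"
    # cmp -s is silent and returns 0 only if both inputs are identical.
    LC_ALL=$loc sort "$tmp" | cmp -s - "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, "subset_sorts_same /usr/share/dict/american-english-insane 1000 C"
returns 0 if the first 1000 lines are in C-locale order. To test a whole
file rather than a subset, "sort -c FILE" does the same job.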
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 6:53 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 7:14 ` Hongyi Zhao
2021-08-08 7:22 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 7:14 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 2:54 PM Emanuel Berg via Users list for the GNU
Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > According to the discussion here [1], maybe the following is
> > better:
> >
> > $ LC_ALL=C sort -u /usr/share/dict/american-english-insane
> > Webster_s_Unabridged_3.txt >
> > american-english-insane-Webster_s_Unabridged_3
>
> You can always sort a consecutive subset compare the result
> from the original file to see if it sorts the same way.
> I'm unsure it has to be sorted exactly the same way, I mean
> why? And how would that work? Just sort it :)
I'm unsure too. Based on my tries, the result obtained with the
default sort options works well in Emacs, with a decent response
time.
Regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 7:14 ` Hongyi Zhao
@ 2021-08-08 7:22 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 7:43 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 7:22 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
>> You can always sort a consecutive subset compare the result
>> from the original file to see if it sorts the same way.
>> I'm unsure it has to be sorted exactly the same way, I mean
>> why? And how would that work? Just sort it :)
>
> I'm unsure too. Based on my tries, it seems that the results
> obtained by default sort options works well in Emacs will
> a decent response time.
Then just start collecting every word list you can find.
-insane isn't that insane. We can call the next one something
more positive, perhaps. wamerican-healthy?
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 7:22 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 7:43 ` Hongyi Zhao
2021-08-08 7:51 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 7:43 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 3:23 PM Emanuel Berg via Users list for the GNU
Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> >> You can always sort a consecutive subset compare the result
> >> from the original file to see if it sorts the same way.
> >> I'm unsure it has to be sorted exactly the same way, I mean
> >> why? And how would that work? Just sort it :)
> >
> > I'm unsure too. Based on my tries, it seems that the results
> > obtained by default sort options works well in Emacs will
> > a decent response time.
>
> Then just start collecting every word list you can find.
>
> -insane isn't that insane. We can call the next one something
> more positive, perhaps. wamerican-healthy?
Good. Here is a more comprehensive approach:
1. Generate the latest version of the insane word list using its
official tool [1].
2. git clone https://github.com/dwyl/english-words.git dwyl/english-words.git
3. Generate the Webster_s_Unabridged_3.txt as I described before.
4. Combine all of the above:
$ sort -u ilius/Webster_s_Unabridged_3.txt \
    dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt \
    -o english-words-healthy
[1] http://app.aspell.net/create
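The combine step can be wrapped in a small function so that further word
lists are easy to add later. A minimal sketch; combine_wordlists is a
hypothetical name, and it assumes one word per line in every input:

```shell
# Combine any number of word lists into one sorted, duplicate-free file.
# Usage: combine_wordlists OUTPUT INPUT...
combine_wordlists() {
    out=$1
    shift
    # -u drops duplicate lines; -o writes the result to OUTPUT.
    sort -u -o "$out" "$@" || return 1
    # Report the number of entries in the combined list.
    wc -l < "$out"
}
```

With the paths used above, this would be called as
"combine_wordlists english-words-healthy ilius/Webster_s_Unabridged_3.txt
dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt".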
Regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 7:43 ` Hongyi Zhao
@ 2021-08-08 7:51 ` Hongyi Zhao
2021-08-08 8:02 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 7:51 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 3:43 PM Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 3:23 PM Emanuel Berg via Users list for the GNU
> Emacs text editor <help-gnu-emacs@gnu.org> wrote:
> >
> > Hongyi Zhao wrote:
> >
> > >> You can always sort a consecutive subset compare the result
> > >> from the original file to see if it sorts the same way.
> > >> I'm unsure it has to be sorted exactly the same way, I mean
> > >> why? And how would that work? Just sort it :)
> > >
> > > I'm unsure too. Based on my tries, it seems that the results
> > > obtained by default sort options works well in Emacs will
> > > a decent response time.
> >
> > Then just start collecting every word list you can find.
> >
> > -insane isn't that insane. We can call the next one something
> > more positive, perhaps. wamerican-healthy?
>
> Good, see the following more comprehensive way:
>
> 1. Generate the latest version of the insane word list using its
> official tool [1].
> 2. git clone https://github.com/dwyl/english-words.git dwyl/english-words.git
> 3. Generate the Webster_s_Unabridged_3.txt as I've said before.
> 4. Combine all the above:
>
> $ sort -u ilius/Webster_s_Unabridged_3.txt
> dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt -o
> english-words-healthy
Now, the total entries are as follows:
$ wc -l english-words-healthy
841181 english-words-healthy
Hongyi
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 7:51 ` Hongyi Zhao
@ 2021-08-08 8:02 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 8:27 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 8:02 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
>> $ sort -u ilius/Webster_s_Unabridged_3.txt
>> dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt -o
>> english-words-healthy
>
> Now, the total entries are as follows:
>
> $ wc -l english-words-healthy
> 841181 english-words-healthy
28% more than -insane! (654936)
(format "%.0f%%" (- (* 100 (/ 841181 654936.0)) 100))
... I forgot how one is supposed to do that :$
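For reference, the same percentage can be computed in the shell with
awk. A small sketch (pct_more is a hypothetical helper name); the two
counts are the ones quoted above:

```shell
# Percentage by which NEW exceeds OLD, rounded to a whole percent.
# Usage: pct_more NEW OLD
pct_more() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.0f%%\n", (new / old - 1) * 100 }'
}

pct_more 841181 654936   # prints 28%
```

The formula is (new / old - 1) * 100, which is equivalent to the Lisp
expression above: 841181 / 654936 is roughly 1.284, hence 28%.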
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 8:02 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 8:27 ` Hongyi Zhao
2021-08-08 8:37 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 8:27 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 4:02 PM Emanuel Berg via Users list for the GNU
Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> >> $ sort -u ilius/Webster_s_Unabridged_3.txt
> >> dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt -o
> >> english-words-healthy
> >
> > Now, the total entries are as follows:
> >
> > $ wc -l english-words-healthy
> > 841181 english-words-healthy
>
> 28% more than -insane! (65493)
In the steps I posted above to build this comprehensive word list, the
latest American insane word list, created using the official online
tool [1], includes more entries:
$ wc -l words.txt
664758 words.txt
[1] http://app.aspell.net/create
Regards,
Hongyi
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 8:27 ` Hongyi Zhao
@ 2021-08-08 8:37 ` Hongyi Zhao
2021-08-08 10:14 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 8:37 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 4:27 PM Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 4:02 PM Emanuel Berg via Users list for the GNU
> Emacs text editor <help-gnu-emacs@gnu.org> wrote:
> >
> > Hongyi Zhao wrote:
> >
> > >> $ sort -u ilius/Webster_s_Unabridged_3.txt
> > >> dwyl/english-words.git/words.txt en-wl/SCOWL-wl/words.txt -o
> > >> english-words-healthy
> > >
> > > Now, the total entries are as follows:
> > >
> > > $ wc -l english-words-healthy
> > > 841181 english-words-healthy
> >
> > 28% more than -insane! (65493)
>
> In the steps I posted above to build this comprehensive word list, the
> latest American insane word list, created using the official online
> tool [1], includes more entries:
>
> $ wc -l words.txt
> 664758 words.txt
>
> [1] http://app.aspell.net/create
See the attached file for the options I used to build the latest
American insane word list.
Regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
[-- Attachment #2: wamerican-insane.png --]
[-- Type: image/png, Size: 182541 bytes --]
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 8:37 ` Hongyi Zhao
@ 2021-08-08 10:14 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-08 13:09 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 10:14 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> See the attached file for the corresponding options used by
> me to build the latest American insane word list.
But not everyone has to build the list, right? You can
just publish it. Maybe zip it.
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 10:14 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-08 13:09 ` Hongyi Zhao
2021-08-08 16:27 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-08 13:09 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Sun, Aug 8, 2021 at 6:14 PM Emanuel Berg via Users list for the GNU
Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > See the attached file for the corresponding options used by
> > me to build the latest American insane word list.
>
> But everyone doesn't have to build the list, right? You can
> just publish it. Maybe zip it.
I've created a new GitHub repository [1] for this purpose.
[1] https://github.com/hongyi-zhao/english-wordlist
Best regards
--
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 13:09 ` Hongyi Zhao
@ 2021-08-08 16:27 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-09 0:06 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-08 16:27 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
> I've created a new GitHub repository [1] for this purpose.
>
> [1] https://github.com/hongyi-zhao/english-wordlist
\o/
Straight gangsta!
> --
> Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
> Theory and Simulation of Materials
> Hebei Vocational University of Technology and Engineering
> No. 473, Quannan West Street, Xindu District, Xingtai, Hebei
> province
Oops, impressive, but incorrect signature.
RFC 3676, section 4.3 (Usenet Signature Convention)
https://www.ietf.org/rfc/rfc3676.txt
BTW what does "[v]ocational" mean?
https://en.wikipedia.org/wiki/Vocational_education
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-08 16:27 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-09 0:06 ` Hongyi Zhao
2021-08-09 0:34 ` Emanuel Berg via Users list for the GNU Emacs text editor
0 siblings, 1 reply; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-09 0:06 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Mon, Aug 9, 2021 at 12:30 AM Emanuel Berg via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> > I've created a new GitHub repository [1] for this purpose.
> >
> > [1] https://github.com/hongyi-zhao/english-wordlist
>
> \o/
>
> Straight gangsta!
What's the meaning of this phrase?
> > --
> > Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
> > Theory and Simulation of Materials
> > Hebei Vocational University of Technology and Engineering
> > No. 473, Quannan West Street, Xindu District, Xingtai, Hebei
> > province
>
> Oups, impressive, but incorrect signature.
>
> RFC 3676, section 4.3 (Usenet Signature Convention)
> https://www.ietf.org/rfc/rfc3676.txt
I've read through RFC 3676, section 4.3, but still can't figure out
what is wrong with my signature. Any further hints or comments
would be highly appreciated.
> BTW what does "[v]ocational" mean?
> https://en.wikipedia.org/wiki/Vocational_education
Yes.
Hongyi
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-09 0:06 ` Hongyi Zhao
@ 2021-08-09 0:34 ` Emanuel Berg via Users list for the GNU Emacs text editor
2021-08-09 1:31 ` Hongyi Zhao
0 siblings, 1 reply; 31+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2021-08-09 0:34 UTC (permalink / raw)
To: help-gnu-emacs
Hongyi Zhao wrote:
>> Straight gangsta!
>
> What's the meaning of this phrase?
"Awesome!"
> I've read through RFC 3676, section 4.3, but still can't
> figure out the problem of mine. Any more
> revisions/hints/comments/enhancements will be
> highly appreciated?
There is a long-standing convention in Usenet news which
also commonly appears in Internet mail of using "-- " as the
separator line between the body and the signature of
a message. [1]
Note the space after the two "hyphen-minuses" (or dashes).
Sometimes that space disappears. I think this is what happened
to you, because you have two dashes and no space. Maybe the
mail client strips trailing whitespace in a misguided attempt
to clean things up (misguided in this case, at least).
Anyway without the whitespace it doesn't work, it has to be
there: dash dash space newline signature.
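Whether a saved message really contains the separator can be checked
with grep. A minimal sketch, assuming the raw message has been saved to
a file (web archives may strip trailing whitespace, so the raw message
is what counts); has_sig_separator is a hypothetical helper name:

```shell
# Succeed if FILE contains a line that is exactly "-- " (dash dash space).
# Usage: has_sig_separator FILE
has_sig_separator() {
    # -F: fixed string, -x: match whole lines only,
    # -e: keeps the pattern from being parsed as an option.
    grep -q -F -x -e '-- ' "$1"
}
```

A line with only two dashes and no trailing space does not match, which
is the case described above.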
With Gnus for example, you can have an article buffer hide the
signature and make it a button. So you can check it out if you
want to. But most often it is the same signatures over and
over so it can remain hidden just as well.
(gnus-article-hide-signature nil 1) ; [2]
[1] https://www.ietf.org/rfc/rfc3676.txt lines 462-464
[2] https://dataswamp.org/~incal/emacs-init/gnus/article.el
--
underground experts united
https://dataswamp.org/~incal
* Re: Let ispell use use multiple dictionaries stored in different files.
2021-08-09 0:34 ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2021-08-09 1:31 ` Hongyi Zhao
0 siblings, 0 replies; 31+ messages in thread
From: Hongyi Zhao @ 2021-08-09 1:31 UTC (permalink / raw)
To: Emanuel Berg, help-gnu-emacs
On Mon, Aug 9, 2021 at 8:34 AM Emanuel Berg via Users list for the GNU
Emacs text editor <help-gnu-emacs@gnu.org> wrote:
>
> Hongyi Zhao wrote:
>
> >> Straight gangsta!
> >
> > What's the meaning of this phrase?
>
> "Awesome!"
>
> > I've read through RFC 3676, section 4.3, but still can't
> > figure out the problem of mine. Any more
> > revisions/hints/comments/enhancements will be
> > highly appreciated?
>
> There is a long-standing convention in Usenet news which
> also commonly appears in Internet mail of using "-- " as the
> separator line between the body and the signature of
> a message. [1]
>
> Note the space after the two "hyphen-minuses" (or dashes).
> Sometimes that space disappears, I think this is what happened
> to you because you have two dashes (and no space), maybe the
> mail client removes whitespace or whatever in a misguided
> attempt to clean things up (well, in that case at least).
>
> Anyway without the whitespace it doesn't work, it has to be
> there: dash dash space newline signature.
>
> With Gnus for example, you can have an article buffer hide the
> signature and make it a button. So you can check it out if you
> want to. But most often it is the same signatures over and
> over so it can remain hidden just as well.
>
> (gnus-article-hide-signature nil 1) ; [2]
>
> [1] https://www.ietf.org/rfc/rfc3676.txt lines 462-464
> [2] https://dataswamp.org/~incal/emacs-init/gnus/article.el
>
> --
> underground experts united
> https://dataswamp.org/~incal
>
>
--
There is a space at the end of the line above; you can check to confirm.
Another thing to note: the ``dash dash space newline'' is added
automatically by the Google Mail server. I only set the following
signature text in the settings:
Assoc. Prof. Hongyi Zhao <hongyi.zhao@gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
Regards,
Hongyi