From: Hongyi Zhao
Newsgroups: gmane.emacs.help
Subject: Re: Let ispell use use multiple dictionaries stored in different files.
Date: Sun, 8 Aug 2021 12:16:38 +0800
References: <874kc1i72n.fsf@zoho.eu> <87o8a8fuzv.fsf@zoho.eu>
In-Reply-To: <87o8a8fuzv.fsf@zoho.eu>
To: Emanuel Berg, help-gnu-emacs
List-Id: Users list for the GNU Emacs text editor

On Sun, Aug 8, 2021 at 11:58 AM Emanuel Berg via Users list for the
GNU Emacs text editor wrote:
>
> Hongyi Zhao wrote:
>
> > combine the wamerican-insane and Webster_s_Unabridged_3 into
> > one even bigger word list file, which includes 790592
> > entries at the moment:
> >
> > $ awk '!a[$0]++' /usr/share/dict/american-english-insane
> > Webster_s_Unabridged_3.txt |
> > tee ~/american-english-insane-Webster_s_Unabridged | wc
> > 790592 892589 8649982
>
> Hm, is awk '!a[$0]++' faster than sort -u?

The awk code above keeps the words in the order in which they first
occur in the original word list files.

> $ sort -u A B > AB
> $ wc -l AB

But the sort command reorders them. Keep in mind that ispell, and any
autocompletion tool or framework, needs an already well-sorted word
list file to achieve acceptable performance when the file is huge; at
least, this is the case for the initialization stage of Emacs's
ispell. For this reason, I don't want to change the order in which the
words occur in the original word list files.

> Where did you find Webster_s_Unabridged_3.txt ?

I built this file myself, and I have given the specific steps to
create it in [1]:

$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
$ awk -F, 'NR > 8 {sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' Webster_s_Unabridged_3.csv > Webster_s_Unabridged_3.txt

[1] https://github.com/company-mode/company-mode/issues/1146#issuecomment-886172208

Best regards
-- 
Assoc. Prof. Hongyi Zhao
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
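
P.S. To make the difference between the awk one-liner and sort -u concrete, here is a minimal sketch on two tiny sample files. The file names a.txt and b.txt, their contents, and the temporary directory are made up for illustration; they are not the real word lists. The array name "seen" is just a more readable stand-in for "a" in the command quoted above.

```shell
#!/bin/sh
# Hypothetical sample word lists (assumption: stand-ins for the real files).
tmp=$(mktemp -d)
printf 'zebra\napple\nmango\n' > "$tmp/a.txt"
printf 'apple\nbanana\nzebra\n' > "$tmp/b.txt"

# Order-preserving dedup: seen[$0]++ is 0 (false) the first time a line
# appears, so !seen[$0]++ is true and awk prints the line; later copies
# are skipped.
awk '!seen[$0]++' "$tmp/a.txt" "$tmp/b.txt" > "$tmp/awk.txt"

# Sorted dedup: the same set of lines, but reordered by sort.
# LC_ALL=C pins the collation so the order does not depend on the locale.
LC_ALL=C sort -u "$tmp/a.txt" "$tmp/b.txt" > "$tmp/sort.txt"

echo "awk '!seen[\$0]++':"
cat "$tmp/awk.txt"
echo "sort -u:"
cat "$tmp/sort.txt"
```

With these sample files, the awk result is zebra, apple, mango, banana (order of first occurrence), while sort -u gives apple, banana, mango, zebra. Both outputs contain the same four words; only the order differs.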