From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Eric Abrahamsen Newsgroups: gmane.emacs.bugs Subject: bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables Date: Mon, 25 Mar 2019 10:35:32 -0700 Message-ID: <8736naj1qz.fsf@ericabrahamsen.net> References: <8736raz3ec.fsf@ericabrahamsen.net> <87y392xoht.fsf@ericabrahamsen.net> <87imxzxa4s.fsf@ericabrahamsen.net> <87a7hn3h3w.fsf@ericabrahamsen.net> <87woknisuu.fsf@ericabrahamsen.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="194943"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) Cc: 33653@debbugs.gnu.org To: Andy Moreton Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Mon Mar 25 18:36:12 2019 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1h8TW7-000oZF-Vm for geb-bug-gnu-emacs@m.gmane.org; Mon, 25 Mar 2019 18:36:12 +0100 Original-Received: from localhost ([127.0.0.1]:46045 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h8TW6-0002pP-Vb for geb-bug-gnu-emacs@m.gmane.org; Mon, 25 Mar 2019 13:36:10 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:53802) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1h8TW0-0002pG-ID for bug-gnu-emacs@gnu.org; Mon, 25 Mar 2019 13:36:05 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1h8TVz-0007In-7j for bug-gnu-emacs@gnu.org; Mon, 25 Mar 2019 13:36:04 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:45071) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1h8TVy-0007IH-QE for bug-gnu-emacs@gnu.org; Mon, 25 Mar 2019 13:36:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1h8TVy-0005Pj-Gj for bug-gnu-emacs@gnu.org; Mon, 25 Mar 2019 13:36:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eric Abrahamsen Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 25 Mar 2019 17:36:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 33653 X-GNU-PR-Package: emacs Original-Received: via spool by 33653-submit@debbugs.gnu.org id=B33653.155353534220776 (code B ref 33653); Mon, 25 Mar 2019 17:36:02 +0000 Original-Received: (at 33653) by debbugs.gnu.org; 25 Mar 2019 17:35:42 +0000 Original-Received: from localhost ([127.0.0.1]:58615 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h8TVd-0005P2-Hx for submit@debbugs.gnu.org; Mon, 25 Mar 2019 13:35:41 -0400 Original-Received: from ericabrahamsen.net ([52.70.2.18]:41120 helo=mail.ericabrahamsen.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h8TVb-0005Om-VG for 33653@debbugs.gnu.org; Mon, 25 Mar 2019 13:35:40 -0400 Original-Received: from localhost (97-113-60-14.tukw.qwest.net [97.113.60.14]) (Authenticated sender: eric@ericabrahamsen.net) by mail.ericabrahamsen.net (Postfix) with ESMTPSA id E1AFAFA02A; Mon, 25 Mar 2019 17:35:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ericabrahamsen.net; s=mail; t=1553535334; bh=Os0YluyLfpkqicXeCfEQEOLSrDoyXLfZRvibiBW3St8=; h=From:To:Cc:Subject:References:Date:In-Reply-To:From; b=rUXZIwIFMF06+HaxbxptQR5mgcJbDrf6mU0IVnvbXepFfjBDvfFYXCgek1R4pV3EN WvtwAb0Km6j3yPunz2MaUUTKw6jNBeugAqlJdTldQgY4Sr9e5YuCU8N6KGE/b3nJhf xw2GhskBgSD9rQ1rDcu9nr1/vX2cj292eyIspS5A= In-Reply-To: (Andy Moreton's message of "Mon, 25 Mar 2019 14:45:24 +0000") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:156770 Archived-At: Andy Moreton writes: > On Sun 24 Mar 2019, Eric Abrahamsen wrote: > >> Katsumi Yamaoka writes: >> >>> Hi, >>> >>> Gnus got not to work for groups of which the group name contains >>> non-ASCII letters. For instance, I got this error when trying >>> to update the "nnml:=E3=83=86=E3=82=B9=E3=83=88" group using `M-g'[1]: >>> >>> nnml:\343\203\206\343\202\271\343\203\210 error: No such group: =E3=83= =86=E3=82=B9=E3=83=88 >>> >>> When trying to enter the group using `0 RET'[2] I got: >>> >>> Group nnml:\343\203\206\343\202\271\343\203\210 couldn't be activated >>> >>> Those raw bytes are utf-8 encoded "=E3=83=86=E3=82=B9=E3=83=88", that i= s also used in >>> the group entry in gnus-newsrc-alist saved in the ~/.newsrc.eld >>> file as follows: >>> >>> ("nnml:\343\203\206\343\202\271\343\203\210" 1 nil ((unexist) (seen (1 >>> . 5))) "nnml:" ((timestamp 23704 11958))) >> >> Yes, this is something I screwed up in c1b63af445. Gnus has always >> stored group names as raw bytes in.newsrc.eld (at least I believe it >> has, you probably know better than I do, it does in my experiments with >> Emacs 26, anyway), and only encodes during display. But obviously I've >> messed something up between file persistence and display, and I'm >> working on sorting it out. > > Perhaps it would be better to revert and reintroduce your changes after > further testing ? Taking time over this is better than causing data loss > for gnus users. That's the conclusion I'm coming to, yes. > Other notes from reading the code: > > 1) In `gnus-gnus-to-quick-newsrc-format' you ignore the contents of > `gnus-newsrc-alist' when saving "newsrc.eld", and replace it with the > details from `gnus-newsrc-hashtb'. Why ? The rest of the gnus code > appears to treat `gnus-newsrc-alist' as the single source of truth, > with the hash tables being used only for faster access to it. Eventually I would like to reduce the number of data structures so that groups are held in `gnus-newsrc-hashtb', and ordering is kept in `gnus-group-list', and that's it. `gnus-newsrc-alist' would only be used when persisting to disk. My next proposed change (once I've recovered my confidence) is to turn groups into actual objects, in which case the alist would really just be a kind of serialization format. The hash table ought to be in sync with the rest of the data structure -- if it isn't, that's another bug. > 2) In `gnus-gnus-to-quick-newsrc-format' you dropped the code to remove > the dummy group from `gnus-newsrc-alist'. Why ? This internal dummy > group is now saved in "newsrc.eld", which is not needed. This was an error. (Though in my case, I've had the dummy group in my newsrc.eld for months, and it hasn't done any harm. I don't know why it's necessary.) > 3) The format of the entries in `gnus-newsrc-hashtb' has changed, > removing the second element. Why ? Because the old `gnus-gethash' call returned a slice of `gnus-newsrc-alist', where the second element was actually the group *before* the group you wanted, and the third element was the cdr of `gnus-newsrc-alist', starting with the group you wanted. This was undocumented, and took a bit to figure out. Now, the gethash call just gives you the group. Ideally, in the next set of changes, it will give you an object. > 4) You changed several hash tale sizesfrom 4096 to 4000, and 1024 to > 1000. Why ? My understanding is that using a prime number is significant when it comes to vector access, but that the hash table implementation is higher-level, where a prime number is no longer significant. If that's incorrect I would like to know! > Your patch contains several logical changes that would be easier to > understand (and bisect) as a series of patches with one logical change > in each patch: > - code layout changes > - add missing doc strings and code comments > - change hash table implementation > - change format of `gnus-newsrc-hashtb' entries > - change usage of `gnus-group-change-level' > - change coding of group names > While it can take extra work to split things up, the end result is much > easier to understand. In principle I agree with this completely. In practice I found it extraordinarily difficult to touch one part of Gnus without running into knock-on repercussions. The ultimate goal of the changes I have in mind for Gnus is to address exactly this: to make it more modular, to improve isolation of code paths, and to reduce the number of semi-redundant data structures. But the process is evidently even messier than I thought. I held back another commit to group name encoding in an attempt to keep things simple, but that seems to have made things even worse. But yes, if I end up backing this change out, I'll try to break it up into smaller commits. > Thanks for working on gnus, Thanks for the code review! I wish I'd gotten this to begin with. Eric