From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: 33653@debbugs.gnu.org
Subject: bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables
Date: Tue, 26 Mar 2019 12:58:32 -0700 [thread overview]
Message-ID: <87ftr9tnkn.fsf@ericabrahamsen.net> (raw)
In-Reply-To: <8736raz3ec.fsf@ericabrahamsen.net>
Andy Moreton <andrewjmoreton@gmail.com> writes:
> On Mon 25 Mar 2019, Eric Abrahamsen wrote:
>
>> Andy Moreton <andrewjmoreton@gmail.com> writes:
>>> Other notes from reading the code:
>>>
>>> 1) In `gnus-gnus-to-quick-newsrc-format' you ignore the contents of
>>> `gnus-newsrc-alist' when saving "newsrc.eld", and replace it with the
>>> details from `gnus-newsrc-hashtb'. Why ? The rest of the gnus code
>>> appears to treat `gnus-newsrc-alist' as the single source of truth,
>>> with the hash tables being used only for faster access to it.
>>
>> Eventually I would like to reduce the number of data structures so that
>> groups are held in `gnus-newsrc-hashtb', and ordering is kept in
>> `gnus-group-list', and that's it. `gnus-newsrc-alist' would only be used
>> when persisting to disk. My next proposed change (once I've recovered my
>> confidence) is to turn groups into actual objects, in which case the
>> alist would really just be a kind of serialization format.
>
> ok, but if the hash table and the alist reference the same lisp objects
> then the existing setup is not so bad.
There's nothing wrong with the existing setup here! I'm just laying the
groundwork for the next round of changes.
>> The hash table ought to be in sync with the rest of the data structure
>> -- if it isn't, that's another bug.
>>
>>> 2) In `gnus-gnus-to-quick-newsrc-format' you dropped the code to remove
>>> the dummy group from `gnus-newsrc-alist'. Why ? This internal dummy
>>> group is now saved in "newsrc.eld", which is not needed.
>>
>> This was an error. (Though in my case, I've had the dummy group in my
>> newsrc.eld for months, and it hasn't done any harm. I don't know why
>> it's necessary.)
>
> I agree that it's harmless, it just seemed to be an unnecessary change.
> I expect it is there to ensure that the alist is never empty to avoid
> the need to check that everywhere.
Yes, for sure, and I didn't mean to leave it in there, that's just a bug.
>>> 3) The format of the entries in `gnus-newsrc-hashtb' has changed,
>>> removing the second element. Why?
>>
>> Because the old `gnus-gethash' call returned a slice of
>> `gnus-newsrc-alist', where the second element was actually the group
>> *before* the group you wanted, and the third element was the cdr of
>> `gnus-newsrc-alist', starting with the group you wanted. This was
>> undocumented, and took a bit to figure out. Now, the gethash call just
>> gives you the group. Ideally, in the next set of changes, it will give
>> you an object.
>
> ok, so it sounds like the old code was trying hard to use the same lisp
> objects in both the hash table and the alist. That avoids alloacting
> twice the storage, and ensures that the hash table and the alist stay in
> sync.
Yes, I'm looking forward to when the groups are actual objects, . But in
the meantime, the lisp objects are still the same, I haven't doubled
storage (at least `eq' returns t, so I assume that's what that means).
>>> 4) You changed several hash tale sizesfrom 4096 to 4000, and 1024 to
>>> 1000. Why?
>>
>> My understanding is that using a prime number is significant when it
>> comes to vector access, but that the hash table implementation is
>> higher-level, where a prime number is no longer significant. If that's
>> incorrect I would like to know!
>
> None of these numbers are prime. The old numbers are powers of two,
> which are reasonable for allocation sizes on a binary machine. Again,
> not a terribly important change, but not really needed either.
Yes, sorry, powers of two. It's true it's a do-nothing change, I guess
I've been erring on the side of trying to make the code more
self-explanatory, less "odd". Probably this change didn't need to be
made, but it seems about as un-risky as a code change could be.
>>> Your patch contains several logical changes that would be easier to
>>> understand (and bisect) as a series of patches with one logical change
>>> in each patch:
>>> - code layout changes
>>> - add missing doc strings and code comments
>>> - change hash table implementation
>>> - change format of `gnus-newsrc-hashtb' entries
>>> - change usage of `gnus-group-change-level'
>>> - change coding of group names
>>> While it can take extra work to split things up, the end result is much
>>> easier to understand.
>>
>> In principle I agree with this completely. In practice I found it
>> extraordinarily difficult to touch one part of Gnus without running into
>> knock-on repercussions.
>
> Agreed. I find that this sort of work usually goes in two phases: some
> exploratory programming to decide on the path forward and the eventual
> goal, followed by rearranging the changes into a logical set of
> (bisectable) patches that get to that goal in small, self-contained
> steps. The second half is extra work, but worth it to make it easier for
> other people to review the patches (and to simplify any archaeology
> needed by a later maintainer).
>
> If changing gnus is hard, then perhaps writing new tests to document
> what you have discovered about the code will help to ensure that later
> changes do not change the observed behaviour.
The previous commit did add some new Gnus tests -- perhaps the first!
I'm planning more, once Gnus group are actual objects. Everything's so
interconnected now (and data-reliant) that it's very hard to write
effective tests.
>> The ultimate goal of the changes I have in mind for Gnus is to address
>> exactly this: to make it more modular, to improve isolation of code
>> paths, and to reduce the number of semi-redundant data structures. But
>> the process is evidently even messier than I thought. I held back
>> another commit to group name encoding in an attempt to keep things
>> simple, but that seems to have made things even worse.
>
> A worthwhile goal - please do not get discouraged.
Thank you,
Eric
next prev parent reply other threads:[~2019-03-26 19:58 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-06 22:39 bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables Eric Abrahamsen
2018-12-06 22:46 ` Eric Abrahamsen
[not found] ` <m35zvzq34a.fsf@gnus.org>
2018-12-11 23:30 ` Eric Abrahamsen
2019-02-05 2:05 ` Eric Abrahamsen
2019-03-22 0:09 ` Eric Abrahamsen
2019-03-22 9:20 ` Robert Pluim
2019-03-22 17:21 ` Eric Abrahamsen
2019-03-22 19:54 ` Eric Abrahamsen
2019-03-22 21:07 ` Eli Zaretskii
2019-03-22 22:10 ` Eric Abrahamsen
2019-03-23 14:52 ` Andy Moreton
2019-03-23 16:14 ` Eric Abrahamsen
2019-03-26 18:28 ` Andy Moreton
2019-03-26 19:49 ` Eric Abrahamsen
2019-03-22 22:40 ` Glenn Morris
2019-03-22 22:51 ` Eric Abrahamsen
2019-03-23 6:46 ` Eli Zaretskii
2019-03-24 22:29 ` Bastien
2019-03-24 23:40 ` Eric Abrahamsen
2019-03-30 12:09 ` Deus Max
2019-03-31 23:27 ` Eric Abrahamsen
2019-04-01 22:39 ` Deus Max
2019-04-02 5:23 ` Eric Abrahamsen
2019-03-25 2:14 ` Katsumi Yamaoka
2019-03-25 2:35 ` Eric Abrahamsen
2019-03-25 14:45 ` Andy Moreton
2019-03-25 17:35 ` Eric Abrahamsen
2019-03-25 17:51 ` Robert Pluim
2019-03-25 18:17 ` Basil L. Contovounesios
2019-03-25 19:04 ` Bastien
2019-03-25 20:15 ` Andy Moreton
2019-03-26 19:58 ` Eric Abrahamsen [this message]
2019-03-26 21:44 ` Eric Abrahamsen
2019-03-27 4:54 ` Katsumi Yamaoka
2019-03-27 18:47 ` Eric Abrahamsen
2019-03-27 21:27 ` Eric Abrahamsen
2019-03-27 22:10 ` Eric Abrahamsen
2019-03-31 22:55 ` Eric Abrahamsen
2019-04-01 20:18 ` Adam Sjøgren
2019-04-01 20:57 ` Eric Abrahamsen
2019-04-02 16:43 ` Adam Sjøgren
2019-04-03 22:16 ` Katsumi Yamaoka
2019-04-03 22:36 ` Eric Abrahamsen
2019-04-05 4:25 ` Katsumi Yamaoka
2019-04-05 6:44 ` Katsumi Yamaoka
2019-04-05 11:02 ` Basil L. Contovounesios
2019-04-08 1:47 ` Katsumi Yamaoka
2019-04-05 20:18 ` Eric Abrahamsen
2019-04-08 1:58 ` Katsumi Yamaoka
2019-04-08 4:31 ` Katsumi Yamaoka
2019-04-11 21:29 ` Basil L. Contovounesios
2019-04-11 23:56 ` Katsumi Yamaoka
2019-04-12 11:05 ` Basil L. Contovounesios
2019-06-22 13:11 ` Lars Ingebrigtsen
2019-04-05 11:01 ` Basil L. Contovounesios
2019-04-08 8:13 ` Katsumi Yamaoka
2019-04-08 8:57 ` Andreas Schwab
2019-04-09 0:55 ` Katsumi Yamaoka
2019-04-08 18:31 ` Eric Abrahamsen
2019-04-09 0:55 ` Katsumi Yamaoka
2019-04-09 2:01 ` Eric Abrahamsen
2019-04-09 4:18 ` Katsumi Yamaoka
2019-04-09 4:30 ` Eric Abrahamsen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ftr9tnkn.fsf@ericabrahamsen.net \
--to=eric@ericabrahamsen.net \
--cc=33653@debbugs.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.