* Q: V2 format
@ 2018-07-11 20:01 Eric W. Biederman
  2018-07-11 21:18 ` Konstantin Ryabitsev
  2018-07-12  1:47 ` Eric Wong
  0 siblings, 2 replies; 21+ messages in thread
From: Eric W. Biederman @ 2018-07-11 20:01 UTC
  To: Eric Wong; +Cc: meta

I have been digging through the code so I can understand the v2
format and I have some ideas on how things might be improved, and some
questions so that I understand.

V1 supported the concept of messages being added and deleted from
the git repository all while keeping a full history of everything that
went on. The V2 code appears to have the name 'm' for added and 'd' for
deleted, but the public-inbox-index code appears to expect deletes to
happen by way of an altered history that totally purges the commits,
and does not process the 'd' entries.

What is the thinking about deleted entries, and for v2 what is the
preferred way to delete mail from a public inbox git repository and why?

Size. Reading the history of the public inbox meta mailing list and
playing around I discovered that I can shave off about 100M of the V2
size of the git public inbox git repository by pushing all of the
messages into a single commit. Not great for day to day operation,
but if rebases are part of the plan, and old archives part of the
challenge I see quite a lot of potential for old archives to be reduced
to a git repository with a single commit.

Names. Is there a good reason not to use message numbers as the names
in the git repositories? (Other than the cost to change the code?) That
would remove the need to treat the sqlite msgmap database as precious,
and it would make it easier to recover if an nntp server goes away. In
V2 format the git mailing list git repository is only about 2M larger if
each message has its msg number as its name. Plus the git log
is easier to read as messages are all + or -.

Xapian. Can the Xapian database be made optional in V2? I absolutely
think a quick search for terms and other things is very valuable, so I
would never suggest giving up Xapian. On the other hand on my personal
laptop the xapian database for lkml takes ages and ages to build, and it
pushes the system into swap. Which is all around unpleasant. That
seems to eat into the distributed nature of the goal of public inbox.

I have tried to see what could be done that might shrink the size of
the xapian database. The only thing I could think of is perhaps
sharding the xapian database by time/msgnum ranges. That would allow
the old Xapian databases to be compacted and forgotten about, and I
think it would allow less wastage in the current xapian database as it
would be smaller, so wasting 50% space (or whatever the btrees waste)
would be less of an issue. And as smaller databases are faster I think
that would in general be a help.

Time permitting I am willing to do some of this work so that
public-inbox works well for me. I want to see what your vision is for
the code before I start anything.

Eric

* Re: Q: V2 format
  2018-07-11 20:01 Q: V2 format Eric W. Biederman
@ 2018-07-11 21:18 ` Konstantin Ryabitsev
  2018-07-11 21:41   ` Eric W. Biederman
  2018-07-12  1:47 ` Eric Wong
  1 sibling, 1 reply; 21+ messages in thread
From: Konstantin Ryabitsev @ 2018-07-11 21:18 UTC
  To: Eric W. Biederman; +Cc: Eric Wong, meta

On Wed, Jul 11, 2018 at 03:01:53PM -0500, Eric W. Biederman wrote:
> Names. Is there a good reason not to use message numbers as the names
> in the git repositories? (Other than the cost to change the code?) That
> would remove the need to treat the sqlite msgmap database as precious,
> and it would make it easier to recover if an nntp server goes away. In
> V2 format the git mailing list git repository is only about 2M larger if
> each message has its msg number as its name. Plus the git log
> is easier to read as messages are all + or -.

As in, instead of changes happening to the same file "m", the message is
saved into a new file and the old file deleted in each commit?

-K

* Re: Q: V2 format
  2018-07-11 21:18 ` Konstantin Ryabitsev
@ 2018-07-11 21:41   ` Eric W. Biederman
  0 siblings, 0 replies; 21+ messages in thread
From: Eric W. Biederman @ 2018-07-11 21:41 UTC
  To: Konstantin Ryabitsev; +Cc: Eric Wong, meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Wed, Jul 11, 2018 at 03:01:53PM -0500, Eric W. Biederman wrote:
>> Names. Is there a good reason not to use message numbers as the names
>> in the git repositories? (Other than the cost to change the code?) That
>> would remove the need to treat the sqlite msgmap database as precious,
>> and it would make it easier to recover if an nntp server goes away. In
>> V2 format the git mailing list git repository is only about 2M larger if
>> each message has its msg number as its name. Plus the git log
>> is easier to read as messages are all + or -.
>
> As in, instead of changes happening to the same file "m", the message is
> saved into a new file and the old file deleted in each commit?

Yes. I believe from a git object perspective it is exactly the same:
1 tree object per commit with exactly one file in it. The only
difference is that the files have different names.

Eric

* Re: Q: V2 format
  2018-07-11 20:01 Q: V2 format Eric W. Biederman
  2018-07-11 21:18 ` Konstantin Ryabitsev
@ 2018-07-12  1:47 ` Eric Wong
  2018-07-12 13:58   ` Eric W. Biederman
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Wong @ 2018-07-12 1:47 UTC
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> I have been digging through the code so I can understand the v2
> format and I have some ideas on how things might be improved, and some
> questions so that I understand.

Great to know you're interested! Fwiw, I've still been meaning
to turn my v2 docs into a POD manpage:

  https://public-inbox.org/meta/20180419015813.GA20051@dcvr/

> V1 supported the concept of messages being added and deleted from
> the git repository all while keeping a full history of everything that
> went on. The V2 code appears to have the name 'm' for added and 'd' for
> deleted, but the public-inbox-index code appears to expect deletes to
> happen by way of an altered history that totally purges the commits,
> and does not process the 'd' entries.

"Purge" is a new concept for v2 and not even exposed (yet)
via tools. Normal operations to remove files using 'd' (via
-watch or -rm) don't rewrite old history so it won't disrupt
non-force fetches.

> What is the thinking about deleted entries, and for v2 what is the
> preferred way to delete mail from a public inbox git repository and why?

Definitely prefer the normal way with 'd' files to not break
people using non-force fetches. "Purge" is too disruptive
and reserved for extraordinary cases (e.g. legal reasons).

> Size. Reading the history of the public inbox meta mailing list and
> playing around I discovered that I can shave off about 100M of the V2
> size of the git public inbox git repository by pushing all of the
> messages into a single commit. Not great for day to day operation,
> but if rebases are part of the plan, and old archives part of the
> challenge I see quite a lot of potential for old archives to be reduced
> to a git repository with a single commit.

Rebases/rewriting history is definitely not part of the plan and
a last resort.

> Names. Is there a good reason not to use message numbers as the names
> in the git repositories? (Other than the cost to change the code?) That
> would remove the need to treat the sqlite msgmap database as precious,
> and it would make it easier to recover if an nntp server goes away. In
> V2 format the git mailing list git repository is only about 2M larger if
> each message has its msg number as its name. Plus the git log
> is easier to read as messages are all + or -.

Big trees in git were a scalability problem in v1 because of the
long 2/38 names. With the shorter names you propose (base-10 serial
number?), the scalability problem gets pushed off a bit, I suppose.
But not indefinitely; and later v2 partitions will suffer more
from longer names.

I also want to limit the use and exposure of serial numbers as
much as possible. It's unavoidable with the NNTP interface;
but reliance on serial numbers in public interfaces leads to
centralization.

The current v2 is also better for inode-starved users in case
somebody forgets to type "--mirror" or "--bare" with clone. For
the most part (unless purge is used), the SQLite database is
actually recoverable.

So no, I don't think having serial numbers stored in filenames
is the right thing.

> Xapian. Can the Xapian database be made optional in V2?

Definitely in the TODO :)

> I absolutely
> think a quick search for terms and other things is very valuable, so I
> would never suggest giving up Xapian. On the other hand on my personal
> laptop the xapian database for lkml takes ages and ages to build, and it
> pushes the system into swap. Which is all around unpleasant. That
> seems to eat into the distributed nature of the goal of public inbox.

> I have tried to see what could be done that might shrink the size of
> the xapian database. The only thing I could think of is perhaps
> sharding the xapian database by time/msgnum ranges. That would allow
> the old Xapian databases to be compacted and forgotten about, and I
> think it would allow less wastage in the current xapian database as it
> would be smaller, so wasting 50% space (or whatever the btrees waste)
> would be less of an issue. And as smaller databases are faster I think
> that would in general be a help.

One big killer for Xapian is position information required for
"quoted phrase searches". I seem to remember deleting the position.*
files was safe as it would only break phrase searches (but I
haven't tried it).

So there should be an option to toggle between the "index_text"
and "index_text_without_positions" routines in Xapian.

Given the way the indexing only works on the most recent data;
I think one could also write a script to delete old data/results
from Xapian without affecting current/future indexing.
That would pop back up if/when there's schema upgrades requiring
a rebuild, though...

I believe there should be 3 levels of v2 operation:

1) SQLite-only (NNTP and all the threading stuff works)
2) SQLite + Xapian w/o positions (good enough for most things)
3) SQLite + Xapian w/ positions (current, default)

2) seems like a reasonable trade-off for most sites; I'm not
sure how often phrase searching gets used.

> Time permitting I am willing to do some of this work so that
> public-inbox works well for me. I want to see what your vision is for
> the code before I start anything.

Thanks for running this by, first. I'm not convinced git layout
changes are warranted at this point for v2.

Making Xapian optional and configurable to use
index_text_without_positions is something I definitely want to
see happen, though.

* Re: Q: V2 format
  2018-07-12  1:47 ` Eric Wong
@ 2018-07-12 13:58   ` Eric W. Biederman
  2018-07-12 23:09     ` Eric Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2018-07-12 13:58 UTC
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> I have been digging through the code so I can understand the v2
>> format and I have some ideas on how things might be improved, and some
>> questions so that I understand.
>
> Great to know you're interested! Fwiw, I've still been meaning
> to turn my v2 docs into a POD manpage:
>
> https://public-inbox.org/meta/20180419015813.GA20051@dcvr/

I have some personal mail archives that I need to do something better
with. My goal is for day-to-day operations (aka mail delivery and
archiving) to be able to run on a smallish 32bit machine.

But archives are not valuable unless you have a fast search capability
which makes all of the features of xapian very interesting.

I need to compare message id's to see if I have content missing from the
public linux-kernel archive. It is probably Konrad's cleanup of the
headers but my linux-kernel archive when imported into public-inbox is
slightly larger than Konrad's.

I also like the idea of being able to read and archive public lists that
I care about with just a git fetch and local tools.

Public mailing lists and their archives are more important, but on my
radar is also IMAP/regular email support, with its little bit of extra
state.

>> V1 supported the concept of messages being added and deleted from
>> the git repository all while keeping a full history of everything that
>> went on. The V2 code appears to have the name 'm' for added and 'd' for
>> deleted, but the public-inbox-index code appears to expect deletes to
>> happen by way of an altered history that totally purges the commits,
>> and does not process the 'd' entries.
>
> "Purge" is a new concept for v2 and not even exposed (yet)
> via tools. Normal operations to remove files using 'd' (via
> -watch or -rm) don't rewrite old history so it won't disrupt
> non-force fetches.

This helps a lot in understanding the intent of the code. Konrad had
mentioned something about being able to rebase when I pointed out the
buggy git commits in linux-kernel.

>> What is the thinking about deleted entries, and for v2 what is the
>> preferred way to delete mail from a public inbox git repository and why?
>
> Definitely prefer the normal way with 'd' files to not break
> people using non-force fetches. "Purge" is too disruptive
> and reserved for extraordinary cases (e.g. legal reasons).

Then I am going to report a probable bug. In V2 in public-inbox-index
I can not find a path from finding a 'd' file and a call to unindex. V1
unindexes deleted files. Rebased heads for purges call unindex. I
don't see that for ordinary d files though.

>> Size. Reading the history of the public inbox meta mailing list and
>> playing around I discovered that I can shave off about 100M of the V2
>> size of the git public inbox git repository by pushing all of the
>> messages into a single commit. Not great for day to day operation,
>> but if rebases are part of the plan, and old archives part of the
>> challenge I see quite a lot of potential for old archives to be reduced
>> to a git repository with a single commit.
>
> Rebases/rewriting history is definitely not part of the plan and
> a last resort.
>
>> Names. Is there a good reason not to use message numbers as the names
>> in the git repositories? (Other than the cost to change the code?) That
>> would remove the need to treat the sqlite msgmap database as precious,
>> and it would make it easier to recover if an nntp server goes away. In
>> V2 format the git mailing list git repository is only about 2M larger if
>> each message has its msg number as its name. Plus the git log
>> is easier to read as messages are all + or -.
>
> Big trees in git were a scalability problem in v1 because of the
> long 2/38 names. With the shorter names you propose (base-10 serial
> number?), the scalability problem gets pushed off a bit, I suppose.
> But not indefinitely; and later v2 partitions will suffer more
> from longer names.

Big trees were a scalability problem in git because they are quadratic.
Every commit mentioned every email. So a walk of the history would
have to visit every file on every commit. I expect those tree objects
in the history compress well with their parents but it doesn't simplify
the tree walker.

Would you like my test conversion script from V1 so you can take a look?

> I also want to limit the use and exposure of serial numbers as
> much as possible. It's unavoidable with the NNTP interface;
> but reliance on serial numbers in public interfaces leads to
> centralization.

I completely agree about public web interfaces. Message-ID is a much
better key to messages as it was generated by the message sender.

> The current v2 is also better for inode-starved users in case
> somebody forgets to type "--mirror" or "--bare" with clone. For
> the most part (unless purge is used), the SQLite database is
> actually recoverable.

Because of the parallelism in V2 I have noticed messages numbered
in an order that does not correspond to their commit order. So the
SQLite database isn't as recoverable as it might be. Especially as the
parallelism introduces an element of non-determinism.

> So no, I don't think having serial numbers stored in filenames
> is the right thing.

I won't push it but at the present time I respectfully disagree.

The big advantage I see with serial numbers (other than msgmap) is that
you can include multiple emails per commit (without going quadratic). I
am also looking at potentially storing the other email states that IMAP
and maildir mailboxes track. I can imagine that much more easily with
message numbers. Still I want to avoid something that makes git go
quadratic again.

>> Xapian. Can the Xapian database be made optional in V2?
>
> Definitely in the TODO :)
>
>> I absolutely
>> think a quick search for terms and other things is very valuable, so I
>> would never suggest giving up Xapian. On the other hand on my personal
>> laptop the xapian database for lkml takes ages and ages to build, and it
>> pushes the system into swap. Which is all around unpleasant. That
>> seems to eat into the distributed nature of the goal of public inbox.
>
>> I have tried to see what could be done that might shrink the size of
>> the xapian database. The only thing I could think of is perhaps
>> sharding the xapian database by time/msgnum ranges. That would allow
>> the old Xapian databases to be compacted and forgotten about, and I
>> think it would allow less wastage in the current xapian database as it
>> would be smaller, so wasting 50% space (or whatever the btrees waste)
>> would be less of an issue. And as smaller databases are faster I think
>> that would in general be a help.
>
> One big killer for Xapian is position information required for
> "quoted phrase searches". I seem to remember deleting the position.*
> files was safe as it would only break phrase searches (but I
> haven't tried it).

I have a very ugly patch that removed all of Xapian. So for day to day
nntp use it is certainly safe.

> So there should be an option to toggle between the "index_text"
> and "index_text_without_positions" routines in Xapian.

I might take a look at that. I just looked and the position database is
huge.

> Given the way the indexing only works on the most recent data;
> I think one could also write a script to delete old data/results
> from Xapian without affecting current/future indexing.
> That would pop back up if/when there's schema upgrades requiring
> a rebuild, though...

Good for testing. Not for long term as it is the actual indexing that is
painful.

> I believe there should be 3 levels of v2 operation:
>
> 1) SQLite-only (NNTP and all the threading stuff works)
> 2) SQLite + Xapian w/o positions (good enough for most things)
> 3) SQLite + Xapian w/ positions (current, default)
>
> 2) seems like a reasonable trade-off for most sites; I'm not
> sure how often phrase searching gets used.

I will take a look at that. That seems a straightforward place to start
that we can easily agree upon.

>> Time permitting I am willing to do some of this work so that
>> public-inbox works well for me. I want to see what your vision is for
>> the code before I start anything.
>
> Thanks for running this by, first. I'm not convinced git layout
> changes are warranted at this point for v2.
>
> Making Xapian optional and configurable to use
> index_text_without_positions is something I definitely want to
> see happen, though.

I will clean up my patches for that then.

Eric
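
The "quadratic" argument in the message above can be made concrete with
a little arithmetic. The sketch below is a deliberately simplified model,
not a measurement of any real repository: it assumes one message is added
per commit, nothing is ever deleted, and a v1-style tree lists every
message present at that commit (the 2/38 directory fan-out is ignored).
The function names are hypothetical and exist only for illustration.

```python
def v1_tree_entries(n_messages: int) -> int:
    """Total file entries seen over the whole history when the tree at
    commit k lists all k messages added so far: 1 + 2 + ... + n."""
    return n_messages * (n_messages + 1) // 2


def v2_tree_entries(n_messages: int) -> int:
    """Total file entries seen when every commit's tree holds exactly
    one file (the v2 'm' layout, or one numbered file per commit)."""
    return n_messages


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, v1_tree_entries(n), v2_tree_entries(n))
    # 1000 500500 1000
    # 100000 5000050000 100000
```

Under these assumptions the v1-style total grows as n(n+1)/2 while the
one-file-per-commit total grows linearly, which is the gap both authors
are trying not to reintroduce.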

* Re: Q: V2 format
  2018-07-12 13:58 ` Eric W. Biederman
@ 2018-07-12 23:09   ` Eric Wong
  2018-07-13 13:39     ` Eric W. Biederman
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2018-07-12 23:09 UTC
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> I have been digging through the code so I can understand the v2
> >> format and I have some ideas on how things might be improved, and some
> >> questions so that I understand.
> >
> > Great to know you're interested! Fwiw, I've still been meaning
> > to turn my v2 docs into a POD manpage:
> >
> > https://public-inbox.org/meta/20180419015813.GA20051@dcvr/
>
> I have some personal mail archives that I need to do something better
> with. My goal is for day-to-day operations (aka mail delivery and
> archiving) to be able to run on a smallish 32bit machine.

Great to hear your interest in that! public-inbox.org is still
32-bit on a $20/month VPS. Xapian really does better with an
SSD (freshly TRIM-ed), though; so my low-end netbook with HDD
struggles on big inboxes at the moment.

> But archives are not valuable unless you have a fast search capability
> which makes all of the features of xapian very interesting.

Agreed.

> I need to compare message id's to see if I have content missing from the
> public linux-kernel archive. It is probably Konrad's cleanup of the
> headers but my linux-kernel archive when imported into public-inbox is
> slightly larger than Konrad's.

Konrad == Konstantin? I haven't looked at what's in lore, yet,
but there were numerous header differences from the archives he
gave me for v2 development vs what I got from my own archives.

Off the top of my head:

* addresses in To:/Cc: lists rewritten for some old list addresses

* some addressee formatting/quoting changes as a result

* last (most recent) Received: header removed (but not actually
  enough to anonymize the original recipient in most cases).
  This affects sorting comparisons in search results

* reencoded some MIME parts to different encodings (to 8bit, I think)

Maybe some others.

> I also like the idea of being able to read and archive public lists that
> I care about with just a git fetch and local tools.

Yes. I still use "git log -p -B" etc. That said; I don't want
to give up too much to support that (the SQLite dependency doesn't
seem too expensive); and try to keep public-inbox easy-to-install.
Making Xapian optional will be a huge part of that.

> Public mailing lists and their archives are more important, but on my
> radar is also IMAP/regular email support, with its little bit of extra
> state.

Cool. I've been thinking about something for personal mail,
too. mairix is killing my beefier personal machine (because it
needs to rewrite the entire index every time) and
Maildirs+notmuch is a non-starter due to dentry cache overheads
and inode consumption.

> >> What is the thinking about deleted entries, and for v2 what is the
> >> preferred way to delete mail from a public inbox git repository and why?
> >
> > Definitely prefer the normal way with 'd' files to not break
> > people using non-force fetches. "Purge" is too disruptive
> > and reserved for extraordinary cases (e.g. legal reasons).
>
> Then I am going to report a probable bug. In V2 in public-inbox-index
> I can not find a path from finding a 'd' file and a call to unindex. V1
> unindexes deleted files. Rebased heads for purges call unindex. I
> don't see that for ordinary d files though.

It shouldn't need to call unindex because they never get indexed
on rebuilds. V2 indexing walks history backwards (normal "git log"
behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
as it encounters them.

v1 needed to unindex because it used "git log --reverse" to walk
forward in history.

> >> Size. Reading the history of the public inbox meta mailing list and
> >> playing around I discovered that I can shave off about 100M of the V2
> >> size of the git public inbox git repository by pushing all of the
> >> messages into a single commit. Not great for day to day operation,
> >> but if rebases are part of the plan, and old archives part of the
> >> challenge I see quite a lot of potential for old archives to be reduced
> >> to a git repository with a single commit.
> >
> > Rebases/rewriting history is definitely not part of the plan and
> > a last resort.
> >
> >> Names. Is there a good reason not to use message numbers as the names
> >> in the git repositories? (Other than the cost to change the code?) That
> >> would remove the need to treat the sqlite msgmap database as precious,
> >> and it would make it easier to recover if an nntp server goes away. In
> >> V2 format the git mailing list git repository is only about 2M larger if
> >> each message has its msg number as its name. Plus the git log
> >> is easier to read as messages are all + or -.
> >
> > Big trees in git were a scalability problem in v1 because of the
> > long 2/38 names. With the shorter names you propose (base-10 serial
> > number?), the scalability problem gets pushed off a bit, I suppose.
> > But not indefinitely; and later v2 partitions will suffer more
> > from longer names.
>
> Big trees were a scalability problem in git because they are quadratic.
> Every commit mentioned every email. So a walk of the history would
> have to visit every file on every commit. I expect those tree objects
> in the history compress well with their parents but it doesn't simplify
> the tree walker.
>
> Would you like my test conversion script from V1 so you can take a look?

Sure, but I can't guarantee I can find the time to spend on it;
but others might be interested.

> > The current v2 is also better for inode-starved users in case
> > somebody forgets to type "--mirror" or "--bare" with clone. For
> > the most part (unless purge is used), the SQLite database is
> > actually recoverable.
>
> Because of the parallelism in V2 I have noticed messages numbered
> in an order that does not correspond to their commit order. So the
> SQLite database isn't as recoverable as it might be. Especially as the
> parallelism introduces an element of non-determinism.

*puzzled* were you able to reproduce that? The serial number
generation + threading happens in the main process and the
parallelism is limited to Xapian text indexing. -index
generates serial numbers by walking backwards with v2, and
complains on unexpected results.

As far as personal mail goes, I wouldn't want serial numbers at
all (more unnecessary state to keep track of).

> > So no, I don't think having serial numbers stored in filenames
> > is the right thing.
>
> I won't push it but at the present time I respectfully disagree.
>
> The big advantage I see with serial numbers (other than msgmap) is that
> you can include multiple emails per commit (without going quadratic). I
> am also looking at potentially storing the other email states that IMAP
> and maildir mailboxes track. I can imagine that much more easily with
> message numbers. Still I want to avoid something that makes git go
> quadratic again.

You'd want deeper trees; still. I'd still use hex, and maybe
truncate the blob hash to avoid having to keep track of any
serial number state. Maybe 2/2/4 naming is enough while using
git history to resolve collisions.

Multiple emails per-commit doesn't make sense for public archives.
For personal archives, you could probably snap off 1-file-per-commit
history periodically to make a big tree to reduce commit objects.

The cost of losing compatibility, rewriting history + repacking,
to save 100M there out of 1G(?) or so doesn't seem like a great
trade-off, though.

I wonder how much can be saved with short author/committer info
and empty commit messages, even. I'd rather do that than break
history and require repacking.

If I wanted to track replied/seen/etc... state in git for
personal mail, I'd probably use 'r', 's', etc filenames; but I'm
not sure it'd be in the same or different git repo from the
public one. That said; I don't know if I want to store state in
git or SQLite or something else...

Looking forward to making Xapian and position data optional :>
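
The backward walk described in the message above (remember each 'd'
entry in a "$D" hash, then skip the matching message when the walk
reaches it) can be sketched roughly as follows. This is a minimal Python
illustration, not public-inbox's Perl code. It assumes each commit
touches exactly one path, 'm' or 'd', and that a deletion is matched to
the earlier addition by blob ID; the `Commit` tuple and the
`blobs_to_index` name are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (path, blob_id): path is "m" for an added message, "d" for a deletion.
Commit = Tuple[str, str]


def blobs_to_index(newest_first: Iterable[Commit]) -> List[str]:
    """Walk history newest-to-oldest, as plain "git log" does, and
    return the blob IDs that should be indexed."""
    deleted: Set[str] = set()   # plays the role of the "$D" hash
    to_index: List[str] = []
    for path, blob_id in newest_first:
        if path == "d":
            # Seen before the older 'm' commit that added this blob.
            deleted.add(blob_id)
        elif path == "m":
            if blob_id in deleted:
                continue        # deleted later in history: never index
            to_index.append(blob_id)
    return to_index


if __name__ == "__main__":
    history = [                 # newest commit first
        ("m", "c3"),
        ("d", "b2"),            # removes the message added below
        ("m", "b2"),
        ("m", "a1"),
    ]
    print(blobs_to_index(history))  # ['c3', 'a1']
```

Because the deletion is always encountered before the addition it
cancels, a full rebuild never indexes the deleted message and so has
nothing to unindex. A forward walk ("git log --reverse", as in v1) meets
the addition first, which is why v1 needed the explicit unindex step.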
* Re: Q: V2 format 2018-07-12 23:09 ` Eric Wong @ 2018-07-13 13:39 ` Eric W. Biederman 2018-07-13 20:03 ` Eric W. Biederman ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Eric W. Biederman @ 2018-07-13 13:39 UTC (permalink / raw) To: Eric Wong; +Cc: meta [-- Attachment #1: Type: text/plain, Size: 10900 bytes --] Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@80x24.org> writes: >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> I have been digging through the code looking so I can understand the v2 >> >> format and I have some ideas on how things might be improved, and some >> >> questions so that I understand. >> > >> > Great to know you're interested! Fwiw, I've still been meaning >> > to turn my v2 docs into a POD manpage: >> > >> > https://public-inbox.org/meta/20180419015813.GA20051@dcvr/ >> >> I have some personal mail archives that I need to do something better >> with. My goal is for day-to-day operations (aka mail delivery and >> archiving) to be able to run on a smallish 32bit machine. > > Great to hear your interest in that! public-inbox.org is still > 32-bit on a $20/month VPS. Xapian really does better with an > SSD (freshly TRIM-ed), though; so my low-end netbook with HDD > struggles on big inboxes at the moment. I am leery of SSDs at the moment. It was probably bad luck but my last mail setup using cyrus (1 message per file) managed to kill an SSD in under a year. >> But archives are not valuable unless you have a fast search capability >> which makes all of the features of xapian very interesting. > > Agreed. > >> I need to compare message id's to see if I have content missing from the >> public linux-kernel archive. It is probably Konrad's cleanup of the >> headers but my linux-kernel archive when imported into public-inbox is >> slightly larger than Konrads. > > Konrad == Konstantin? Yes. Konstantin Ryabitsev. 
Konstantin, my apologies; I did not mean to scramble your name. > I haven't looked at what's in lore, yet, > but there were numerous header differences from the archives he > gave me for v2 development vs what I got from my own archives. > > Off the top of my head: > > * addresses in To:/Cc: lists rewritten for some old list addresses > > * some addressee formatting/quoting changes as a result > > * last (most recent) Received: header removed (but not actually > enough to anonymize the original recipient in most cases). > This affects sorting comparisons in search results > > * reencoded some MIME parts to different encodings (to 8bit, I think) > > Maybe some others. > >> I also like the idea of being able to read and archive public lists that >> I care about with just a git fetch and local tools. > > Yes. I still use "git log -p -B" etc. That said; I don't want > to give up too much to support that (the SQLite dependency doesn't > seem too expensive); and try to keep public-inbox easy-to-install. > Making Xapian optional will be a huge part of that. What I meant is that it is very useful not to need to sync anything other than the git repository between machines. >> Public mailing lists and their archives are more important, but on my >> radar is also IMAP/regular email support. With it's little bit of extra >> state. > > Cool. I've been thinking about something for personal mail, > too. mairix is killing my beefier personal machine (because it > needs to rewrite the entire index every time) and > Maildirs+notmuch is a non-starter due to dentry cache overheads > and inode consumption. > >> >> What is the thinking about deleted entries, and for v2 what is the >> >> preferred way to delete mail from a public inbox git repository and why? >> > >> > Definitely prefer the normal way with 'd' files to not break >> > people using non-force fetches. "Purge" is too disruptive >> > and reserved for extraordinary cases (e.g. legal reasons).
>> >> Then I am going to report a probable bug. In V2 in public-inbox-index >> I can not find a path from finding a 'd' file and a call to unindex. V1 >> unindexes deleted files. Rebased heads for purges call unindex. I >> don't see that for ordinary d files though. > > It shouldn't need to call unindex because they never get indexed > on rebuilds. V2 indexing walks history backwards (normal "git log" > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs > as it encounters them. > > v1 needed to unindex because it used "git log --reverse" to walk > forward in history. This assumes that you see them in the same git pull. I would think ideally anything that is going to be deleted that quickly you can just skip archiving. What is the time window in which you expect 'd' messages to appear? >> >> Size. Reading the history of the public inbox meta mailling list and >> >> playing around I discovered that I can shave off about 100M of the V2 >> >> size of the git public inbox git repository but pushing all of the >> >> messages into a single commit. Not great for day to day operation, >> >> but if rebasses are part of the plan, and old archives part of the >> >> challenge I see quite a lot of potential for old archives to be reduced >> >> to a git repository with a single commit. >> > >> > Rebases/rewriting history is definitely not part of the plan and >> > a last resort. >> > >> >> Names. Is there a good reason not to use message numbers as the names >> >> in the git repositories? (Other than the cost to change the code?) That >> >> would remove the need for treat the sqlite msgmap database as precious, >> >> and it would make it easier to recover if an nntp server goes away. In >> >> V2 format the git mailing list git repository is only about 2M larger if >> >> each message has it's msg number as it's name. Plus the git log >> >> is easier to read as messages are all + or -.
>> > >> > Big trees in git were a scalability problem in v1 because of the >> > long 2/38 names. With shorter names you propose (base-10 serial >> > number?, the scalability problem gets pushed off a bit, I suppose. >> > But not indefinitely; and later v2 partitions will suffer more >> > from longer names. >> >> Bit trees were a scalability problem in git becuase they are quadratic. >> Every commit mentioned every email. So a walk of the history would >> have to visit every file on every commit. I expect those tree objects >> in the history compress well with their parents but it doesn't simplify >> the tree walker. >> >> Would you like my test conversion script from V1 so you can take a look? > > Sure, but I can't guarantee I can find the time to spend on it; > but others might be interested. > >> > The current v2 is also better for inode-starved users in case >> > somebody forgets to type "--mirror" or "--bare" with clone. For >> > the most part (unless purge is used), the SQLite database is >> > actually recoverable. >> >> Because of the parallelism in V2 I have noticed messages in numbered >> in an order that does not correspond to their commit order. So the >> SQLite database isn't as recoverable as it might be. Especially as the >> parallelism introduces an element of non-determinancy. > > *puzzled* were you able to reproduce that? The serial number > generation + threading happens in the main process and the > parallelism is limited to Xapian text indexing. -index > generates serial numbers by walking backwards with v2, and > complains on unexpected results. I will have to look a bit deeper. It was just something I noticed in passing as I was rewriting mailboxes with msgnum extracted from SQLite. I will see if I can track that one down. I very much value retaining enough information in the git archive to reconstruct the serial numbers, so that all that needs to be backed up is the git archive.
Even purge can insert a dummy entry so I don't think there is any time when we would not be able to preserve them with the current setup. > As far as personal mail goes, I wouldn't want serial numbers at all > (more unnecessary state to keep track of). At least imap requires serial numbers, and I imagine the easy transition for mail clients is to have an imap server. As you have mentioned an ordered list of commits is good enough to reconstruct the msgnum reliably so it is unlikely we would need to do anything special there. >> > So no, I don't think having serial numbers stored in filenames >> > is the right thing. >> >> I won't push it but I at the present time I respectfully disagree. >> >> The big advantage I see with serial numbers (other than msgmap) is that >> you can include multiple emails per commit (without going quadratic). I >> am also looking at potentially storing the other email states that IMAP >> and maildir mailboxes track. I can imagine that much more easily with >> message numbers. Still I want to avoid something that makes git go >> quadratic again. > > You'd want deeper trees; still. I'd still use hex, and maybe > truncate the blob hash to avoid having to keep track of any > serial number state. Maybe 2/2/4 naming is enough while using > git history to resolve collisions. The key fundamental difference is whether you keep the same files from one commit to another. To demonstrate this I have attached a quick conversion script I used to test this. It uses h{40} names. Totally flat. "time git rev-list --objects --all | wc -l" on the git mailing list archive takes just over 5 seconds. Compared to your one file name case: $ du -hs git/git/0.git/ git-long-names/git/0.git/ 759M git/git/0.git/ 772M git-long-names/git/0.git/ So the only difference is that using shorter filenames saves 13M. The original git tree in V1 format is 1001M so still 30M larger. And "time git rev-list --objects --all | wc -l" takes 1m14s. Making it definitely slower.
> Multiple emails per-commit doesn't make sense for public > archives. I am not certain. For a mailing list like linux-kernel, especially when someone sends a patch series to the list and it arrives all at once, I imagine there is potential there. I believe this is visible in the mail delivery pipeline if you implement LMTP. > For personal archives, you could probably snap off > 1-file-per-commit history periodically to make make a big tree > to reduce commit objects. The cost of losing compatibility, > rewriting history + repacking, to save 100M there out of 1G(?) > or so doesn't seem like a great trade-off, though. It is significant. Mostly it seems to make sense for importing archives or really compacting archives for storage. > I wonder how much can be saved with short author/committer info > and empty commit messages, even. I'd rather do that than break > history and require repacking. You seem to have saved 13M with one-character file names. > If I wanted to track replied/seen/etc... state in git for > personal mail, I'd probably use 'r', 's', etc filenames; but I'm > not sure it'd be in the same or different git repo from the > public one. > > That said; I don't know if I want to store state in git or > SQLite or something else... Agreed. That all bears some careful looking into.
> Looking forward to making Xapian and position data optional :>

[-- Attachment #2: public-inbox-convert-long-names --]
[-- Type: text/plain, Size: 4277 bytes --]

#!/usr/bin/perl -w
# Copyright (C) 2018 all contributors <meta@public-inbox.org>
# License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
use PublicInbox::MIME;
use PublicInbox::InboxWritable;
use PublicInbox::Config;
use PublicInbox::V2Writable;
use PublicInbox::Import;
use PublicInbox::Spawn qw(spawn);
use Cwd 'abs_path';
use File::Copy 'cp'; # preserves permissions:
my $usage = "Usage: public-inbox-convert OLD NEW\n";
my $jobs;
my $index = 1;
my %opts = (
	'--jobs|j=i' => \$jobs,
	'--index!' => \$index,
);
GetOptions(%opts) or die "bad command-line args\n$usage";
my $old_dir = shift or die $usage;
my $new_dir = shift or die $usage;
die "$new_dir exists\n" if -d $new_dir;
die "$old_dir not a directory\n" unless -d $old_dir;
my $config = eval { PublicInbox::Config->new };
$old_dir = abs_path($old_dir);
my $old;
if ($config) {
	$config->each_inbox(sub {
		$old = $_[0] if abs_path($_[0]->{mainrepo}) eq $old_dir;
	});
}
unless ($old) {
	warn "W: $old_dir not configured in " .
		PublicInbox::Config::default_file() . "\n";
	$old = {
		mainrepo => $old_dir,
		name => 'ignored',
		address => [ 'old@example.com' ],
	};
	$old = PublicInbox::Inbox->new($old);
}
$old = PublicInbox::InboxWritable->new($old);
if (($old->{version} || 1) >= 2) {
	die "Only conversion from v1 inboxes is supported\n";
}
my $new = { %$old };
$new->{mainrepo} = abs_path($new_dir);
$new->{version} = 2;
$new = PublicInbox::InboxWritable->new($new);
my $v2w;
$old->umask_prepare;

sub link_or_copy ($$) {
	my ($src, $dst) = @_;
	link($src, $dst) and return;
	$!{EXDEV} or warn "link $src, $dst failed: $!, trying cp\n";
	cp($src, $dst) or die "cp $src, $dst failed: $!\n";
}

$old->with_umask(sub {
	my $old_cfg = "$old->{mainrepo}/config";
	local $ENV{GIT_CONFIG} = $old_cfg;
	my $new_cfg = "$new->{mainrepo}/all.git/config";
	$v2w = PublicInbox::V2Writable->new($new, 1);
	$v2w->init_inbox($jobs);
	unlink $new_cfg;
	link_or_copy($old_cfg, $new_cfg);
	if (my $alt = $new->{altid}) {
		require PublicInbox::AltId;
		foreach my $i (0..$#$alt) {
			my $src = PublicInbox::AltId->new($old, $alt->[$i], 0);
			$src->mm_alt or next;
			my $dst = PublicInbox::AltId->new($new, $alt->[$i], 1);
			$dst = $dst->{filename};
			$src->mm_alt->{dbh}->sqlite_backup_to_file($dst);
		}
	}
	my $desc = "$old->{mainrepo}/description";
	link_or_copy($desc, "$new->{mainrepo}/description") if -e $desc;
	my $clone = "$old->{mainrepo}/cloneurl";
	if (-e $clone) {
		warn <<"";
$clone may not be valid after migrating to v2, not copying

	}
});
my $state = '';
my ($prev, $from);
my $head = $old->{ref_head} || 'HEAD';
my ($rd, $pid) = $old->git->popen(qw(fast-export --use-done-feature), $head);
$v2w->idx_init;
my $im = $v2w->importer;
my ($r, $w) = $im->gfi_start;
my $h = '[0-9a-f]';
my %D;
my $purged = 0;
while (<$rd>) {
	if ($_ eq "blob\n") {
		$state = 'blob';
	} elsif (/^commit /) {
		$state = 'commit';
		$purged = 0;
	} elsif (/^data (\d+)/) {
		my $len = $1;
		$w->print($_) or $im->wfail;
		while ($len) {
			my $n = read($rd, my $tmp, $len) or die "read: $!";
			warn "$n != $len\n" if $n != $len;
			$len -= $n;
			$w->print($tmp) or $im->wfail;
		}
		next;
	} elsif ($state eq 'commit') {
		if (m/^([MDcRN] | deleteall)/) {
			if (!$purged) {
				$purged = 1;
				$w->print("deleteall\n") or $im->wfail;
			}
		}
		if (m{^M 100644 :(\d+) (${h}{2})/(${h}{38})}o) {
			my ($mark, $path) = ($1, $2 . $3);
			${D}{$path} = $mark;
			$w->print("M 100644 :$mark $path\n") or $im->wfail;
			next;
		}
		if (m{^D (${h}{2})/(${h}{38})}o) {
			my $path = $1 . $2;
			my $mark = delete $D{$path};
			defined $mark or die "undeleted path: $1\n";
			$w->print("M 100644 :$mark d\n") or $im->wfail;
			next;
		}
		if (m{^from (:\d+)}) {
			$prev = $from;
			$from = $1;
			# no next
		}
	}
	last if $_ eq "done\n";
	$w->print($_) or $im->wfail;
}
$w = $r = undef;
close $rd or die "close fast-export: $!\n";
waitpid($pid, 0) or die "waitpid failed: $!\n";
$? == 0 or die "fast-export failed: $?\n";
my $mm = $old->mm;
$mm->{dbh}->sqlite_backup_to_file("$new_dir/msgmap.sqlite3") if $mm;
$v2w->done;
if ($index) {
	$v2w->index_sync;
	$v2w->done;
}

^ permalink raw reply [flat|nested] 21+ messages in thread
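For readers comparing the naming schemes discussed in this message, a minimal Perl sketch follows. It shows the v1-style 2/38 split, the flat 40-hex name that the attached script writes, and one possible reading of the "2/2/4" idea. The helper names and the interpretation of 2/2/4 (two 2-digit directory levels plus a 4-digit leaf taken from a truncated hash) are assumptions made for illustration; none of this is public-inbox code.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of public-inbox.

# v1-style layout: 2 hex digits as a directory, the other 38 as the file.
sub path_2_38 {
	my ($hex) = @_;
	$hex =~ /\A([0-9a-f]{2})([0-9a-f]{38})\z/ or die "bad hash: $hex\n";
	return "$1/$2";
}

# Flat layout, as written by the conversion script: all 40 digits.
sub path_flat {
	my ($hex) = @_;
	$hex =~ /\A[0-9a-f]{40}\z/ or die "bad hash: $hex\n";
	return $hex;
}

# Assumed reading of "2/2/4": two 2-digit levels plus a 4-digit leaf,
# so only the first 8 digits of the hash are kept. Collisions would
# have to be resolved elsewhere (the thread suggests git history).
sub path_2_2_4 {
	my ($hex) = @_;
	$hex =~ /\A([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{4})[0-9a-f]{32}\z/
		or die "bad hash: $hex\n";
	return "$1/$2/$3";
}

my $hex = 'ce013625030ba8dba906f756967f9e9ca394464a';
print path_2_38($hex), "\n";
print path_flat($hex), "\n";
print path_2_2_4($hex), "\n";
```

The shorter the name, the smaller each tree entry, which is the effect behind the 13M difference reported above; the trade-off is that truncated names need some collision handling.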
* Re: Q: V2 format 2018-07-13 13:39 ` Eric W. Biederman @ 2018-07-13 20:03 ` Eric W. Biederman 2018-07-13 22:22 ` msgmap serial number regeneration [was: Q: V2 format] Eric Wong 2018-07-13 22:02 ` bug: v2 deletes on incremental fetch " Eric Wong 2018-07-13 23:07 ` IMAP server [was: Q: V2 format] Eric Wong 2 siblings, 1 reply; 21+ messages in thread From: Eric W. Biederman @ 2018-07-13 20:03 UTC (permalink / raw) To: Eric Wong; +Cc: meta ebiederm@xmission.com (Eric W. Biederman) writes: > Eric Wong <e@80x24.org> writes: > >> "Eric W. Biederman" <ebiederm@xmission.com> wrote: >>> >>> Because of the parallelism in V2 I have noticed messages in numbered >>> in an order that does not correspond to their commit order. So the >>> SQLite database isn't as recoverable as it might be. Especially as the >>> parallelism introduces an element of non-determinancy. >> >> *puzzled* were you able to reproduce that? The serial number >> generation + threading happens in the main process and the >> parallelism is limited to Xapian text indexing. -index >> generates serial numbers by walking backwards with v2, and >> complains on unexpected results. Digging into this I have found consistently non-reproducible numbering, because of deleted files. Apparently in both V1 and V2 a worst-case estimate is made of the total numbers that are going to be needed and numbers are assigned backwards from there. A fresh indexing of the git mailing list archive on v1 gives me numbers starting with 360 and on v2 numbers starting with 355, which corresponds to the number of deleted messages. I am still looking to see if there are any other weird things here. I definitely do not like not being able to reconstruct message numbers from a backup. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
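The effect described in this message can be reproduced with a toy model. The sketch below is not the public-inbox indexing code; it assumes, as the message suggests, that the estimate counts every add, that history is walked newest-first, and that a message deleted later is skipped without consuming a number. The history format and the sub name are made up for the example.

```perl
use strict;
use warnings;

# Toy history, oldest first: [op, id]; 'm' adds a message, 'd' deletes one.
my @history = (
	[ m => 'a' ], [ m => 'b' ], [ m => 'c' ],
	[ d => 'b' ], [ m => 'd' ],
);

# Numbers as handed out at delivery time: one per add, in arrival order.
my %live;
my $n = 0;
for my $ent (@history) {
	my ($op, $id) = @$ent;
	$live{$id} = ++$n if $op eq 'm';
}
delete $live{ $_->[1] } for grep { $_->[0] eq 'd' } @history;

# Regeneration as modelled here (an assumption): start from the number of
# adds, walk newest-first, and skip deleted messages without using a number.
sub regen_skipping_deletes {
	my ($hist) = @_;
	my $next = grep { $_->[0] eq 'm' } @$hist; # worst-case estimate
	my (%deleted, %num);
	for my $ent (reverse @$hist) {
		my ($op, $id) = @$ent;
		if ($op eq 'd') { $deleted{$id} = 1; next }
		next if delete $deleted{$id}; # deleted later: never numbered
		$num{$id} = $next--;
	}
	return \%num;
}

my $regen = regen_skipping_deletes(\@history);
for my $id (sort keys %live) {
	printf "%s live=%d regen=%d\n", $id, $live{$id}, $regen->{$id};
}
```

In this model message `a` was delivered as number 1 but is regenerated as number 2: the lowest regenerated number is shifted up by the count of deleted messages, which matches the shape of the 360/355 observation.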
* msgmap serial number regeneration [was: Q: V2 format] 2018-07-13 20:03 ` Eric W. Biederman @ 2018-07-13 22:22 ` Eric Wong 2018-07-14 19:01 ` Eric W. Biederman 0 siblings, 1 reply; 21+ messages in thread From: Eric Wong @ 2018-07-13 22:22 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > ebiederm@xmission.com (Eric W. Biederman) writes: > > Eric Wong <e@80x24.org> writes: > >> "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >>> > >>> Because of the parallelism in V2 I have noticed messages in numbered > >>> in an order that does not correspond to their commit order. So the > >>> SQLite database isn't as recoverable as it might be. Especially as the > >>> parallelism introduces an element of non-determinancy. > >> > >> *puzzled* were you able to reproduce that? The serial number > >> generation + threading happens in the main process and the > >> parallelism is limited to Xapian text indexing. -index > >> generates serial numbers by walking backwards with v2, and > >> complains on unexpected results. > > Digging into this I have found consistenly non-reproducible numbering, > because of deleted files. Apparently in both V1 and V2 an a worst-case > estimate is made of the total numbers that are going to be needed and > numbers are assigned backwards from there. > > A fresh indexing of the git mailling list archive on v1 gives me numbers > starting with 360 and on v2 numbers starting with 355. Which > corresponds with the number of deleted messages. > > I am still looking to see if there are any other weird things here. Ah, yes, you're correct deletes don't get accounted for when regenerating. Oh well. I guess it was correct to document msgmap as something important to backup and not break for instances of particular servers. (emphasis on "particular servers") So I think you'd need to walk revision history twice to account for deleted messages... 
Across different machines, it should not matter to preserve serials. > I definitely do not like not being able to reconstruct message numbers > from a backup. For v2, I see serial numbers are an internal optimization which happens to map to NNTP. If the git repo is cloned and the cloner sets up a different server, it'll have a different address and clients won't know to deduplicate them anyways. I suppose it makes the load-balanced case a little more complex to sync(*) And this can't even account for independently started mirrors with no common git ancestry, as SMTP has zero guarantees on ordering. (*) But optimizing for load-balanced instances isn't ideal, I'd rather see more independently-run servers than giant load-balanced instances which everybody relies on. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: msgmap serial number regeneration [was: Q: V2 format] 2018-07-13 22:22 ` msgmap serial number regeneration [was: Q: V2 format] Eric Wong @ 2018-07-14 19:01 ` Eric W. Biederman 2018-07-15 3:18 ` Eric Wong 0 siblings, 1 reply; 21+ messages in thread From: Eric W. Biederman @ 2018-07-14 19:01 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> ebiederm@xmission.com (Eric W. Biederman) writes: >> > Eric Wong <e@80x24.org> writes: >> >> "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >>> >> >>> Because of the parallelism in V2 I have noticed messages in numbered >> >>> in an order that does not correspond to their commit order. So the >> >>> SQLite database isn't as recoverable as it might be. Especially as the >> >>> parallelism introduces an element of non-determinancy. >> >> >> >> *puzzled* were you able to reproduce that? The serial number >> >> generation + threading happens in the main process and the >> >> parallelism is limited to Xapian text indexing. -index >> >> generates serial numbers by walking backwards with v2, and >> >> complains on unexpected results. >> >> Digging into this I have found consistenly non-reproducible numbering, >> because of deleted files. Apparently in both V1 and V2 an a worst-case >> estimate is made of the total numbers that are going to be needed and >> numbers are assigned backwards from there. >> >> A fresh indexing of the git mailling list archive on v1 gives me numbers >> starting with 360 and on v2 numbers starting with 355. Which >> corresponds with the number of deleted messages. >> >> I am still looking to see if there are any other weird things here. > > Ah, yes, you're correct deletes don't get accounted for when > regenerating. Oh well. I guess it was correct to document msgmap > as something important to backup and not break for instances of > particular servers. 
(emphasis on "particular servers") > > So I think you'd need to walk revision history twice to account > for deleted messages... > > Across different machines, it should not matter to preserve > serials. I believe we can modify the msg number assignment to assign numbers to deletes as well as adds. Short of the same Message-ID coming up twice that should be enough for the current backwards loop to assign message ids reliably. And even Message-IDs coming up twice can be handled. >> I definitely do not like not being able to reconstruct message numbers >> from a backup. > > For v2, I see serial numbers are an internal optimization which > happens to map to NNTP. > > If the git repo is cloned and the cloner sets up a different > server, it'll have a different address and clients won't know to > deduplicate them anyways. I suppose it makes the load-balanced > case a little more complex to sync(*) But if the server hardware fails, which is the case I am dealing with at the moment, I can stand up a new server with the same IP address. Further, if we can make everything but the git repository non-essential it yields more flexibility for changing and optimizing things in the future. > (*) But optimizing for load-balanced instances isn't ideal, > I'd rather see more independently-run servers than giant > load-balanced instances which everybody relies on. True. At this point I am just optimizing for the operational simplicity of my own independently-run server. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: msgmap serial number regeneration [was: Q: V2 format] 2018-07-14 19:01 ` Eric W. Biederman @ 2018-07-15 3:18 ` Eric Wong 2018-07-16 15:20 ` Eric W. Biederman 0 siblings, 1 reply; 21+ messages in thread From: Eric Wong @ 2018-07-15 3:18 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > I believe we can modify the msg number assignment to assign numbers to > deletes as well as adds. Short of the same Message-ID coming up twice > that should be enough for the current backwards loop to assign message > ids reliably. And even Message-IDs comming up twice is handle-able. OK, I would likely accept a patch to fix that. A note about Message-ID uniqueness... The v2 code will generate a new, truly unique Message-ID on duplicates and use that in msgmap instead of what was in the message. It's gross, but I needed to do that to allow all messages to be accessible via Message-ID over NNTP, because: a) some legit messages reuse Message-IDs :< b) some broken mailers (including some versions of git-send-email) put multiple Message-IDs in the same message, so the code needs to handle messages with any number of Message-IDs anyways. ^ permalink raw reply [flat|nested] 21+ messages in thread
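The Message-ID handling described in this message can be sketched roughly as follows. This is an illustration of the idea only, not the v2 implementation: the sub names, the `%seen` hash standing in for msgmap, and the format of the synthetic Message-ID are all assumptions. The sketch also ignores folded header lines for brevity.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: every Message-ID found in a header block, in order.
# Handles repeated Message-ID: headers and several <...> tokens per header.
sub message_ids {
	my ($hdr) = @_;
	my (@mids, %dup);
	while ($hdr =~ /^Message-ID:(.*)$/gim) {
		my $line = $1;
		push @mids, grep { !$dup{$_}++ } ($line =~ /<([^<>\s]+)>/g);
	}
	return @mids;
}

# Pick the key to store for a message: the first Message-ID not taken yet,
# else a synthetic one derived from the raw message (made-up format).
# Byte-identical messages would map to the same synthetic key.
sub choose_mid {
	my ($seen, $raw) = @_;
	my ($hdr) = split(/\r?\n\r?\n/, $raw, 2);
	for my $mid (message_ids($hdr)) {
		next if exists $seen->{$mid};
		$seen->{$mid} = 1;
		return $mid;
	}
	my $fake = sha1_hex($raw) . '@synthetic.invalid';
	$seen->{$fake} = 1;
	return $fake;
}

my %seen;
my $m1 = "Message-ID: <1\@example.com>\nSubject: a\n\nfirst\n";
my $m2 = "Message-ID: <1\@example.com>\nSubject: b\n\nsecond\n";
my $m3 = "Message-ID: <1\@example.com>\nMessage-ID: <2\@example.com>\n\nthird\n";
my @keys = map { choose_mid(\%seen, $_) } ($m1, $m2, $m3);
print "$_\n" for @keys;
```

Here the second message reuses an existing Message-ID and receives a synthetic key, while the third falls back to its second Message-ID, so every message stays addressable by some unique key.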
* Re: msgmap serial number regeneration [was: Q: V2 format] 2018-07-15 3:18 ` Eric Wong @ 2018-07-16 15:20 ` Eric W. Biederman 0 siblings, 0 replies; 21+ messages in thread From: Eric W. Biederman @ 2018-07-16 15:20 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> I believe we can modify the msg number assignment to assign numbers to >> deletes as well as adds. Short of the same Message-ID coming up twice >> that should be enough for the current backwards loop to assign message >> ids reliably. And even Message-IDs comming up twice is handle-able. > > OK, I would likely accept a patch to fix that. > > A note about Message-ID uniqueness... The v2 code will generate > a new, truly unique Message-ID on duplicates and use that in > msgmap instead what was in the message. It's gross, but I needed > to do that to allow all messages to be accessible via Message-ID > over NNTP, because: > > a) some legit messages reuse Message-IDs :< > > b) some broken mailers (including some versions of git-send-email) > put multiple Message-IDs in the same message, so the code > needs to handle messages with any number of Message-IDs > anyways. I will send the patch along shortly. I misspoke when I said the problem could be fixed by assigning numbers to deletes. The actual problem was that not every add was assigned a number. So the fix is simpler than I expected. It is interesting to note that INSERT DEL INSERT in SQLite does not reuse numbers in the primary key. So not reassigning numbers is what the local SQLite database does as well. I need to track down what the v1 bug with add-remove-add was. I think the way I have updated the code I won't need the bug fix for v1. But I haven't checked that scenario yet. I also need to write a test case, sigh. But in practice I have this working for the git mailing list archive. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
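The fix described in this message, where every add consumes a number even if the message is deleted later, can be illustrated with a small model. The history format and sub name are hypothetical, and the real code keys messages differently; the point is only that the backward walk then reproduces the numbers handed out at delivery time, including in an add-remove-add sequence.

```perl
use strict;
use warnings;

# Toy history, oldest first; 'b' is added, deleted, then added again.
my @history = (
	[ m => 'a' ], [ m => 'b' ], [ d => 'b' ],
	[ m => 'b' ], [ m => 'c' ],
);

# Walk newest-first, counting down from the total number of adds.
# Every add takes a number; adds that were deleted later are dropped
# after taking it, so the surviving messages keep their original numbers.
sub regen_numbering_every_add {
	my ($hist) = @_;
	my $next = grep { $_->[0] eq 'm' } @$hist;
	my (%deleted, %num);
	for my $ent (reverse @$hist) {
		my ($op, $id) = @$ent;
		if ($op eq 'd') { $deleted{$id} = 1; next }
		my $n = $next--;              # every add consumes a number
		next if delete $deleted{$id}; # this copy was deleted later
		$num{$id} = $n;
	}
	return \%num;
}

my $num = regen_numbering_every_add(\@history);
printf "%s => %d\n", $_, $num->{$_} for sort keys %$num;
```

At delivery time the adds would have been numbered 1 (a), 2 (b, later deleted), 3 (b again) and 4 (c); the regenerated map gives a=1, b=3, c=4, leaving 2 unused in the same way a non-reusing primary key would.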
* bug: v2 deletes on incremental fetch [was: Q: V2 format] 2018-07-13 13:39 ` Eric W. Biederman 2018-07-13 20:03 ` Eric W. Biederman @ 2018-07-13 22:02 ` Eric Wong 2018-07-13 22:51 ` Eric W. Biederman 2018-07-14 0:46 ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong 2018-07-13 23:07 ` IMAP server [was: Q: V2 format] Eric Wong 2 siblings, 2 replies; 21+ messages in thread From: Eric Wong @ 2018-07-13 22:02 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@80x24.org> writes: > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> Then I am going to report a probable bug. In V2 in public-inbox-index > >> I can not find a path from finding a 'd' file and a call to unindex. V1 > >> unindexes deleted files. Rebased heads for purges call unindex. I > >> don't see that for ordinary d files though. > > > > It shouldn't need to call unindex because they never get indexed > > on rebuilds. V2 indexing walks history backwards (normal "git log" > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs > > as it encounters them. > > > > v1 needed to unindex because it used "git log --reverse" to walk > > forward in history. > > This assumes that you see them in the same git pull. I would think > ideally anything that is going to be deleted that quickly you can just > skip archiving. > > What is the time window of you expecting 'd' messages to appear? Ah, this is definitely a bug when using incremental fetch + -index. Right now, it only warns on unseen entries in $D but won't reach beyond the current "git log" window. I'll take a look at it later today/this weekend unless you're already working on it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: bug: v2 deletes on incremental fetch [was: Q: V2 format] 2018-07-13 22:02 ` bug: v2 deletes on incremental fetch " Eric Wong @ 2018-07-13 22:51 ` Eric W. Biederman 2018-07-14 0:46 ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong 1 sibling, 0 replies; 21+ messages in thread From: Eric W. Biederman @ 2018-07-13 22:51 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@80x24.org> writes: >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> Then I am going to report a probable bug. In V2 in public-inbox-index >> >> I can not find a path from finding a 'd' file and a call to unindex. V1 >> >> unindexes deleted files. Rebased heads for purges call unindex. I >> >> don't see that for ordinary d files though. >> > >> > It shouldn't need to call unindex because they never get indexed >> > on rebuilds. V2 indexing walks history backwards (normal "git log" >> > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs >> > as it encounters them. >> > >> > v1 needed to unindex because it used "git log --reverse" to walk >> > forward in history. >> >> This assumes that you see them in the same git pull. I would think >> ideally anything that is going to be deleted that quickly you can just >> skip archiving. >> >> What is the time window of you expecting 'd' messages to appear? > > Ah, this is definitely a bug when using incremental fetch + -index. > Right now, it only warns on unseen entries in $D but won't reach > beyond the current "git log" window. > > I'll take a lookg at it later today/this weekend unless you're > already working on it. I am not. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] v2writable: unindex deleted messages after incremental fetch 2018-07-13 22:02 ` bug: v2 deletes on incremental fetch " Eric Wong 2018-07-13 22:51 ` Eric W. Biederman @ 2018-07-14 0:46 ` Eric Wong 1 sibling, 0 replies; 21+ messages in thread From: Eric Wong @ 2018-07-14 0:46 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta Eric Wong <e@80x24.org> wrote: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > Eric Wong <e@80x24.org> writes: > > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > >> Then I am going to report a probable bug. In V2 in public-inbox-index > > >> I can not find a path from finding a 'd' file and a call to unindex. V1 > > >> unindexes deleted files. Rebased heads for purges call unindex. I > > >> don't see that for ordinary d files though. > > > > > > It shouldn't need to call unindex because they never get indexed > > > on rebuilds. V2 indexing walks history backwards (normal "git log" > > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs > > > as it encounters them. > > > > > > v1 needed to unindex because it used "git log --reverse" to walk > > > forward in history. > > > > This assumes that you see them in the same git pull. I would think > > ideally anything that is going to be deleted that quickly you can just > > skip archiving. > > > > What is the time window of you expecting 'd' messages to appear? > > Ah, this is definitely a bug when using incremental fetch + -index. > Right now, it only warns on unseen entries in $D but won't reach > beyond the current "git log" window. The following should fix it, thanks for the bug report. -------8<------- Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch The normal behavior is to prevent the deleted messages from being indexed in the first place. However, when fetching incrementally via git; public-inbox-index needs to account for deleted files which were created outside of the most recent fetch/reindexing window. 
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
 t/v2mirror.t                  | 28 +++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	foreach my $mid (@$mids) {
-		$D->{"$mid\0$cid"} = 1;
+		$D->{"$mid\0$cid"} = $oid;
 	}
 }
 
@@ -671,7 +671,7 @@ sub reindex_oid {
 	my $num = -1;
 	my $del = 0;
 	foreach my $mid (@$mids) {
-		$del += (delete $D->{"$mid\0$cid"} || 0);
+		$del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
 		my $n = $mm_tmp->num_for($mid);
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
 	my ($min, $max) = $mm_tmp->minmax;
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
 	$$regen += $max if $max;
-	my $D = {};
+	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
 
@@ -912,13 +912,13 @@ sub index_sync {
 		delete $self->{reindex_pipe};
 		$self->update_last_commit($git, $i, $cmt) if defined $cmt;
 	}
-	my @d = sort keys %$D;
-	if (@d) {
-		warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
-		foreach (@d) {
-			my ($mid, undef) = split(/\0/, $_, 2);
-			warn "<$mid>\n";
-		}
+
+	# unindex is required for leftovers if "deletes" affect messages
+	# in a previous fetch+index window:
+	if (scalar keys %$D) {
+		my $git = $self->{-inbox}->git;
+		$self->unindex_oid($git, $_) for values %$D;
+		$git->cleanup;
 	}
 	$self->done;
 }
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
 	is_deeply(\@warn, [], 'no warnings from index_sync after purge');
 }
 
-$v2w->done;
+# deletes happen in a different fetch window
+{
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 1, '1@example.com visible in mirror');
+	$mime->header_set('Message-ID', '<1@example.com>');
+	$mime->header_set('Subject', 'subject = 1');
+	ok($v2w->remove($mime), 'removed <1@example.com> from source');
+	$v2w->done;
+	fetch_each_epoch();
+
+	open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+	my $ipid = fork;
+	if ($ipid == 0) {
+		dup2(fileno($err), 2) or die "dup2 failed: $!";
+		exec("$script-index", "$tmpdir/m");
+		die "exec fail: $!";
+	}
+	ok($ipid, 'running index');
+	is(waitpid($ipid, 0), $ipid, 'index done');
+	is($?, 0, 'no error from index');
+	ok(seek($err, 0, 0), 'rewound stderr');
+	$err = eval { local $/; <$err> };
+	is($err, '', 'no errors reported by index');
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
 ok(kill('TERM', $pid), 'killed httpd');
 $pid = undef;
 waitpid(-1, 0);
-- 
EW

^ permalink raw reply related [flat|nested] 21+ messages in thread
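The idea behind the patch can be shown in isolation with a toy index. This is a simplified model and not the V2Writable code: messages are keyed by a plain id rather than "$mid\0$cid", the index is a hash, and `index_window` is a hypothetical name. Deletes seen while walking a fetch window newest-first are remembered; an add that matches one is skipped, and any delete left over must refer to a message indexed in an earlier window, so it is unindexed.

```perl
use strict;
use warnings;

# Toy index: id => 1 for every message that is currently searchable.
my %index;

# Index one fetch window. @$window is oldest-first; each entry is
# [op, id], with 'm' for an added message and 'd' for a delete marker.
# Returns the ids that had to be unindexed.
sub index_window {
	my ($index, $window) = @_;
	my %D; # deletes seen so far while walking newest-first
	for my $ent (reverse @$window) {
		my ($op, $id) = @$ent;
		if ($op eq 'd') {
			$D{$id} = 1;
		} elsif (delete $D{$id}) {
			# added and deleted inside this window: never indexed
		} else {
			$index->{$id} = 1;
		}
	}
	# Leftovers were indexed by an earlier run, so remove them now.
	my @stale = sort keys %D;
	delete @$index{@stale};
	return @stale;
}

# First window: 'b' is added and deleted before it is ever indexed.
index_window(\%index, [ [ m => 'a' ], [ m => 'b' ], [ d => 'b' ], [ m => 'c' ] ]);

# Second window: the delete for 'a' arrives in a later fetch.
my @stale = index_window(\%index, [ [ d => 'a' ], [ m => 'd' ] ]);
print "unindexed: @stale\n";
print "indexed: @{[ sort keys %index ]}\n";
```

Without the leftover handling, `a` would stay searchable after the second window, which is the incremental-fetch bug reported in this thread.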
* IMAP server [was: Q: V2 format] 2018-07-13 13:39 ` Eric W. Biederman 2018-07-13 20:03 ` Eric W. Biederman 2018-07-13 22:02 ` bug: v2 deletes on incremental fetch " Eric Wong @ 2018-07-13 23:07 ` Eric Wong 2018-07-13 23:12 ` Eric W. Biederman 2018-09-28 20:10 ` Johannes Berg 2 siblings, 2 replies; 21+ messages in thread From: Eric Wong @ 2018-07-13 23:07 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> Eric Wong <e@80x24.org> writes: > > As far as personal mail goes, I wouldn't want serial numbers at all > > (more unnecessary state to keep track of). > > At least imap requires serial numbers, and I imagine the easy transition > for mail clients is to have an imap server. As you have mentioned an > ordered list of commits is good enough to reconstruct the msgnum > reliably so it is unlikely we would need to do anything special there. I would rather layer IMAP (and POP3) on top of NNTP than to tie it to any git/SQLite/Xapian parts in public-inbox. We could ship it with public-inbox, of course; but I don't see why an IMAP or POP3 server could not work by using innd (or similar) as a backend. I don't think any design compromises need to be made to the existing git/SQLite/Xapian parts to support IMAP/POP3. Hosting an IMAP/POP3 server is way more overhead for the admin as it requires storing user credentials and storing per-reader state. So the preference is to do NNTP as well as possible and layer the complexity of per-user account data on top of it. Right now, none of the NNTP/HTTP parts require write access to the machine it runs on aside from log files. Thus the goal is to promote NNTP usage as it's cheapest/easiest for the server admin; but to still have IMAP/POP3 as stopgaps (similar to the ssoma/mlmmj-replay script I use to allow SMTP subscriptions to this inbox). ^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: IMAP server [was: Q: V2 format]
From: Eric W. Biederman @ 2018-07-13 23:12 UTC
To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> >> Eric Wong <e@80x24.org> writes:
>> > As far as personal mail goes, I wouldn't want serial numbers at all
>> > (more unnecessary state to keep track of).
>>
>> At least imap requires serial numbers, and I imagine the easy transition
>> for mail clients is to have an imap server. As you have mentioned an
>> ordered list of commits is good enough to reconstruct the msgnum
>> reliably so it is unlikely we would need to do anything special there.
>
> I would rather layer IMAP (and POP3) on top of NNTP than to tie
> it to any git/SQLite/Xapian parts in public-inbox. We could
> ship it with public-inbox, of course; but I don't see why an
> IMAP or POP3 server could not work by using innd (or similar) as
> a backend.
>
> I don't think any design compromises need to be made to existing
> the git/SQLite/Xapian parts to support IMAP/POP3.
>
> Hosting an IMAP/POP3 server is way more overhead for the admin
> as it requires storing user credentials and storing per-reader
> state. So the preference is to do NNTP as well as possible and
> layer the complexity of per-user account data on top of it.
>
> Right now, none of the NNTP/HTTP parts require write access
> to the machine it runs on aside from log files.
>
> Thus the goal is to promote NNTP usage as it's cheapest/easiest
> for the server admin; but to still have IMAP/POP3 as stopgaps
> (similar to the ssoma/mlmmj-replay script I use to allow SMTP
> subscriptions to this inbox).

That makes complete sense. I definitely agree that NNTP should be what
is optimized for.

Eric

* Re: IMAP server [was: Q: V2 format]
From: Johannes Berg @ 2018-09-28 20:10 UTC
To: Eric Wong, Eric W. Biederman; +Cc: meta

Sorry to just jump into an old thread; I was wondering about IMAP server
support as well, in particular because, unlike NNTP, IMAP allows pushing
the search to the server, and that would be useful for local archives.

> Hosting an IMAP/POP3 server is way more overhead for the admin
> as it requires storing user credentials and storing per-reader
> state. So the preference is to do NNTP as well as possible and
> layer the complexity of per-user account data on top of it.

I'm not really sure that's true; dovecot, for example, provides their
list archives via anonymous IMAP:
https://www.dovecot.org/mailinglists.html

They have instructions here on how to do that with dovecot:
https://wiki2.dovecot.org/HowTo/ReadOnlyArchive

In particular:

    /var/home/anonymous/control# ls -la
    drwxr-xr-x  3 root     root  4096 May 25 15:43 ./
    drwxr-xr-x  3 anondove root  4096 Mar 20 14:39 .imap/
    -rw-r--r--  1 root     root    33 May 25 15:43 .subscriptions

    Create the .subscriptions file manually to contain all the mailboxes
    you. Note that the control directory isn't writable by anondove, so
    that the subscriptions can't be changed.

    [...]

    * INBOX must always exists even if it's empty. Make sure it's not
      writable.
    * Make sure the mail directory isn't writable so users can't create new
      mailboxes.
    * The mboxes can be placed in the directory itself, or symlinks can be
      used. Above you'll see that mailman places all Dovecot archives under
      /var/home/archives. Make sure none of these files are writable by
      anondove.

They also set up some read-only ACLs, I think to make the read-only
state clear to the user agent, but of course a public-inbox IMAP server
can hard-code all of this and not accept any write commands to start
with.

Anyway, just FYI; since I don't know perl at all I don't think I'll be
doing any work on this.

johannes

* Re: IMAP server [was: Q: V2 format]
From: Eric W. Biederman @ 2018-09-28 21:01 UTC
To: Johannes Berg; +Cc: Eric Wong, meta

Johannes Berg <johannes@sipsolutions.net> writes:

> Sorry to just jump into an old thread; I was wondering about IMAP server
> support as well, in particular because unlike NNTP that allows pushing
> the search to the server, and that would be useful for local archives.
>
>> Hosting an IMAP/POP3 server is way more overhead for the admin
>> as it requires storing user credentials and storing per-reader
>> state. So the preference is to do NNTP as well as possible and
>> layer the complexity of per-user account data on top of it.
>
> I'm not really sure that's true; dovecot, for example, provides their
> lists archives via anonymous IMAP:
> https://www.dovecot.org/mailinglists.html
>
> They have instructions here on how to do that over dovecot:
> https://wiki2.dovecot.org/HowTo/ReadOnlyArchive
>
> In particular:
>
>     /var/home/anonymous/control# ls -la
>     drwxr-xr-x  3 root     root  4096 May 25 15:43 ./
>     drwxr-xr-x  3 anondove root  4096 Mar 20 14:39 .imap/
>     -rw-r--r--  1 root     root    33 May 25 15:43 .subscriptions
>
>     Create the .subscriptions file manually to contain all the mailboxes
>     you. Note that the control directory isn't writable by anondove, so
>     that the subscriptions can't be changed.
>
>     [...]
>
>     * INBOX must always exists even if it's empty. Make sure it's not
>       writable.
>     * Make sure the mail directory isn't writable so users can't create new
>       mailboxes.
>     * The mboxes can be placed in the directory itself, or symlinks can be
>       used. Above you'll see that mailman places all Dovecot archives under
>       /var/home/archives. Make sure none of these files are writable by
>       anondove.
>
> They also set up some read-only ACLs, I think to make the read-only
> state clear to the user agent, but of course a public-inbox IMAP server
> can hard-code all of this and not accept any write commands to start
> with.
>
> Anyway, just FYI; since I don't know perl at all I don't think I'll be
> doing any work on this.

I have looked at gnus and there is support in there for performing
searches via the old gmane web interface. Public inbox already provides
an attribute that tells you what the web server is. So all it will
really take is someone with a little time to wire up the search
interface.

Beyond that, if you have the archives local (and that is easy) it is
quite possible to just git grep through them and find things of
interest.

I should verify this, but I don't think IMAP has a good version of the
NNTP overview database, which seems to make IMAP quite a bit slower for
reading news. Certainly gnus+public-inbox locally is running quite a
bit faster than my old gnus+cyrus-imap configuration.

I tried to read through the IMAP search specification to see how it
compares with what public-inbox makes available and I did not get
particularly far. It was not easy to match up the various search
capabilities. The biggest issue is that IMAP tends to not talk
about message-ids, even though that is fundamentally one of the most
important fields to index if you are dealing with threaded mail.

So long story short, while I am not opposed to a read-only IMAP
configuration, I think NNTP has much to recommend it. I do think we need
little things like SSL support for NNTP, just to prevent inappropriate
access to traffic in flight.

It won't be for a while yet, but I have some scripts I need to push, at
least to the public-inbox scripts directory, that simplify the process
of taking a single email address that is subscribed to mailing lists and
sorting its mail into different public-inbox git archives. Currently I
have every mailing list I am subscribed to pushed into public-inbox.

Eric
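
For readers who have not looked at the NNTP overview data mentioned in the message above: each article is summarized in one tab-separated line, which is why a newsreader can build a thread view without fetching full messages. The following is a minimal sketch, assuming the default field order described in RFC 3977 (article number, Subject, From, Date, Message-ID, References, byte count, line count). The `Overview` type, the `parse_overview_line` name, and the sample values are illustrative only and are not taken from public-inbox.

```python
from typing import List, NamedTuple


class Overview(NamedTuple):
    """One overview record (hypothetical container, default field order)."""
    number: int
    subject: str
    sender: str
    date: str
    message_id: str
    references: List[str]
    byte_count: int
    line_count: int


def parse_overview_line(line: str) -> Overview:
    """Split one tab-separated overview line into its default fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    number, subject, sender, date, msgid, refs, nbytes, nlines = fields[:8]
    return Overview(
        number=int(number),
        subject=subject,
        sender=sender,
        date=date,
        message_id=msgid,
        # References holds zero or more space-separated Message-IDs.
        references=refs.split(),
        # Servers may leave the counts empty; treat that as 0 here.
        byte_count=int(nbytes) if nbytes else 0,
        line_count=int(nlines) if nlines else 0,
    )
```

Because Message-ID and References arrive together in that single line, a client can reconstruct threads from the overview alone.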

* Re: IMAP server [was: Q: V2 format]
From: Johannes Berg @ 2018-10-01 7:46 UTC
To: Eric W. Biederman; +Cc: Eric Wong, meta

On Fri, 2018-09-28 at 23:01 +0200, Eric W. Biederman wrote:
>
> I have looked at gnus and there is support in there for performing
> searches via the old gmane web interface. Public inbox already provides
> an attribute that tells you what the web server is. So all it will
> really take is someone with a little time to wire up the search
> interface.

That's ... interesting, but of course completely out-of-band. I'm not
sure it should or could be advocated that every email client actually
implement that :-)

But if you think broader than that, you don't even necessarily need a
web server to run p-i.

> Beyond that if you have the archives local (and that is easy) it is
> quite possible to just git grep through them and find things of
> interest.

That also doesn't use the index, not sure how that's any better?

> I should verify this but I don't think IMAP has a good version of the
> NNTP overview database. Which seems to make IMAP quite a bit slower for
> reading news. Certainly gnus+public-inbox locally is running quite a
> bit faster than my old gnus+cyrus-imap configuration.

IMAP servers typically should do header/MIME parsing, so you should be
able to query such a thing - but not as easily as XOVER, I suppose.

However, I think FETCH could be made to return the data similar to
XOVER, though it may not be backed by a pre-created database file, and
it depends on what the client does to show the overview in the first
place.

> I tried to read through the IMAP search specification to see how it
> compares with what public-inbox makes available and I did not get
> particularly far. It was not easy to match up the various search
> capabilities. The biggest issue is that IMAP tends to not talk
> about message-ids. Where that is fundamentally one of the most
> important fields to index if you are dealing with threaded mail.

You can search for arbitrary headers in search by using

    HEADER <field-name> <string>

where the string is "contains", so you can use it for both Message-Id
and References headers.

> So long story short while I am not opposed to a read-only IMAP
> configuration I think NNTP has much to recommend it. I do think we need
> little things like SSL support for NNTP. Just to prevent inappropriate
> access to traffic in flight.

Sure. I'm not saying NNTP is bad, just saying that the choice of clients
is rather limited. Also, posting isn't supported over NNTP, so if I had
it all in my email client I could read in the public-inbox archive, and
respond via normal email.

> It won't be for a while yet but I have some scripts I need to push at
> least to the public-inbox scripts directory that simplify the process
> taking a single email address subscribing to email and sorting it out
> into different public-inbox git archives. Currently I have every
> mailling list I am subscribed to pushed into public-inbox.

:-)

johannes
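
The HEADER search key described in the message above can be exercised from Python's standard `imaplib`. This is a minimal sketch rather than anything public-inbox ships: the helper names `imap_quote`, `header_search_criteria`, and `find_by_message_id` are made up for illustration, and the connection is assumed to be an already logged-in `imaplib.IMAP4` (or `IMAP4_SSL`) object with a mailbox selected read-only.

```python
import imaplib
from typing import List, Tuple


def imap_quote(value: str) -> str:
    """Return value as an IMAP quoted string (escape backslash and quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def header_search_criteria(field: str, value: str) -> Tuple[str, str, str]:
    """Build the SEARCH arguments for: HEADER <field-name> <string>."""
    return ("HEADER", field, imap_quote(value))


def find_by_message_id(conn: imaplib.IMAP4, message_id: str) -> List[bytes]:
    """Return sequence numbers of messages whose Message-ID contains message_id.

    Assumes conn is authenticated and a mailbox has been selected, e.g. with
    conn.select("INBOX", readonly=True).
    """
    typ, data = conn.search(None, *header_search_criteria("Message-ID", message_id))
    if typ != "OK":
        raise RuntimeError("IMAP SEARCH failed: %r" % (data,))
    # An empty result comes back as a single empty bytes object.
    return (data[0] or b"").split()
```

Passing "References" instead of "Message-ID" to `header_search_criteria` would look for replies to a given message, since HEADER matches on substrings.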

* Re: IMAP server [was: Q: V2 format]
From: Eric W. Biederman @ 2018-10-01 8:51 UTC
To: Johannes Berg; +Cc: Eric Wong, meta

Johannes Berg <johannes@sipsolutions.net> writes:

> On Fri, 2018-09-28 at 23:01 +0200, Eric W. Biederman wrote:
>>
>> I have looked at gnus and there is support in there for performing
>> searches via the old gmane web interface. Public inbox already provides
>> an attribute that tells you what the web server is. So all it will
>> really take is someone with a little time to wire up the search
>> interface.
>
> That's ... interesting, but of course completely out-of-band. I'm not
> sure it should or could be advocated that every email client actually
> implement that :-)
>
> But if you think broader than that, you don't even necessarily need a
> web server to run p-i.

>> Beyond that if you have the archives local (and that is easy) it is
>> quite possible to just git grep through them and find things of
>> interest.
>
> That also doesn't use the index, not sure how that's any better?

So for linux-kernel, I have 7G for the git email archive and 65G more
for the indexes, which makes the indexes quite expensive. So for
personal use I am not certain an archive is a benefit, especially when
the email archive fits in RAM and the index does not.

I have to wonder if there is a way to make the indexes an order of
magnitude smaller.

>> I should verify this but I don't think IMAP has a good version of the
>> NNTP overview database. Which seems to make IMAP quite a bit slower for
>> reading news. Certainly gnus+public-inbox locally is running quite a
>> bit faster than my old gnus+cyrus-imap configuration.
>
> IMAP servers typically should do header/MIME parsing, so you should be
> able to query such a thing - but not as easily as XOVER, I suppose.
>
> However, I think FETCH could be made to return the data similar to
> XOVER, though it may not be backed by a pre-created database file, and
> it depends on what the client does to show the overview in the first
> place.

>> I tried to read through the IMAP search specification to see how it
>> compares with what public-inbox makes available and I did not get
>> particularly far. It was not easy to match up the various search
>> capabilities. The biggest issue is that IMAP tends to not talk
>> about message-ids. Where that is fundamentally one of the most
>> important fields to index if you are dealing with threaded mail.
>
> You can search for arbitrary headers in search by using
>
>     HEADER <field-name> <string>
>
> where the string is "contains", so you can use it for both Message-Id
> and References headers.

>> So long story short while I am not opposed to a read-only IMAP
>> configuration I think NNTP has much to recommend it. I do think we need
>> little things like SSL support for NNTP. Just to prevent inappropriate
>> access to traffic in flight.
>
> Sure. I'm not saying NNTP is bad, just saying that the choice of clients
> is rather limited. Also, posting isn't supported over NNTP, so if I had
> it all in my email client I could read in the public-inbox archive, and
> respond via normal email.

The thing I can confirm, and as far as I have gotten, is that NNTP has
a sequential message id, IMAP has a sequential message id, and
public-inbox has a sequential message id (now reliably based upon the
order of the messages in the git archive). So it is very possible to
have a read-only IMAP view.

The really noticeable downside of IMAP is that it wants to keep the
status of messages you have read on the server. That makes a read-only
archive a bit of a pain. So I am not certain the choice of clients is
an advantage when you restrict IMAP to read-only use.

Nor am I certain the general IMAP search functionality maps well to
what public-inbox indexes or to what people want to search for.

Which is me again saying that while things can be made to work, I am
not certain IMAP is the best protocol for the job.

>> It won't be for a while yet but I have some scripts I need to push at
>> least to the public-inbox scripts directory that simplify the process
>> taking a single email address subscribing to email and sorting it out
>> into different public-inbox git archives. Currently I have every
>> mailling list I am subscribed to pushed into public-inbox.
>
> :-)

I do love that public-inbox makes it very easy to archive all my content
and still be able to take it all with me when I travel.

Eric
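
The point made in the message above, that a sequential number can be derived reliably from the order of messages in the git archive, can be illustrated with a small sketch. This is not public-inbox's msgmap code; it only assumes that history can be replayed oldest-first as ('m', message-id) additions and ('d', message-id) deletions, following the 'm' and 'd' names mentioned at the start of this thread. The `assign_serials` function is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def assign_serials(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Replay ordered add/delete events and number the surviving messages.

    events yields (op, message_id) pairs from oldest to newest, where op is
    'm' for an added message and 'd' for a deleted one. Numbers start at 1
    and are never reused, so replaying the same history gives the same map.
    """
    next_num = 1
    live: Dict[str, int] = {}
    for op, message_id in events:
        if op == "m":
            if message_id not in live:
                live[message_id] = next_num
                next_num += 1
        elif op == "d":
            # Drop the message but keep its number retired.
            live.pop(message_id, None)
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return live
```

Since the result depends only on the ordered history, any mirror replaying the same commits would arrive at the same numbers, which is the property an NNTP article number or an IMAP UID needs.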

end of thread, other threads:[~2018-10-01 8:51 UTC | newest]

Thread overview: 21+ messages
2018-07-11 20:01 Q: V2 format Eric W. Biederman
2018-07-11 21:18 ` Konstantin Ryabitsev
2018-07-11 21:41 ` Eric W. Biederman
2018-07-12 1:47 ` Eric Wong
2018-07-12 13:58 ` Eric W. Biederman
2018-07-12 23:09 ` Eric Wong
2018-07-13 13:39 ` Eric W. Biederman
2018-07-13 20:03 ` Eric W. Biederman
2018-07-13 22:22 ` msgmap serial number regeneration [was: Q: V2 format] Eric Wong
2018-07-14 19:01 ` Eric W. Biederman
2018-07-15 3:18 ` Eric Wong
2018-07-16 15:20 ` Eric W. Biederman
2018-07-13 22:02 ` bug: v2 deletes on incremental fetch " Eric Wong
2018-07-13 22:51 ` Eric W. Biederman
2018-07-14 0:46 ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong
2018-07-13 23:07 ` IMAP server [was: Q: V2 format] Eric Wong
2018-07-13 23:12 ` Eric W. Biederman
2018-09-28 20:10 ` Johannes Berg
2018-09-28 21:01 ` Eric W. Biederman
2018-10-01 7:46 ` Johannes Berg
2018-10-01 8:51 ` Eric W. Biederman