* [PATCH] TODO: add note for "IMAP IDLE"-like long-polling "git fetch"
@ 2018-12-29  3:43 Eric Wong
  2018-12-29  3:56 ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2018-12-29  3:43 UTC
  To: meta

---
 TODO | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/TODO b/TODO
index 87cadc9..c9ee756 100644
--- a/TODO
+++ b/TODO
@@ -90,3 +90,7 @@ all need to be considered for everything we introduce)
   davfs2 needs Range: request support for this to be feasible:
     https://savannah.nongnu.org/bugs/?33259
     https://savannah.nongnu.org/support/?107649
+
+* Contribute something like IMAP IDLE for "git fetch".
+  Inboxes (and any git repos) can be kept up-to-date without
+  relying on polling.
-- 
EW
* "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  3:43 [PATCH] TODO: add note for "IMAP IDLE"-like long-polling "git fetch" Eric Wong
@ 2018-12-29  3:56 ` Eric Wong
  2018-12-29  4:38   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2018-12-29  3:56 UTC
  To: git; +Cc: meta

Hey all, I just added this to the TODO file for public-inbox[1] but
obviously it's intended for git.git (meta@public-inbox cc-ed):

> +* Contribute something like IMAP IDLE for "git fetch".
> +  Inboxes (and any git repos) can be kept up-to-date without
> +  relying on polling.

I would've thought somebody had done this by now, but I guess
it's dependent on a bunch of things (TLS layer nowadays, maybe
HTTP/2), so git-daemon support alone wouldn't cut it...

Anyways, until this is implemented, feel free to continue
hammering away on https://public-inbox.org/git/ with frequent
"git fetch".  I write C10K servers in my sleep -_-

[1] https://public-inbox.org/meta/20181229034342.11543-1-e@80x24.org/
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  3:56 ` Eric Wong
@ 2018-12-29  4:38   ` Konstantin Ryabitsev
  2018-12-29  6:13     ` Eric Wong
  2019-01-09 22:27     ` Stefan Beller
  0 siblings, 2 replies; 9+ messages in thread
From: Konstantin Ryabitsev @ 2018-12-29  4:38 UTC
  To: Eric Wong; +Cc: git, meta

On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> Hey all, I just added this to the TODO file for public-inbox[1] but
> obviously it's intended for git.git (meta@public-inbox cc-ed):
>
> > +* Contribute something like IMAP IDLE for "git fetch".
> > +  Inboxes (and any git repos) can be kept up-to-date without
> > +  relying on polling.
>
> I would've thought somebody had done this by now, but I guess
> it's dependent on a bunch of things (TLS layer nowadays, maybe
> HTTP/2), so git-daemon support alone wouldn't cut it...

Polling is not all bad, especially for large repository collections. I'm
not sure you want to "idle" individual repositories when there are
thousands of them. We ended up writing grokmirror for replicating
repo collections using manifest files.

> Anyways, until this is implemented, feel free to continue
> hammering away on https://public-inbox.org/git/ with frequent
> "git fetch".  I write C10K servers in my sleep -_-

The archive is also mirrored at
https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git, and
also on kernel.googlesource.com.

-K
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  4:38 ` Konstantin Ryabitsev
@ 2018-12-29  6:13   ` Eric Wong
  2019-01-09 22:27   ` Stefan Beller
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Wong @ 2018-12-29  6:13 UTC
  To: Konstantin Ryabitsev; +Cc: git, meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> > Hey all, I just added this to the TODO file for public-inbox[1] but
> > obviously it's intended for git.git (meta@public-inbox cc-ed):
> >
> > > +* Contribute something like IMAP IDLE for "git fetch".
> > > +  Inboxes (and any git repos) can be kept up-to-date without
> > > +  relying on polling.
> >
> > I would've thought somebody had done this by now, but I guess
> > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > HTTP/2), so git-daemon support alone wouldn't cut it...
>
> Polling is not all bad, especially for large repository collections. I'm
> not sure you want to "idle" individual repositories when there are
> thousands of them. We ended up writing grokmirror for replicating
> repo collections using manifest files.

I wasn't intending it for giant sites like korg, but for
individual hackers on their workstations tracking a handful of
projects they follow.

The cost for a hacker's machine would be the same as the current
situation where developers idle on IRC channels for the projects
they're involved in.

> > Anyways, until this is implemented, feel free to continue
> > hammering away on https://public-inbox.org/git/ with frequent
> > "git fetch".  I write C10K servers in my sleep -_-
>
> The archive is also mirrored at
> https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git, and
> also on kernel.googlesource.com.

Now, I'm wondering if you can make a v2 public-inbox mirror of
git@vger and run it on lore.  Converting public-inbox.org/git to
v2 would break things for everybody fetching, right now :<
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2018-12-29  4:38 ` Konstantin Ryabitsev
  2018-12-29  6:13   ` Eric Wong
@ 2019-01-09 22:27   ` Stefan Beller
  2019-01-09 22:49     ` Konstantin Ryabitsev
  2019-05-02  8:50     ` Eric Wong
  1 sibling, 2 replies; 9+ messages in thread
From: Stefan Beller @ 2019-01-09 22:27 UTC
  To: Eric Wong, git, meta

On Fri, Dec 28, 2018 at 8:39 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Sat, Dec 29, 2018 at 03:56:21AM +0000, Eric Wong wrote:
> > Hey all, I just added this to the TODO file for public-inbox[1] but
> > obviously it's intended for git.git (meta@public-inbox cc-ed):
> >
> > > +* Contribute something like IMAP IDLE for "git fetch".
> > > +  Inboxes (and any git repos) can be kept up-to-date without
> > > +  relying on polling.
> >
> > I would've thought somebody had done this by now, but I guess
> > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > HTTP/2), so git-daemon support alone wouldn't cut it...
>
> Polling is not all bad, especially for large repository collections.

I disagree with that statement.

IIRC, more than half the bandwidth of Google's git servers is used
for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
change, by build bots which are really eager to try again after a minute).

That is why we use a superproject, with all other repositories as
submodules, for polling, as that slashes the ls-remote traffic by a
factor of roughly the number of repositories.

There was an attempt in JGit to support this type of long-polling
communication at
https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/2adc572628f9382ace5fbd791325dc64f7c968d3
but not a whole lot is left over in JGit as it was refactored at least
once again.

IIRC the issues were in the lack of a protocol definition that
would have made it usable for a wider audience.
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-01-09 22:27 ` Stefan Beller
@ 2019-01-09 22:49   ` Konstantin Ryabitsev
  2019-05-02  8:50   ` Eric Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Konstantin Ryabitsev @ 2019-01-09 22:49 UTC
  To: Stefan Beller; +Cc: Eric Wong, git, meta

On Wed, Jan 09, 2019 at 02:27:25PM -0800, Stefan Beller wrote:
> > > I would've thought somebody had done this by now, but I guess
> > > it's dependent on a bunch of things (TLS layer nowadays, maybe
> > > HTTP/2), so git-daemon support alone wouldn't cut it...
> >
> > Polling is not all bad, especially for large repository collections.
>
> I disagree with that statement.
>
> IIRC, more than half the bandwidth of Google's git servers is used
> for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
> change, by build bots which are really eager to try again after a minute).

Oh, that's not the kind of polling I meant -- we monitor a single
manifest file containing the state of all repositories. It's a static
file served directly by any httpd daemon, and the only traffic is
usually the "not modified" HTTP header.

-K
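
A minimal sketch of the manifest-style polling described above, in
Python. It assumes the manifest has already been downloaded and parsed
into a dict keyed by repository path, and that each entry carries a
"fingerprint" value that changes whenever that repository's refs
change. Both the layout and the key name are illustrative assumptions
here, not a description of grokmirror's actual file format.

```python
def repos_needing_fetch(old_manifest, new_manifest):
    """Return the repository paths whose state differs between manifests.

    Both arguments are dicts of {repo_path: {"fingerprint": str}}.
    The "fingerprint" key is a hypothetical per-repo state summary.
    """
    changed = []
    for path, info in new_manifest.items():
        previous = old_manifest.get(path)
        # New repositories and repositories whose summary changed
        # are the only ones that need an actual "git fetch".
        if previous is None or previous.get("fingerprint") != info.get("fingerprint"):
            changed.append(path)
    return sorted(changed)
```

With this shape, one cheap request for the manifest replaces one
ls-remote per repository, and only the returned paths are fetched.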
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-01-09 22:27 ` Stefan Beller
  2019-01-09 22:49   ` Konstantin Ryabitsev
@ 2019-05-02  8:50   ` Eric Wong
  2019-05-02  9:21     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Wong @ 2019-05-02  8:50 UTC
  To: Stefan Beller; +Cc: git, meta

Stefan Beller <sbeller@google.com> wrote:
> IIRC, more than half the bandwidth of Google's git servers is used
> for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
> change, by build bots which are really eager to try again after a minute).

Thinking back on that statement, I think polling can be
optimized in git, at least.

IIRC, your repos have lots of refs, right?
(which is why it's a bandwidth problem)

Since info/refs is a static file (hopefully updated by a
post-update hook), the smart client can make an HTTP request
to check If-Modified-Since: to avoid the big response.

The client would need to cache the mtime of the last requested
refs file somewhere.

IOW, do refs negotiation the "dumb" way, since it's no better
than the smart way, really.  Keep doing object transfers the
smart way.

During the initial clone, smart servers could probably
have a header informing clients that their info/refs
is up-to-date and clients can do dumb refs negotiation.
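
A rough sketch of the conditional request described above, using only
the Python standard library. It assumes the server exposes a static
info/refs file and sends a Last-Modified header for it; the function
names and the choice to cache that header value verbatim are
illustrative, not git's implementation.

```python
import urllib.error
import urllib.request


def conditional_refs_request(refs_url, last_modified=None):
    """Build a GET for info/refs, conditional if we have a cached timestamp."""
    request = urllib.request.Request(refs_url)
    if last_modified:
        # Echo back the Last-Modified value saved from the previous fetch.
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_refs_if_changed(refs_url, last_modified=None, timeout=30):
    """Return (body, last_modified); body is None when the server says 304."""
    request = conditional_refs_request(refs_url, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: no ref advertisement was transferred at all.
            return None, last_modified
        raise
```

A caller would persist the returned timestamp next to the repository
and only go on to the smart object transfer when body is not None.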
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-05-02  8:50 ` Eric Wong
@ 2019-05-02  9:21   ` Ævar Arnfjörð Bjarmason
  2019-05-02  9:42     ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-02  9:21 UTC
  To: Eric Wong; +Cc: Stefan Beller, git, meta

On Thu, May 02 2019, Eric Wong wrote:

> Stefan Beller <sbeller@google.com> wrote:
>> IIRC, more than half the bandwidth of Google's git servers is used
>> for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
>> change, by build bots which are really eager to try again after a minute).
>
> Thinking back on that statement, I think polling can be
> optimized in git, at least.
>
> IIRC, your repos have lots of refs, right?
> (which is why it's a bandwidth problem)
>
> Since info/refs is a static file (hopefully updated by a
> post-update hook), the smart client can make an HTTP request
> to check If-Modified-Since: to avoid the big response.
>
> The client would need to cache the mtime of the last requested
> refs file somewhere.
>
> IOW, do refs negotiation the "dumb" way, since it's no better
> than the smart way, really.  Keep doing object transfers the
> smart way.
>
> During the initial clone, smart servers could probably
> have a header informing clients that their info/refs
> is up-to-date and clients can do dumb refs negotiation.

Doing this with If-Modified-Since sounds like an easier drop-in
replacement (just needs a client change), but I wonder if ETag isn't a
better fit for this.

I.e. we'd document some convention where the ETag is a hash of the refs
the client expects to be advertised in some format, it then sends that
to the server.

That allows the same thing without anyone keeping more state than they
keep now in their local ref store.

On the fancier side I think bloom filters are something that's been
discussed (and I believe someone (Twitter?) had such an internal patch),
i.e. the client sends a bloom filter of refs they have, and the server
advertises things they don't know about yet (and due to how bloom
filters work, some things they *do* know about already but tripped up
the bloom filter...).
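
One possible shape for the ETag convention suggested above, sketched in
Python. The thread does not fix a format, so hashing the sorted
"<oid> <refname>" lines with SHA-256 is purely a hypothetical choice,
as are the function names. The sketch also assumes the client tracks
every ref the server would advertise; a narrower fetch refspec would
make the two hashes differ.

```python
import hashlib


def refs_etag(refs):
    """Hash a {refname: object_id} mapping into a quoted ETag value."""
    digest = hashlib.sha256()
    # Sort so that both sides hash the refs in the same order.
    for name in sorted(refs):
        digest.update(f"{refs[name]} {name}\n".encode())
    return '"' + digest.hexdigest() + '"'


def advertise_refs(server_refs, if_none_match=None):
    """Server side: return (status, etag, refs or None)."""
    etag = refs_etag(server_refs)
    if if_none_match == etag:
        # The client already holds exactly these refs: skip the advertisement.
        return 304, etag, None
    return 200, etag, server_refs
```

The client computes refs_etag() from its own ref store and sends the
result as If-None-Match, so no extra state has to be cached on either
side.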
* Re: "IMAP IDLE"-like long-polling "git fetch"
  2019-05-02  9:21 ` Ævar Arnfjörð Bjarmason
@ 2019-05-02  9:42   ` Eric Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2019-05-02  9:42 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Stefan Beller, git, meta

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Thu, May 02 2019, Eric Wong wrote:
>
> > Stefan Beller <sbeller@google.com> wrote:
> >> IIRC, more than half the bandwidth of Google's git servers is used
> >> for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
> >> change, by build bots which are really eager to try again after a minute).
> >
> > Thinking back on that statement, I think polling can be
> > optimized in git, at least.
> >
> > IIRC, your repos have lots of refs, right?
> > (which is why it's a bandwidth problem)
> >
> > Since info/refs is a static file (hopefully updated by a
> > post-update hook), the smart client can make an HTTP request
> > to check If-Modified-Since: to avoid the big response.
> >
> > The client would need to cache the mtime of the last requested
> > refs file somewhere.
> >
> > IOW, do refs negotiation the "dumb" way, since it's no better
> > than the smart way, really.  Keep doing object transfers the
> > smart way.
> >
> > During the initial clone, smart servers could probably
> > have a header informing clients that their info/refs
> > is up-to-date and clients can do dumb refs negotiation.
>
> Doing this with If-Modified-Since sounds like an easier drop-in
> replacement (just needs a client change), but I wonder if ETag isn't a
> better fit for this.

ETags overall could work.

> I.e. we'd document some convention where the ETag is a hash of the refs
> the client expects to be advertised in some format, it then sends that
> to the server.

But I was hoping to avoid the overhead of spawning
git-http-backend entirely.  And there's no consistent way to
configure ETags on different static servers.

> That allows the same thing without anyone keeping more state than they
> keep now in their local ref store.

I think caching the remote info/refs is useful anyways in case
the user changes their fetch refspec, and it could speed up
invocations of "git ls-remote".

> On the fancier side I think bloom filters are something that's been
> discussed (and I believe someone (Twitter?) had such an internal patch),
> i.e. the client sends a bloom filter of refs they have, and the server
> advertises things they don't know about yet (and due to how bloom
> filters work, some things they *do* know about already but tripped up
> the bloom filter...).

I'm not smart enough to understand such fancy things :)