* Make guix-publish's URL identical to cache file name
@ 2020-11-04  8:46 Peng Mei Yu
From: Peng Mei Yu @ 2020-11-04  8:46 UTC
  To: guix-devel

Hi,

This proposal aims to solve an old problem: making it easier to set up
a mirror of the official substitute server, and so preventing future
complaints from residents of China about network speed.

Due to the national firewall (i.e. the GFW) deployed on every backbone
network within China, and its aggressive rules on traffic passing
through it, Internet connections from inside China to the outside
world are extremely unreliable and slow.  This makes it a pain for
newcomers to try Guix, since guix-daemon sends thousands of HTTP
requests to the substitute server and downloads thousands of megabytes
of packages from it.  It usually takes a whole day to install Guix
System on a new computer.  I am serious.  I have been a Guix user for
several years and the problem no longer bothers me much.  But since the
1.0 release, Guix has been mentioned frequently in tech news.  When
tech-savvy people hear about Guix, try to install it on their computer,
and still cannot complete the installation after half a day, they get
pissed off by the slow network and complain online.  I have mentioned
Guix several times on some websites, which makes me the first Chinese
Guix user they find on Google and ask for help.

I am sure the maintainers here know this kind of complaint.  Does it
seem strange that no Chinese user has made new complaints in 2020,
while the Guix project is progressing quickly and becoming more
popular?  That is because I decided to set up a mirror server for
Chinese Guix users.  I did so after receiving several complaints online
and realizing that lobbying the maintainers of academic FLOSS mirrors
to support Guix would stall for at least several years, due to these
maintainers' laziness and cowardliness and some ridiculously strict
governmental regulations.  These people only want to mirror a project
if doing so is as simple as pulling static files with a cron job
(usually with rsync) and serving them through a simple HTTP server.
An HTTP reverse proxy is not an option.  So I started the
mirror.guix.org.cn project.  It is an HTTP caching mirror of
ci.guix.gnu.org, plus a Git mirror of
https://git.savannah.gnu.org/cgit/guix.git.  Yes, the connection to
Savannah is also extremely slow, which makes `guix pull` unusable.

This mirror server started as an experiment and it has been working
well.  When a new user tells me that `guix pull` or `guix install` is
slow, I simply tell them to use mirror.guix.org.cn.  The number of
active Chinese Guix users I know of has increased from two to about ten
after someone's announcement in a news group.  It was basically:
"Look.  There is a thing called Guix.  Someone has set up a mirror
server for it in China."  The traffic on the server is increasing.
Network connections are stable.  Everything has been fine this year.
However, one thing worries me.  The bandwidth of mirror.guix.org.cn is
only 5Mbps (still far better than the 30KB/s and constant connection
resets I get from ci.guix.gnu.org).  This is the highest bandwidth I
can afford, because Internet bandwidth in China is very expensive.
Buying more bandwidth is not financially possible for me.  This is not
a problem in the short term, but it will definitely be one in the long
term.  Persuading academic FLOSS mirror maintainers to support Guix is
still the best solution for Chinese users.  Academic organizations
usually have 100Mbps of bandwidth and tens of terabytes of disk.
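
To put the figures above side by side, here is a rough Python
calculation.  It is only a sketch: the 2000 MB download size is an
assumed stand-in for "thousands of megabytes", decimal units are
assumed, and the 5Mbps figure is the mirror's total bandwidth, so a
single client only gets that rate when nobody else is downloading.

```python
def hours_to_download(size_mb, rate_kb_per_s):
    """Hours needed to transfer size_mb megabytes at rate_kb_per_s KB/s."""
    seconds = size_mb * 1000 / rate_kb_per_s
    return seconds / 3600


# 5 Mbps expressed in KB/s (decimal units: 1 Mbps = 1,000,000 bits/s).
MIRROR_KB_PER_S = 5_000_000 / 8 / 1000

# Assumed download size for a fresh installation (hypothetical figure).
INSTALL_SIZE_MB = 2000

print(f"mirror rate: {MIRROR_KB_PER_S:.0f} KB/s")
print(f"at 30 KB/s:  {hours_to_download(INSTALL_SIZE_MB, 30):.1f} hours")
print(f"at mirror:   {hours_to_download(INSTALL_SIZE_MB, MIRROR_KB_PER_S):.2f} hours")
```

Under these assumptions the direct connection needs most of a day,
which matches the experience described earlier, while the mirror
needs under an hour when it is not shared.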

Now, finally, we come to the main point.  I looked into guix publish's
cache directory and I think that nar and narinfo files could be served
directly by a static HTTP server if we made each file's URL identical
to its on-disk file name.  The current directory structure looks like
this:

--8<---------------cut here---------------start------------->8---
/var/cache/guix/publish
├── gzip
│   ├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar
│   ├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo
│   ├── fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31.nar
│   └── fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31.narinfo
├── hashes
│   ├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67
│   └── fa6wj5bxkj5ll1d7292a70knmyl7a0cr
├── last-expiry-cleanup
└── lzip
    ├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar
    ├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo
    ├── fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31.nar
    └── fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31.narinfo
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
> md5sum /var/cache/guix/publish/*/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo
29cdbf041b9a304bf58f2e75ec23f18f  /var/cache/guix/publish/gzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo
29cdbf041b9a304bf58f2e75ec23f18f  /var/cache/guix/publish/lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo
--8<---------------cut here---------------end--------------->8---

When a client tries to download
/gnu/store/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16, it sends a
request to http://example.com/87kif0bpf0anwbsaw0jvg8fyciw4sz67.narinfo
and gets the content of
/var/cache/guix/publish/gzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo:

--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
URL: nar/gzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
Compression: gzip
FileSize: 2284657
URL: nar/lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
Compression: lzip
FileSize: 1256260
NarHash: sha256:1ap2s3xz3bbp5n78v826gxagy7pic1wpgzz3ka72jdyk6qpmw3qr
NarSize: 6597040
References: ...
System: x86_64-linux
Deriver: cccyyn4xq59aimybmhlrfl2bi8kslhlm-bash-5.0.16.drv
Signature: ...
--8<---------------cut here---------------end--------------->8---

The client then sends a request to the URL given in one of the URL
fields, for example
http://example.com/nar/lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.
This URL returns the file
/var/cache/guix/publish/lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar.
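
The two-step lookup described above can be sketched in Python.  This
is an illustration, not the code guix-daemon runs: the function names
are made up, the sample narinfo is abridged (the elided fields are
left out), and picking the smallest download is an assumption made
for the example.

```python
from urllib.parse import urljoin

# Abridged copy of the narinfo shown above.
NARINFO = """\
StorePath: /gnu/store/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
URL: nar/gzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
Compression: gzip
FileSize: 2284657
URL: nar/lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16
Compression: lzip
FileSize: 1256260
NarSize: 6597040
System: x86_64-linux
"""


def narinfo_url(base, store_path):
    """Step 1: the narinfo URL uses only the hash part of the store path."""
    name = store_path.rsplit("/", 1)[-1]
    store_hash = name.split("-", 1)[0]
    return urljoin(base, store_hash + ".narinfo")


def parse_narinfo(text):
    """Return (fields, downloads); each URL line starts a new download."""
    fields, downloads = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        if key == "URL":
            downloads.append({"URL": value})
        elif key in ("Compression", "FileSize") and downloads:
            downloads[-1][key] = value
        else:
            fields[key] = value
    return fields, downloads


def smallest_download(base, downloads):
    """Step 2: resolve the chosen URL field against the server's base URL."""
    best = min(downloads, key=lambda d: int(d["FileSize"]))
    return urljoin(base, best["URL"])


fields, downloads = parse_narinfo(NARINFO)
print(narinfo_url("http://example.com/", fields["StorePath"]))
print(smallest_download("http://example.com/", downloads))
```

Note that neither URL matches a file name under
/var/cache/guix/publish, which is why a plain static file server
cannot answer these requests today.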

I propose that we make the URL field in the narinfo the same as the nar
file name on disk.  We can change the directory structure to:

--8<---------------cut here---------------start------------->8---
/var/cache/guix/publish/nar
├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67.narinfo
├── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar.gz
└── 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar.lz
--8<---------------cut here---------------end--------------->8---

And change the URL field in the narinfo to:

--8<---------------cut here---------------start------------->8---
URL: 87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar.lz
--8<---------------cut here---------------end--------------->8---
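
As a rough sketch of the renaming this implies, the Python function
below maps a file from the current cache layout to its name in the
proposed flat directory.  The function name and the suffix table are
assumptions made for illustration; only the gzip and lzip cases
appear in the proposal.

```python
from pathlib import PurePosixPath

# Suffix per compression directory, following the proposed file names.
SUFFIX = {"gzip": ".gz", "lzip": ".lz"}


def proposed_name(cache_relative_path):
    """Map 'gzip/<hash>-<name>.nar' or '.narinfo' to the proposed flat name."""
    path = PurePosixPath(cache_relative_path)
    compression, filename = path.parts[0], path.name
    if filename.endswith(".narinfo"):
        # The narinfo is requested by hash only, so it is stored by hash only.
        return filename.split("-", 1)[0] + ".narinfo"
    if filename.endswith(".nar"):
        return filename + SUFFIX[compression]
    raise ValueError(f"not a nar or narinfo file: {cache_relative_path}")


for old in (
    "gzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar",
    "lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.nar",
    "lzip/87kif0bpf0anwbsaw0jvg8fyciw4sz67-bash-5.0.16.narinfo",
):
    print(old, "->", proposed_name(old))
```

The gzip and lzip copies of a narinfo are byte-identical today (see
the md5sum output earlier), so both collapse into the single
hash-named file.  With these names, the path in a request and the
path on disk are the same string.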

Then a mirror site can simply pull the directory
/var/cache/guix/publish/nar from the Berlin server and serve it
through a static HTTP server.  There will be cache misses, but
guix-daemon will safely fall back to the next server in
substitute-urls.
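
The fallback on a cache miss can be modelled as trying each server in
order.  The Python sketch below is a simplified model, not
guix-daemon's actual logic: the mirror host name is hypothetical, and
the `fetch` argument stands in for an HTTP GET that returns None on a
miss.

```python
def fetch_with_fallback(path, substitute_urls, fetch):
    """Try each server in order; return (server, body) for the first hit."""
    for base in substitute_urls:
        body = fetch(base.rstrip("/") + "/" + path)
        if body is not None:  # None models a cache miss (e.g. HTTP 404)
            return base, body
    return None, None


# Hypothetical setup: a static mirror that lacks the file, then the origin.
servers = ["https://mirror.example.org", "https://ci.guix.gnu.org"]
origin_only = {
    "https://ci.guix.gnu.org/87kif0bpf0anwbsaw0jvg8fyciw4sz67.narinfo":
        b"StorePath: ...",
}

server, body = fetch_with_fallback(
    "87kif0bpf0anwbsaw0jvg8fyciw4sz67.narinfo", servers, origin_only.get)
print(server)
```

Here the mirror misses and the request is answered by the second
server, so a mirror that lags behind the origin costs an extra round
trip but does not break substitution.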

What's your opinion?

I have to decide on next year's server specs and budget for
mirror.guix.org.cn before the Chinese shopping festival ends on November
11.  If the proposal above is doable, I will keep mirror.guix.org.cn
running for half a year and help academic mirror sites add support for
Guix in the meantime.  Otherwise I would rather buy a prepaid three-year
VPS at a 90% discount during the shopping festival.  The discount is
huge and I don't want to miss it.


--
Peng Mei Yu


