From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: ZFS on Guix, again
Date: Tue, 23 Feb 2021 01:11:45 +0000
Message-ID: <puHf0apH_Cqq9BKxSFqmnibUbCXHXAB_eWJqbshFeKx3DYhX-ZPzflqjISPyr9wwr-ptd8YLTIKlEuDlSrI93LLYGLTkjiNWoEht-1VE35o=@protonmail.com>
In-Reply-To: <87mtvweazb.fsf@gnu.org>

Hi Ludo'

> Hi,
>
> Sorry for the delay; this isn’t as simple as it looks!
>
> I agree with 宋文武 regarding ‘file-system-service-type’.
>
> raid5atemyhomework raid5atemyhomework@protonmail.com wrote:
>
> > However, for the case where the user expects the "typical" ZFS style of managing file systems, we need to mount all the ZFS file systems and ensure that they are all already mounted by the time the `file-systems` Shepherd service is started. This means we need to be able to extend the `requirement` of the `file-systems` Shepherd service. And we need to do that without putting any extra `/etc/fstab` entries since for "typical" ZFS style of managing file systems, they are required to not be put in `/etc/fstab`.
>
> Looks like this fstab issue is the main reason why you felt the need to
> define an extra service type. Why is it important that ZFS not be
> listed in /etc/fstab?


Because on every other operating system that supports ZFS, ZFS filesystems aren't listed in `/etc/fstab`:

* https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gaztn/index.html

What ZFS users expect is that you just do something as simple as this:

    # zpool create mypool \
        raidz2 /dev/disk/by-id/ata-Generic_M0D3L_53R14LN0 \
               /dev/disk/by-id/ata-Generic_M0D3L_53R14LN1 \
               /dev/disk/by-id/ata-Generic_M0D3L_53R14LN2 \
               /dev/disk/by-id/ata-Generic_M0D3L_53R14LN3 \
        log mirror /dev/disk/by-id/ata-Generic_55DM0D3L_53R14LN0 \
                   /dev/disk/by-id/ata-Generic_55DM0D3L_53R14LN1 \
                   /dev/disk/by-id/ata-Generic_55DM0D3L_53R14LN2

And what happens is:

* The pool `mypool` is created containing a RAIDZ-2 of the 4 HDDs listed, with a separate log device consisting of a mirror of 3 SSDs.
* A filesystem `mypool` is created on the pool `mypool`.
* The `mypool` filesystem is mounted on `/mypool`.
* On all subsequent bootups, the `mypool` filesystem is mounted on `/mypool`.
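The last point works because the mountpoint is stored in the pool itself, so something at boot only has to import the pool and ask ZFS to mount what it knows about. As a rough sketch of that step (the function name `mount_zfs_datasets` is made up for illustration, and this is not the actual Guix service code):

```shell
# Hypothetical helper: import every pool that can be found, without
# mounting anything yet (-N), then let ZFS mount its own datasets.
# `zfs mount -a` leaves out datasets with mountpoint=legacy or
# mountpoint=none; those are not ZFS's job to mount.
mount_zfs_datasets() {
    zpool import -a -N &&
        zfs mount -a
}

# Example (as root, with the ZFS kernel module loaded):
#   mount_zfs_datasets
```

Whatever runs this has to finish before anything that expects `/mypool` to be populated gets started.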

In ZFS you are expected to have dozens of filesystems.  If you have a new application, the general expectation is that you create a new filesystem for it.  In general you might have one pool, or maybe two or three, but you host most of your data in multiple filesystems on that same pool.

So for example you might want to create a filesystem for videos, which are sequentially accessed and tend to be fairly large, so setting `recordsize=1M` makes sense (good for sequential access, not so much for random, and good for very large files measurable in dozens of megabytes).

    # zfs create -o recordsize=1M -o mountpoint=/home/raid5atemyhomework/Videos mypool/videos

The above command does the following:

* The filesystem `videos` is created on the pool `mypool`.
* The `mypool/videos` filesystem is mounted on `/home/raid5atemyhomework/Videos`.
* On all subsequent bootups, the filesystem is mounted on `/home/raid5atemyhomework/Videos`.

Now I might also want to run, say, a PostgreSQL service.

* PostgreSQL allocates in page sizes of 8k, so `recordsize=8k` is best.
* PostgreSQL uses a journal, which has a different access pattern from the rest of the data.  Journals are written sequentially and read sequentially, while the database itself is accessed randomly.
  * The data should have `logbias=throughput` to optimize and reduce use of the ZIL SLOG, to avoid "log on a log" slowdown effects.
  * The journal itself should continue to use the default `logbias=latency`.

So I would do:

    # zfs create -o recordsize=8k -o logbias=throughput -o mountpoint=/postgresql mypool/postgresql
    # zfs create -o logbias=latency -o mountpoint=/postgresql/pg_wal mypool/postgresql/pg_wal

That means creating two filesystems for a single application, one for the PostgreSQL data, the other for the PostgreSQL journal.

What the above examples show is:

* The habit for a ZFS user is to create many filesystems.  On my own homelab I have two filesystems (one for documents and code, one for videos and pictures) for data I manage myself, and I have two other filesystems for two different applications I am running as well.
* Each filesystem has different tuning properties.

On a server you might have a dozen or so ZFS filesystems for various applications you need to run.  There are also many other tuning parameters to tweak.  If done by `/etc/fstab` it would lead to a fairly large file.

The base logic here is that `/etc/fstab` has to be stored on disk anyway, and ZFS can just store the same information on the disks it is managing directly.  Then ZFS supports nice tabulated output of properties via `zfs list`:

    # zfs list -o name,recordsize,logbias,atime,relatime
    NAME                RECSIZE  LOGBIAS     ATIME  RELATIME
    hddpool                128K  latency     off    on
    hddpool/bitcoin        128K  latency     off    on
    hddpool/common         128K  latency     off    on
    hddpool/lightning       64K  latency     off    on
    hddpool/media            1M  latency     off    on

And you can change parameters easily with `zfs set`.  There are many dozens of possible properties as well.
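For instance, retuning a single dataset and checking the result could look roughly like this. The wrapper name `retune` is only for illustration; the underlying `zfs set` and `zfs get` invocations are the standard ones.

```shell
# Hypothetical wrapper: set one property on a dataset, then print the
# value ZFS now reports for it.
# usage: retune DATASET PROPERTY=VALUE
retune() {
    dataset=$1
    assignment=$2
    property=${assignment%%=*}    # text before the first "="
    zfs set "$assignment" "$dataset" &&
        zfs get -H -o value "$property" "$dataset"
}

# Example (as root, on an existing dataset):
#   retune hddpool/media recordsize=1M
```

Nothing outside the pool has to be edited; the new value is stored with the dataset.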

Thus, the general expectation among ZFS users is ***not*** to use any kind of `/etc/fstab` at all, because such an `/etc/fstab` would be ludicrously large with a dozen filesystems and several properties.  And the declarative `file-system` Guix syntax is really just `/etc/fstab` in another format.  So the expectation for a ZFS user would be to keep using classic `zpool` and `zfs` commands to manage the filesystems and parameters.

The main purpose of the `operating-system` declaration is to allow the system to be brought back again, but the configuration file has to exist on *some* permanent storage anyway so the information might as well be managed by ZFS directly on the disks it is managing.

ZFS also allows snapshotting of the configuration of the pool, so there isn't really an advantage to keeping the configuration of the pool in the `operating-system` as well.  You can roll back changes to the pool just as well as you can roll back `operating-system`.

Since this is the expected behavior of ZFS, we should support it as much as possible.

If the user really wants to manage ZFS via `file-system` declarations, they can set `mountpoint=legacy` and then put those filesystems in `file-system` declarations that then become `/etc/fstab` entries.  But if the user doesn't want to manage them via `file-system` declarations, we should also support that use-case (because that is how ZFS is meant to be used on other operating systems).  So we need to have the `file-systems` Shepherd service also wait for non-`/etc/fstab` filesystems like ZFS, not just those listed in `/etc/fstab`.
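To make the two cases concrete, here is a small sketch that sorts datasets by who is responsible for mounting them, based only on the `mountpoint` property. The function name `classify_datasets` and the wording of its output are my own, and it ignores other properties such as `canmount`.

```shell
# Hypothetical helper: read "name<TAB>mountpoint" lines on stdin (the
# format printed by `zfs list -H -o name,mountpoint`) and report who is
# expected to mount each dataset.
classify_datasets() {
    while IFS="$(printf '\t')" read -r name mountpoint; do
        case $mountpoint in
            legacy) echo "$name: legacy, needs a file-system declaration" ;;
            none|-) echo "$name: not mounted anywhere" ;;
            *)      echo "$name: mounted by ZFS on $mountpoint" ;;
        esac
    done
}

# Example:
#   zfs list -H -o name,mountpoint | classify_datasets
```

Only the `legacy` datasets would ever show up in `/etc/fstab`; the rest are the ones the Shepherd service currently knows nothing about.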


Thanks
raid5atemyhomework

