====== ZFS on Linux ======

[[wpde>ZFS]] is a transactional file system developed by Sun Microsystems that includes numerous extensions for use in servers and data centers. These include a comparatively large maximum file system size, simple administration even of complex configurations, integrated RAID functionality, volume management, and checksum-based protection against data transmission errors. The name ZFS originally stood for Zettabyte File System.

Direct support inside the Linux kernel is problematic for licensing reasons (GPLv2 and CDDL are incompatible), so there is currently no implementation integrated directly into the kernel.
In addition, it is problematic (from the ZFS point of view) that more and more kernel functions are reserved exclusively for GPL software; the article [[https://www.heise.de/select/ct/2019/05/1551087960739991|Linux 5.0: Lizenzkennzeichnung trifft Nvidia und ZFS on Linux]] describes the issue.

However, there are:
  * the ZFS on FUSE project, which makes ZFS usable on Linux. The userspace implementation has several drawbacks, among them reduced data throughput.
  * "ZFS on Linux" (ZoL), currently the preferred approach. According to its developers it has been ready for production use since version 0.6.1, released in April 2013. The required kernel modules are maintained outside the kernel source tree and are therefore missing from many installers.

Source: [[wpde>ZFS|Wikipedia]].


===== Installation =====

ZFS on Linux versions shipped by the distributions:

  * **Ubuntu** 16.04 + 18.04: ''0.7.5-1ubuntu16.4''
  * **Debian**:
    * jessie (backports): ''[[https://packages.debian.org/jessie-backports/zfsutils-linux|0.6.5.9]]''
    * stretch (in contrib): ''[[https://packages.debian.org/source/stretch/zfs-linux|0.6.5.9]]''
    * buster:
      * contrib: ''[[https://packages.debian.org/source/buster/zfs-linux|0.7.12]]''
      * backports: ''0.8.x''
    * bullseye (in contrib): ''[[https://packages.debian.org/bullseye/zfs-dkms|2.0.x]]''
    * bookworm (in contrib): ''[[https://packages.debian.org/bookworm/zfs-dkms|2.1.x]]''
  * RHEL/CentOS: Version? https://github.com/zfsonlinux/zfs/wiki/RHEL-and-CentOS

==== Debian 11 (bullseye) + Debian 12 (bookworm) ====

  - enable contrib
  - <code>sudo apt install linux-headers-amd64 dkms --no-install-recommends # cloud-kernel: linux-headers-cloud-amd64
sudo apt install zfs-dkms --no-install-recommends # need contrib enabled
# not needed anymore:
# sudo modprobe zfs
sudo apt install zfsutils-linux
</code>
  - [[linux:zfs#monitoring]]


=== Package sources for bullseye with contrib ===

<file>
deb http://deb.debian.org/debian bullseye main contrib
deb-src http://deb.debian.org/debian bullseye main contrib
deb http://security.debian.org/debian-security bullseye-security main contrib
deb-src http://security.debian.org/debian-security bullseye-security main contrib
deb http://deb.debian.org/debian bullseye-updates main contrib
deb-src http://deb.debian.org/debian bullseye-updates main contrib
deb http://deb.debian.org/debian bullseye-backports main contrib
deb-src http://deb.debian.org/debian bullseye-backports main contrib
</file>

==== Debian 10 (buster) ====

Unfortunately [[https://github.com/zfsonlinux/zfs/wiki/Debian|the instructions]] are rather thin; installing zfsutils-linux fails if [[https://github.com/zfsonlinux/zfs/issues/6083|the kernel module has not been loaded beforehand (not the case on Debian 11)]].

<file># vi /etc/apt/sources.list.d/buster-backports.list
deb http://deb.debian.org/debian buster-backports main contrib
deb-src http://deb.debian.org/debian buster-backports main contrib
</file>

<file># vi /etc/apt/preferences.d/90_zfs
Package: libnvpair1linux libuutil1linux libzfs2linux libzpool2linux spl-dkms zfs-dkms zfs-test zfsutils-linux zfsutils-linux-dev zfs-zed
Pin: release n=buster-backports
Pin-Priority: 990
</file>

<code>sudo apt install linux-headers-amd64 dkms # cloud-kernel: linux-headers-cloud-amd64
sudo apt install zfs-dkms --no-install-recommends
sudo modprobe zfs
sudo apt install zfsutils-linux
</code>

==== Loading the module automatically at boot ====

=== Debian 10, 11, 12 ===

At least on Debian 11 the module is loaded automatically. Where it is not, register it explicitly:

<code bash>echo "zfs" >> /etc/modules-load.d/modules.conf</code>

=== Other systems ===

<code bash>echo "zfs" >> /etc/modules</code>

==== Ubuntu ====

From 15.10 on: <code bash>apt install zfsutils-linux</code>
For older versions see [[https://github.com/zfsonlinux/zfs/wiki/Ubuntu|here]].

==== CentOS 7 ====

Install zfs-release package:
<code bash>
yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum-config-manager --enable "zfs"
yum install zfs
modprobe zfs
</code>

===== Terms =====

**ZFS Virtual Devices** (**ZFS VDEVs**) are meta-devices that can consist of the device types listed below. VDEVs are created dynamically; devices can be added but not removed.

^ Name ^ Meaning ^
| File | files on other file systems that serve as backing storage |
| Physical Drive | physical device (HDD, SSD, PCIe NVMe, etc.) |
| Mirror | a standard RAID1 mirror |
| RAIDZ | software raidz1, raidz2, raidz3 'distributed' parity based RAID |
| Hot Spare | hot spare for ZFS software RAID |
| Cache | a device for level 2 adaptive read cache (ZFS L2ARC) |
| Log | ZFS Intent Log (ZFS ZIL) |

**ZFS Pools** consist of one or more VDEVs; ZFS file systems can be created on a pool.

**ZFS Dataset Types** - there are three types:
  * filesystem: file systems and clones
  * snapshot: snapshots ((point-in-time copies of a file system))
  * volume: a block device that appears under ''/dev/zvol/$Poolname/$Volumename''. A volume has a fixed size; depending on the implementation it can also be exported as an iSCSI target [[https://github.com/zfsonlinux/zfs/pull/1099|directly by zfs]] or [[https://www.linuxquestions.org/questions/linux-general-1/problem-with-iscsi-4175530610/|indirectly]]. More information can be found in the documentation on [[https://docs.oracle.com/cd/E18752_01/html/819-5461/gaypf.html|ZFS Volumes]].

The following command lists the existing types: <code bash>zfs list -o name,type,mountpoint</code>
<file>
NAME       TYPE        MOUNTPOINT
pool1      filesystem  /pool1
pool1/fs1  volume      -
pool1/fs2  filesystem  /pool1/fs2
</file>


===== Limitations / problems =====

  - A nearly full pool (>90%) makes the storage very slow; performance may already drop from 80% on
  - Import errors after reboots (especially with several pools): always use the /dev/disk/by-id/* aliases! <box red round right | **/dev/disk/by-id**>Preferably those containing the model and serial number (example: nvme-INTEL_SSDPEKNW512G8_BTNH84242VDF511A-part1) or the SAS WWN (example: wwn-0x5000cca04ec097f1-part1). That way devices can be identified with smartctl -i /dev/DEVICE.</box>
  - ZFS as root file system requires some adjustments: https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS
  - Race conditions with other daemons while mounting because of missing systemd.mount integration: https://github.com/zfsonlinux/zfs/issues/4898  https://github.com/zfsonlinux/zfs/pull/7329
  - On Solaris, devices (top-level VDEVs) can only be removed as of Solaris 11.4: https://docs.oracle.com/cd/E37838_01/html/E61017/remove-devices.html
  - Swap devices on ZFS volumes can cause problems
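
The fill-level thresholds from the first item can be watched with a small helper. This is only a minimal sketch: it assumes the tab-separated output of ''zpool list -H -o name,capacity'' on stdin, and the function name ''check_pool_capacity'' and the default threshold of 80 are illustrative choices, not part of ZFS.

```shell
#!/bin/sh
# Reads "name<TAB>capacity%" lines from stdin (assumed input format:
# zpool list -H -o name,capacity) and warns about every pool whose
# fill level is at or above the threshold (default: 80).
check_pool_capacity() {
    threshold="${1:-80}"
    while IFS="$(printf '\t')" read -r name cap; do
        pct=${cap%\%}                      # strip the trailing percent sign
        case "$pct" in
            ''|*[!0-9]*) continue ;;       # skip empty or non-numeric values such as "-"
        esac
        if [ "$pct" -ge "$threshold" ]; then
            echo "WARNING: pool $name is ${pct}% full"
        fi
    done
}
```

Possible usage, for example from a cron job: ''zpool list -H -o name,capacity | check_pool_capacity 80''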


==== Limiting memory usage ====

By default ZFS uses up to 50% of the RAM. Show the current usage:<code bash>echo $(( `cat /proc/spl/kmem/slab | tail -n +3 | awk '{ print $3 }' | tr "\n" "+" | sed "s/$/0/"` ))</code>

Example: limit the ARC to 8G of RAM in ''/etc/modprobe.d/zfs.conf''
<file>
options zfs zfs_arc_max=8589934592
</file>
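The byte value above is simply the desired size in GiB multiplied by 1024^3. A minimal sketch for deriving it; the helper name ''arc_max_bytes'' is a hypothetical choice for illustration:

```shell
#!/bin/sh
# Convert a size in GiB into the byte value expected by zfs_arc_max.
# arc_max_bytes is a hypothetical helper name, not part of ZFS.
arc_max_bytes() {
    gib="$1"
    # 1 GiB = 1024 * 1024 * 1024 bytes
    echo $(( gib * 1024 * 1024 * 1024 ))
}

# Print a line that could be placed in /etc/modprobe.d/zfs.conf (8 GiB here).
echo "options zfs zfs_arc_max=$(arc_max_bytes 8)"
```

For 8 GiB this prints the same value as in the example file above.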

At runtime: ''echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max''

==== ZFS Distributed RAID (dRAID) vdev type ====

OpenZFS 2.1.x introduces distributed RAID (dRAID) as a new vdev type. It reduces rebuild times for large disk arrays.

  * https://arstechnica.com/gadgets/2021/07/a-deep-dive-into-openzfs-2-1s-new-distributed-raid-topology/
  * https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html


===== Administration =====

==== Pools ====

See also: man zpool

=== Creating a pool ===

Create a pool from VDEVs: <code bash>sudo zpool create single-DISK /dev/disk/by-id/DISK1</code>

**Redundancy** (comparable to RAID levels)
  * **Striped VDEVs** ("RAID 0") <code bash>sudo zpool create samsung-stripe /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2</code>
  * **Mirrored VDEVs** ("RAID 1") <code bash>sudo zpool create samsung-mirror mirror /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2</code>
  * **Striped Mirrored VDEVs** ("RAID 10", a stripe across two mirror VDEVs): <code bash>sudo zpool create meinStripedMirroredPool mirror /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 mirror /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4</code>or<code bash>sudo zpool create meinStripedMirroredPool mirror /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2
sudo zpool add meinStripedMirroredPool mirror /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4
</code>
  * **RAIDZ** ("RAID 5", survives the failure of //one// disk): <code bash>sudo zpool create meinRAIDZ raidz /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3</code>
  * **RAIDZ2** ("RAID 6", survives the failure of //two// disks): <code bash>sudo zpool create meinRAIDZ2 raidz2 /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4</code>
  * **RAIDZ3** (survives the failure of //three// disks): <code bash>sudo zpool create meinRAIDZ3 raidz3 /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4 /dev/disk/by-id/DISK5</code>
  * **Nested RAIDZ** ("RAID 50", "RAID 60") <code bash>sudo zpool create meinNestedRAIDZ raidz /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4
sudo zpool add meinNestedRAIDZ raidz /dev/disk/by-id/DISK5 /dev/disk/by-id/DISK6 /dev/disk/by-id/DISK7 /dev/disk/by-id/DISK8</code>


=== Listing pools ===

  zpool list


=== Destroying a pool ===

  sudo zpool destroy $Poolname

=== Showing pool information ===

  sudo zpool status

=== Expanding pools ===

Disks can be added to a pool, but not to an existing raidz VDEV (with -f the new disk would be striped alongside it, RAID 0 style):
  * Erase old partition and file system signatures: <code bash>wipefs -a /dev/DISK2</code>
  * Create a new GPT label: <code bash>parted /dev/DISK2 mklabel gpt</code>

Add the new disk:
  * <code bash>sudo zpool add $Poolname /dev/DISK2 -f</code>

Grow the underlying device (partition 1 in this example):
  * <code bash>parted /dev/xvdf resizepart 1 100%</code>
  * optionally set the property "autoexpand=on" so that no export/import of the pool or reboot is needed:<code bash>zpool set autoexpand=on $Poolname</code>
  * expand the pool to the full size of the underlying device:<code bash>zpool online -e $Poolname $disk</code>

== Rebalance ==

The expansion only affects newly written data; existing data has to be rewritten completely to be redistributed.
This could be done with the PHP script [[https://github.com/programster/zfs-balancer|ZFS Balancer]].

=== ZFS Pool Scrubbing ===

**Start** scrubbing (progress is shown by ''zpool status''): <code bash>zpool scrub $Poolname</code>
**Pause** scrubbing: <code bash>zpool scrub -p $Poolname</code> To **resume**, run <code bash>zpool scrub $Poolname</code> again.

=== Replacing a disk ===

If the scrub reported errors (see ''zpool status''):
<code bash>
# remove the failed mirror member from the pool
zpool detach mypool /dev/BADDISK
# attach the replacement (assumed to show up under the old name /dev/BADDISK)
# to the remaining healthy member /dev/GOODDISK
zpool attach mypool /dev/GOODDISK /dev/BADDISK  -f
</code>

Alternatively, replace a disk in one step (this triggers a resilver); here the second device is the new disk:
<code bash>zpool replace mypool /dev/BADDISK /dev/GOODDISK</code>

Especially if devices were added as /dev/sdX, the device names may change and the disks are then shown as FAULTED or UNAVAILABLE. A resilver cannot be avoided, and unfortunately zfs refuses to take these disks back (even with -f) because they were part of the active pool.
In that case the pool has to be exported (''zpool export mypool'') and the labels on the affected disks cleared: ''zpool labelclear /dev/sda''
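
To avoid the /dev/sdX problem in the first place, the stable alias of a device can be looked up before it is added to a pool. The following is a sketch under assumptions: ''by_id_aliases'' is a hypothetical helper, and its optional second argument (the directory to search) exists only so the function can be tried out without real disks.

```shell
#!/bin/sh
# Print every alias in a by-id directory that resolves to the given device.
# by_id_aliases is a hypothetical helper name; the directory defaults to
# /dev/disk/by-id and can be overridden for testing.
by_id_aliases() {
    dev="$(readlink -f "$1")"              # canonical path of the device node
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue         # only symlinks are aliases
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            echo "$link"
        fi
    done
}
```

Possible usage: ''by_id_aliases /dev/sda'' lists the by-id names pointing at that disk; one of them can then be passed to ''zpool create'', ''zpool attach'' or ''zpool replace''.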


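=== Resilvering speed ===

The ''vfs.zfs.*'' names below are FreeBSD sysctls; on Linux the ZFS tunables are module parameters under ''/sys/module/zfs/parameters/'', and which of them exist depends on the ZFS version.

<code bash>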
sysctl vfs.zfs.scrub_delay=0
sysctl vfs.zfs.top_maxinflight=128
sysctl vfs.zfs.resilver_min_time_ms=5000
sysctl vfs.zfs.resilver_delay=0
</code>

^ Setting ^ Effect ^
| vfs.zfs.scrub_delay | "Number of ticks to delay between each I/O during a scrub" |
| vfs.zfs.resilver_delay | "Number of milliseconds of delay inserted between each I/O during a resilver." |
| vfs.zfs.top_maxinflight | "Maximum number of outstanding I/Os per top-level vdev. Limits the depth of the command queue to prevent high latency." |
| vfs.zfs.resilver_min_time_ms | "spend more time resilvering before it flushes pending transactions." |

''iostat -x'' reveals slow disks through the "w_await" and "r_await" columns.

=== Renaming a pool ===

...this is done via export/import:

<code bash>
zpool export AlterNAME
zpool import AlterNAME NeuerName
</code>

=== Importing pools ===

Scan for pools: ''zpool import''

Import a pool (-f is needed if the pool was previously in use on another system): ''zpool import -f $NAME''

=== ZFS Intent Logs ===

[[http://nex7.blogspot.com/2013/04/zfs-intent-log.html|ZFS Intent Logs]] 

<code bash>sudo zpool add meinPool log /dev/DISK1 -f</code>


=== ZFS Cache Drives ===

ZFS cache drives provide an extra caching layer between RAM and the storage devices, mainly for random reads of predominantly static data.

<code bash>sudo zpool add meinPool cache /dev/DISK1 -f</code>


=== ZFS trim ===

  * Enable automatic trim for $pool (requires ZoL 0.8.x): <code bash>sudo zpool set autotrim=on $pool</code>
  * Manual trim: <code bash>zpool trim $pool</code>


==== ZFS file systems ====

Each pool can contain 2^64 ZFS file systems.

=== Creating ===

Note: the size is set via a **quota**; without one the file system can use the full capacity of the pool.


**Create a file system**:
<code bash>
zfs create -o acltype=posixacl -o compression=on -o dnodesize=auto -o normalization=formD -o relatime=on -o mountpoint=/media/FS1 meinPool/FS1
</code>

**Explanations**:
  * ashift=12: accounts for the physical 4k sectors of the disks (this is a pool property, set with ''zpool create -o ashift=12'')
  * acltype=posixacl: enables POSIX ACLs globally (important if the file system is used for / or for a location written to by systemd)
  * xattr=sa: improves the performance of extended attributes (Linux only)
  * normalization=formD: UTF-8 file name normalization (implies utf8only=on, which enforces valid UTF-8 file names)

=== Encryption ===

**Create an encrypted file system** (requires ZoL 0.8.x) - additional parameters: <code bash>zfs create -o atime=off -o xattr=sa -o compression=on -o encryption=aes-256-ccm -o keyformat=passphrase -o mountpoint=/media/FS1 meinPool/FS1</code>

  * With ''encryption=on'' a default suite is chosen (aes-256-ccm in ZoL 0.8.x, aes-256-gcm since OpenZFS 2.0; Oracle Solaris defaults to aes-128-ccm). Available are the CCM modes aes-128-ccm, aes-192-ccm, aes-256-ccm and the GCM modes aes-128-gcm, aes-192-gcm, aes-256-gcm. :!: GCM is normally faster than CCM, but on newer kernels (4.14.120, 4.19.38, and 5.0) GCM is considerably slower (50%?) because the SIMD instructions are not available to ZFS.
  * keylocation=prompt is the default

**Unlock the datastore** (load the key): <code bash>zfs load-key meinPool/FS1</code>
**Lock the datastore** (unload the key): <code bash>zfs unload-key meinPool/FS1</code>


**Show information**: <code bash>zfs get -p encryption,keystatus,keyformat,keylocation,encryptionroot</code>

**Change the key** (the access to the master key that encrypts the data): <code bash>zfs change-key meinPool/FS1</code>

If mounting should only happen manually: see the [[https://docs.oracle.com/cd/E19253-01/820-2313/gdrcf/index.html|canmount property]] (''canmount=noauto'' allows only explicit mounts, ''canmount=off'' prevents mounting altogether).


== keyfiles ==

  * Requirements for the key file: "Raw keys and hex keys must be 32 bytes long (regardless of the chosen encryption suite) and must be randomly generated"
  * Create the file:<code bash>dd if=/dev/urandom of=/root/zfs_key bs=32 count=1</code>
  * Specify the keylocation: <code bash>-o keyformat=raw -o keylocation=file:///root/zfs_key $poolname</code>
  * Fetch the key from a server via GET request (''keysource'' is the Oracle Solaris syntax; newer OpenZFS releases accept an https URL in ''keylocation'' instead): <code bash>-o keysource=raw,https://keys.example.com/mykey $poolname</code>
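
The key file steps can be combined with a length check, since a raw key that is not exactly 32 bytes long is rejected. A minimal sketch; ''make_raw_key'' is a hypothetical helper name and the restrictive umask is an added precaution, not something the source prescribes:

```shell
#!/bin/sh
# Create a 32-byte raw key file readable only by its owner and verify its size.
# make_raw_key is a hypothetical helper; the body runs in a subshell so the
# umask change does not leak into the calling shell.
make_raw_key() (
    keyfile="$1"
    umask 077
    dd if=/dev/urandom of="$keyfile" bs=32 count=1 2>/dev/null
    size=$(( $(wc -c < "$keyfile") ))
    # succeed only if exactly 32 bytes were written
    [ "$size" -eq 32 ]
)
```

Possible usage: ''make_raw_key /root/zfs_key && echo "key ok"'', followed by the ''keyformat=raw'' and ''keylocation=file://...'' options shown above.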


Further information:
  * [[https://docs.oracle.com/cd/E53394_01/html/E54801/gkkih.html|Encrypting ZFS File Systems]]
  * https://pve.proxmox.com/wiki/ZFS_on_Linux
  * https://techgoat.net/index.php?id=174


=== Mounting ===

  * ZFS file systems are mounted automatically when they are created.
  * The properties of parent file systems are inherited.
  * By default ZFS only mounts onto empty directories; the option ''-O'' changes that

**Automatic mount points**:
  * ''zfs list'' shows the current mount points
  * they can also be queried: <code bash>zfs get mountpoint meinPool/FS1</code>
  * **Set a mount point**: <code bash>zfs set mountpoint=/media/meinPool_FS1 meinPool/FS1</code>
  * Property mounted: <code bash>zfs get mounted meinPool/FS1</code>

On demand:
  * if the [[https://docs.oracle.com/cd/E19253-01/820-2313/gdrcf/index.html|canmount property]] is set to off, the mountpoint property is empty (and ''zfs mount'' as well as ''zfs mount -a'' do not work).
  * with canmount set to noauto, file systems can only be mounted explicitly

**Legacy mount points** <code bash>zfs set mountpoint=legacy meinPool/FS1</code>
  * ZFS file systems can be managed with the legacy tools by setting the property "mountpoint" to "legacy"
  * in that case the commands mount and umount and the file /etc/fstab (/etc/vfstab on Solaris) apply.
  * ZFS does not mount legacy file systems automatically at boot, and the ZFS commands ''zfs mount'' and ''zfs umount'' do not work for them!


^ ZFS property ^ mount option ^
| atime | atime/noatime  |
| devices | devices/nodevices |
| exec |  exec/noexec |
| nbmand | nbmand/nonbmand |
| readonly | ro/rw |
| setuid | setuid/nosetuid |
| xattr | xattr/noxattr |

The mount option nosuid is an alias for nodevices,nosetuid.

=== Deleting ===

<code bash>sudo zfs destroy meinPool/FS1</code>


=== Setting properties ===

  * Reservation (guaranteed minimum): <code bash>zfs set reservation=800G meinPool/FS1</code>
  * Quota (maximum usage): <code bash>sudo zfs set quota=10G meinPool/FS1</code>
  * **Compression**: <file>
compression=on
compression=lz4
</file>
  * The effective compression ratio is shown by the following command:<code bash>zfs get compressratio $Pool/$FS</code>The further the value is above 1.00x, the better the compression
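
A compression ratio of r means the data occupies 1/r of its logical size, so the space saved is (1 - 1/r). A small sketch that converts the reported ratio into a percentage; ''compress_savings'' is a hypothetical helper name, and the input is assumed to look like the ''1.50x'' values printed by ''zfs get compressratio'':

```shell
#!/bin/sh
# Convert a compressratio value such as "1.50x" into the percentage of
# space saved. compress_savings is a hypothetical helper name.
compress_savings() {
    ratio=${1%x}                           # strip the trailing "x"
    # LC_ALL=C keeps "." as decimal separator regardless of the system locale
    LC_ALL=C awk -v r="$ratio" 'BEGIN { printf "%.1f\n", (1 - 1 / r) * 100 }'
}
```

Possible usage: ''compress_savings "$(zfs get -H -o value compressratio $Pool/$FS)"''; a ratio of 2.00x corresponds to 50.0 percent saved.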

=== Tuning ===

Current statistics: ''arc_summary''


https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html

  * ashift=12 for 4k sectors (ashift=13 for 8k sectors on modern SSDs, ashift=9 for 512B sectors on old hard disks)
  * xattr=sa: store extended attributes in inodes instead of in small files; can matter with SELinux
  * compression=on / compression=lz4: always enable it; even if the files compress poorly, the empty regions are still compressed
  * atime=off: do not store access times (often irrelevant)
  * recordsize=64k (for large files read sequentially: 1M)

  * Read cache
    * ARC (RAM): size is configurable, or the cache can be restricted to metadata
    * L2ARC (SSD, NVMe)
  * SLOG: write cache (only useful when there are many synchronous writes) - must be built as a mirror, otherwise data loss!
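
The ashift values in the list are the base-2 logarithm of the sector size (2^12 = 4096). A minimal sketch of that calculation; ''ashift_for_sector_size'' is a hypothetical helper name, and the input is assumed to be a power of two such as the value in ''/sys/block/DEVICE/queue/physical_block_size'':

```shell
#!/bin/sh
# Compute the ashift value (log2 of the sector size in bytes).
# ashift_for_sector_size is a hypothetical helper name; for inputs that are
# not a power of two the result is rounded down.
ashift_for_sector_size() {
    size="$1"
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}
```

Possible usage: ''ashift_for_sector_size "$(cat /sys/block/sda/queue/physical_block_size)"''. Some drives report 512 bytes although they use larger sectors internally, so the result is a starting point rather than a guarantee.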

=== Deduplication ===

:!: Deduplication costs memory and affects performance, see [[https://de.wikibooks.org/wiki/ZFS_auf_Linux/_Deduplizierung|ZFS auf Linux/ Deduplizierung]] and [[https://constantin.glez.de/2011/07/27/zfs-to-dedupe-or-not-dedupe/|ZFS: To Dedupe or not to Dedupe...]].

  * Query the property "dedup": ''zfs get dedup $Pool''
  * Set it: ''zfs set dedup=on $Pool''


=== ZFS Snapshots ===

  * **Create a snapshot** (of meinPool/FS1, snapshot name "snapshot1"): <code bash>zfs snapshot -r meinPool/FS1@snapshot1</code>
  * **List snapshots**:
    * ...all of them:<code bash>zfs list -t snapshot</code>
    * include snapshots in ''zfs list'' by default: <code bash>zpool set listsnapshots=on rpool</code>
    * ...showing name and creation time, sorted by creation time: <code bash>zfs list -t snapshot -o name,creation -s creation</code>
    * ...only for FS1 (same columns and sorting):<code bash>zfs list -t snapshot -o name,creation -s creation | grep FS1</code>
  * **Roll back the file system** "meinPool/FS1" to the snapshot "snapshot1" (-r is required if newer snapshots/bookmarks exist; they are destroyed in the process): <code bash>zfs rollback -r meinPool/FS1@snapshot1</code>
  * **Delete a snapshot**: <code bash>zfs destroy meinPool/FS1@snapshot1</code>
  * **Mount snapshots** (read-only):
    * the regular mount directory contains the invisible folder ''.zfs''. Although this folder does not even show up with ls -la, it is still possible to change into it. This behaviour is useful so that tools like rsync do not copy snapshots by accident. If the ''.zfs'' folder should always be displayed, this can be enabled with the following property:<code bash>zfs set snapdir=visible meinPool/FS1</code>
    * legacy mount: <code bash>mount -t zfs meinPool/FS1@snapshot1 /media/snapshot</code>
  * **Snapshot diff**: show the differences between two snapshots: ''zfs diff meinPool/FS1@snapshot1 meinPool/FS1@snapshot2''
  * **Incremental snapshots** (only the changes between snapshots are transferred). Example: a source host with meinPool and the snapshots snap1, snap2, ... which are to be transferred to a remote machine with the SSH login root@$Zielhost and the pool meinBACKUPPool:
    * initial transfer: <code bash>zfs send meinPool/FS1@snap1 | ssh root@$Zielhost zfs recv meinBACKUPPool/FS1</code>
    * create a 1G file: <code bash>dd if=/dev/zero bs=1024000 count=1024 >> /meinPool/FS1/test2</code>
    * create a snapshot: <code bash>zfs snapshot -r meinPool/FS1@snap2</code>
    * **send the snapshot incrementally** (according to ifconfig 1G is transferred): <code bash>zfs send -i meinPool/FS1@snap1 meinPool/FS1@snap2 | ssh root@$Zielhost zfs recv meinBACKUPPool/FS1</code> see also: [[https://docs.oracle.com/cd/E19253-01/819-5461/gbinw/index.html|Sending and Receiving ZFS Data]] [[https://stephanvaningen.net/node/14|How to do incremental snapshot backups with ZFS]] 

<box 100% blue round right | **Advantages of incremental snapshots over rsync**>
  * no CPU (I/O) overhead, because the two data sets do not have to be compared
  * independent of the file system contents, because it works at block level
  * the data is consistent; when rsync runs without a snapshot, files may be changed or deleted during the transfer.
</box>

== zfs auto snapshot ==

[[https://github.com/zfsonlinux/zfs-auto-snapshot|zfsautosnapshot]] ([[https://packages.debian.org/bullseye/zfs-auto-snapshot|debian-package]])


== Snapshot space accounting ==


<file>
NAME                  AVAIL USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
meinPool              550G  27,2G        0B     24K             0B      27,2G
meinPool/FS1          253G  27,1G     3,46G   23,7G             0B         0B
meinPool/FS1@snap1       -   153K         -       -              -          -
meinPool/FS1@snap2       -   150K         -       -              -          -
</file>

^ Column ^ Meaning ^
| NAME | name of the dataset |
| AVAIL | free space |
| USED | space used |
| USEDSNAP | space used by snapshots |
| USEDDS | space used by the dataset itself |
| USEDREFRESERV | space used by a refreservation |
| USEDCHILD | space used by children of the dataset |

**Links**:
  * [[https://www.thegeekdiary.com/how-to-find-the-space-consumed-by-zfs-snapshots/|How to find the space consumed by ZFS snapshots]]
  * [[https://github.com/mafm/zfs-snapshot-disk-usage-matrix/blob/master/README.md|Useful ZFS snapshot disk space usage accounting]]
  * [[https://blogs.oracle.com/solaris/understanding-the-space-used-by-zfs-v2|Understanding the Space Used by ZFS]]
  * [[http://schalwad.blogspot.com/2015/09/understanding-how-zfs-calculates-used.html|Understanding How ZFS Calculates Used Space]]
  * [[https://docs.oracle.com/cd/E19253-01/819-5461/gazsy/index.html|ZFS Read-Only Native Properties]]

=== ZFS Clones ===

A ZFS clone is a writable copy of a snapshot. The snapshot cannot be deleted as long as the clone exists.

<code bash>
zfs snapshot -r meinPool/FS1@snapshot1
zfs clone meinPool/FS1@snapshot1 meinPool/FS2
</code>


=== ZFS Send and Receive ===

  * Send: <code bash>zfs send meinPool/FS1@snap1 > /backup-snap.zfs</code>
  * Receive: <code bash>zfs receive -F meinPool/FS1-Kopie < /backup-snap.zfs</code>

To send over the network, the output has to be piped to a remote system:
  * Send:  <code bash>sudo zfs send meinPool/FS1@snap1 | ssh $remoteServer zfs recv remotePool/remoteFS1</code>

Sending encrypted datasets: according to [[https://techgoat.net/index.php?id=175|techgoat]], -w / --raw apparently has to be given so that the data is sent exactly as it is stored on the source (the key does not have to be loaded; still to be tested: does decryption on the target work in that case?)

=== ZFS Ditto Blocks ===

<code bash>zfs set copies=3 meinPool/FS1</code>


==== ZFS Volumes ====

Instead of file systems, ZFS can also create volumes: block devices that can be used, for example, by virtualized guests.

The call is the same as for creating a file system, only **with the additional parameter** "-V $size":

<code bash>zfs create -V 2G tank/volumes/v2</code>

The volume is then available under ''/dev/zvol/tank/volumes/v2'' (or ''/dev/tank/volumes/v2'') and can be used like any other block device.

A small block size leads to inefficient use of storage space, so ''-b 128k'' can be advisable.

=== Growing volumes ===

Show the current size: ''zfs get volsize tank/volumes/v2''

Grow: ''zfs set volsize=3GB tank/volumes/v2''

=== Removing the volume reservation ===

With refreservation the fixed reservation of storage space can be removed (refreservation=none) or set to a value lower than the maximum.

''zfs set refreservation=none tank/volumes/v2''

:!: If the pool fills up, the file system inside the volume will become corrupt.

==== Monitoring ====

**[[http://manpages.ubuntu.com/manpages/xenial/man8/zed.8.html|ZED - ZFS Event Daemon]]**
<code bash>zpool events</code>

Config: ''/etc/zfs/zed.d/zed.rc''

<file>
##
# zed.rc
#
# This file should be owned by root and permissioned 0600.
##

##
# Absolute path to the debug output file.
#
#ZED_DEBUG_LOG="/tmp/zed.debug.log"

##
# Email address of the zpool administrator for receipt of notifications;
#   multiple addresses can be specified if they are delimited by whitespace.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# Enabled by default; comment to disable.
#
ZED_EMAIL_ADDR="root"

##
# Name or path of executable responsible for sending notifications via email;
#   the mail program must be capable of reading a message body from stdin.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
#
#ZED_EMAIL_PROG="mail"

##
# Command-line options for ZED_EMAIL_PROG.
# The string @ADDRESS@ will be replaced with the recipient email address(es).
# The string @SUBJECT@ will be replaced with the notification subject;
#   this should be protected with quotes to prevent word-splitting.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
#ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"

##
# Default directory for zed lock files.
#
#ZED_LOCKDIR="/var/lock"

##
# Minimum number of seconds between notifications for a similar event.
#
ZED_NOTIFY_INTERVAL_SECS=3600

##
# Notification verbosity.
#   If set to 0, suppress notification if the pool is healthy.
#   If set to 1, send notification regardless of pool health.
#
#ZED_NOTIFY_VERBOSE=0

##
# Send notifications for 'ereport.fs.zfs.data' events.
# Disabled by default, any non-empty value will enable the feature.
#
#ZED_NOTIFY_DATA=

##
# Pushbullet access token.
# This grants full access to your account -- protect it accordingly!
#   <https://www.pushbullet.com/get-started>
#   <https://www.pushbullet.com/account>
# Disabled by default; uncomment to enable.
#
#ZED_PUSHBULLET_ACCESS_TOKEN=""

##
# Pushbullet channel tag for push notification feeds that can be subscribed to.
#   <https://www.pushbullet.com/my-channel>
# If not defined, push notifications will instead be sent to all devices
#   associated with the account specified by the access token.
# Disabled by default; uncomment to enable.
#
#ZED_PUSHBULLET_CHANNEL_TAG=""

##
# Slack Webhook URL.
# This allows posting to the given channel and includes an access token.
#   <https://api.slack.com/incoming-webhooks>
# Disabled by default; uncomment to enable.
#
#ZED_SLACK_WEBHOOK_URL=""

##
# Pushover token.
# This defines the application from which the notification will be sent.
#   <https://pushover.net/api#registration>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_USER, below, must also be configured.
#
#ZED_PUSHOVER_TOKEN=""

##
# Pushover user key.
# This defines which user or group will receive Pushover notifications.
#  <https://pushover.net/api#identifiers>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_TOKEN, above, must also be configured.
#ZED_PUSHOVER_USER=""

##
# Default directory for zed state files.
#
#ZED_RUNDIR="/var/run"

##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED.  This works for
# device mapper and multipath devices as well.  This works with JBOD enclosures
# and NVMe PCI drives (assuming they're supported by Linux in sysfs).
#
ZED_USE_ENCLOSURE_LEDS=1

##
# Run a scrub after every resilver
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SCRUB_AFTER_RESILVER=0

##
# The syslog priority (e.g., specified as a "facility.level" pair).
#
#ZED_SYSLOG_PRIORITY="daemon.notice"

##
# The syslog tag for marking zed events.
#
#ZED_SYSLOG_TAG="zed"

##
# Which set of event subclasses to log
# By default, events from all subclasses are logged.
# If ZED_SYSLOG_SUBCLASS_INCLUDE is set, only subclasses
# matching the pattern are logged. Use the pipe symbol (|)
# or shell wildcards (*, ?) to match multiple subclasses.
# Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the
# matching subclasses are excluded from logging.
#ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*"
ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"

##
# Use GUIDs instead of names when logging pool and vdevs
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1
</file>


==== Replication with incremental snapshots ====

  * [[https://github.com/jimsalterjrs/sanoid|sanoid]]
  * [[https://packages.debian.org/bullseye/zfs-auto-snapshot|zfs-auto-snapshot]]

Example script: incremental snapshots over the network
<code bash>
#!/bin/sh

# Transfers zfs snapshots to a remote system
# add -v to zfs send if you want transfer-information

# set -x

# put SSH-user in ssh-config
snapshot_name="backup-latest"
source_reference_snapshot="backup-baseline"
source_server="server1"
source_fs="tank/pool1"
dest_server="backupserver"
dest_fs="tank_backup/pool1_backup"

# if you want to run preparations (only once needed!) -> set to "yes" (no backup is made)
first_run=no

# usually nothing needs to be changed after this point
source_reference_dataset="$source_fs@$source_reference_snapshot"
source_dataset="$source_fs@$snapshot_name"
dest_dataset="$dest_fs@$snapshot_name"


# === preparations START ===
if [ "$first_run" = "yes" ]
then
  # create pool (need to edit level and disks here):
  # ssh $dest_server "zpool create $pool_name mirror|raidz|whatever /dev/DISK1 /dev/DISK2"

  # and a fs as target:
  ssh $dest_server "zfs create -o acltype=posixacl -o compression=on -o dnodesize=auto -o normalization=formD -o relatime=on $dest_fs"

  #  the reference-snapshot needs to be created:
  ssh $source_server "zfs snapshot -r $source_reference_dataset"
  # and on the target system, there needs to be a pool...
  # note: incremental sends only work once the reference snapshot also exists
  # on the target, e.g. after a one-time full send:
  # ssh -A $source_server "zfs send $source_reference_dataset | ssh $dest_server zfs recv -F $dest_fs"
  exit
fi
# === preparations END ===


# destroy old snapshots
ssh $source_server "zfs destroy $source_dataset"
ssh $dest_server "zfs destroy $dest_dataset"

# make new snapshot
ssh $source_server "zfs snapshot -r $source_dataset"

# SSH (encrypted)
# using the agent here (-A), since i need to login on remote host. not needed if source-Host has a access to backup-system via pubkey.
ssh -A $source_server "zfs send -i $source_reference_dataset $source_dataset | ssh $dest_server zfs recv -F $dest_dataset"

# netcat (unencrypted, might be faster on slow machines)
#ssh $dest_server "nc -w 300 -l -p 2020 | zfs recv -F $dest_dataset" &
#ssh root@$source_server  "zfs send -i $source_reference_dataset $source_dataset | nc -w 20 $dest_server 2020"
</code>