lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu,  1 Nov 2018 15:52:19 -0700
From:   Eric Biggers <>
        "Theodore Y . Ts'o" <>,
        Jaegeuk Kim <>,
        Victor Hsieh <>,
        Chandan Rajendra <>
Subject: [PATCH v2 01/12] fs-verity: add a documentation file

From: Eric Biggers <>

Add a documentation file for fs-verity, covering:

- Introduction
- Use cases
- Metadata format
    - Merkle tree
    - fs-verity descriptor
    - fsveritysetup format
- Filesystem support
    - ext4
    - f2fs
- User API
- Access semantics
- In-kernel policies
    - Built-in signature verification
- Implementation details
    - I/O path design
- Userspace utility
- Tests

Signed-off-by: Eric Biggers <>
 Documentation/filesystems/fsverity.rst | 583 +++++++++++++++++++++++++
 Documentation/filesystems/index.rst    |  11 +
 2 files changed, 594 insertions(+)
 create mode 100644 Documentation/filesystems/fsverity.rst

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
new file mode 100644
index 0000000000000..d633fc0567bd5
--- /dev/null
+++ b/Documentation/filesystems/fsverity.rst
@@ -0,0 +1,583 @@
+Read-only file-based authenticity protection (fs-verity)
+fs-verity (``fs/verity/``) is a library that filesystems can hook into
+to support transparent integrity and authenticity protection of
+read-only files.  Currently, it is supported by the ext4 and f2fs
+filesystems.  Similar to fscrypt, not too much filesystem-specific
+code is needed to support fs-verity.
+fs-verity is similar to `dm-verity
+but works on files rather than block devices.  On supported
+filesystems, userspace can append a Merkle tree (hash tree) to a file,
+then use an ioctl to enable fs-verity on it.  Then, the filesystem
+transparently verifies all data read from the file against the Merkle
+tree; reads that fail verification will fail.  The filesystem also
+hides or moves the Merkle tree, and forbids changes to the file's
+contents via the syscall interface.
+Essentially, fs-verity is a way of efficiently hashing a file, subject
+to the caveat that the enforcement of that hash happens on-demand as
+reads occur.  The file hash that fs-verity computes is called the
+"file measurement"; this is the hash of the Merkle tree's root hash
+and certain other fs-verity metadata, and it takes constant time to
+compute regardless of the size of the file.  Note: the value of the
+fs-verity file measurement will differ from a regular hash of the
+file, even when they use the same hash algorithm, e.g. SHA-256;
+however, they achieve the same purpose.
+Use cases
+In general, fs-verity does not replace or obsolete dm-verity.
+dm-verity should still be used when it is possible to authenticate the
+full block device, i.e. when the device is read-only.  fs-verity is
+intended for use on read-write filesystems where dm-verity cannot be
+fs-verity is most useful for hashing large files where only a small
+portion may be accessed.  For example, it's useful on Android
+application package (APK) files, which typically contain many
+translations, classes, and other resources that are infrequently or
+even never accessed on a particular device.  It would be wasteful to
+hash the entire file before starting the application.
+Unlike an ahead-of-time hash, fs-verity also re-verifies data each
+time it's paged in, which ensures the file measurement remains
+correctly enforced even if the file contents are modified from
+underneath the filesystem, e.g. by malicious disk firmware.
+fs-verity can support various use cases, such as:
+- Integrity protection (detecting accidental corruption)
+- Auditing (logging file hashes before use)
+- Authenticity protection (detecting malicious modification)
+Note that the latter two are not features of fs-verity per se, but
+rather fs-verity is a tool for supporting these use cases.  For
+example, for the overall system to actually provide authenticity
+protection, the file measurement itself must still be authenticated,
+e.g. by comparing it with a known good value or by verifying a digital
+signature of it.
+This can be userspace driven, in which case fs-verity will only be
+used (essentially) as a fast way of hashing the file contents, via the
+`FS_IOC_MEASURE_VERITY`_ ioctl.  For authenticity protection, trusted
+userspace code [#]_ still must verify the relevant portions of the
+untrusted filesystem state before it is used in a security-critical
+way, such as executing code from it.
+For example, the trusted userspace code might verify that the file
+located at ``/foo/bar/baz`` has an fs-verity file measurement of
+or that all security-critical files (e.g. executable code) have stored
+alongside them a valid digital signature (signed by a known, trusted
+public key) of their fs-verity file measurement, potentially combined
+with other important file metadata such as path and SELinux label.
+However, for ease of use, a subset of this policy logic (but not all
+of it!) is also supported in the kernel by the `Built-in signature
+verification`_ mechanism.  Support for fs-verity file hashes in IMA
+(Integrity Measurement Architecture) policies is also planned.
+.. [#] For example, on Android, "trusted userspace code" would be code
+       running from the system or vendor partitions, which are
+       read-only partitions authenticated by dm-verity tied into
+       Verified Boot, as opposed to the userdata partition which is
+       read-write.
+Metadata format
+Merkle tree
+fs-verity uses the same Merkle tree (hash tree) format as dm-verity;
+the only difference is that fs-verity's Merkle tree is built over the
+contents of a regular file rather than a block device.
+Briefly, the file contents is divided into blocks, where the blocksize
+is configurable but usually 4096 bytes.  The last block is zero-padded
+if needed.  Each block is then hashed, producing the first level of
+hashes.  Then, the hashes in this first level are grouped into
+'blocksize'-byte blocks (zero-padding the ends as needed) and these
+blocks are hashed, producing the second level of hashes.  This
+proceeds up the tree until only a single block remains.  The hash of
+this block is called the "Merkle tree root hash".  Note: if the entire
+file contents fit in one block, then there are no hash blocks and the
+"Merkle tree root hash" is simply the hash of the data block.
+The blocks of the Merkle tree are stored on-disk starting from the
+root level and then proceeding to store each level down to the "first"
+(the level that gives the hashes of the data blocks).
+The hash algorithm is configurable.  The default is SHA-256, but
+SHA-512 is also supported.  The non-cryptographic checksum CRC-32C is
+also supported for integrity-only use cases such as detecting bit
+errors in read-only backup files.  A non-cryptographic checksum must
+not be used if authenticity protection is desired.
+In the recommended configuration of SHA-256 and 4K blocks, 128 hash
+values fit in each block.  Thus, each level of the hash tree is 128
+times smaller than the previous, and for large files the Merkle tree's
+size converges to approximately 1/129 of the original file size.
+However, for small files, the padding to a block boundary is
+significant, making the space overhead proportionally more.
+fs-verity descriptor
+For each file, fs-verity also uses an additional on-disk metadata
+structure called the *fs-verity descriptor*.  This contains the
+properties of the Merkle tree and some other information.  It begins
+with a header in the following format::
+    struct fsverity_descriptor {
+            __u8 magic[8];
+            __u8 major_version;
+            __u8 minor_version;
+            __u8 log_data_blocksize;
+            __u8 log_tree_blocksize;
+            __le16 data_algorithm;
+            __le16 tree_algorithm;
+            __le32 flags;
+            __le32 reserved1;
+            __le64 orig_file_size;
+            __le16 auth_ext_count;
+            __u8 reserved2[30];
+    };
+This structure contains:
+- ``magic`` is the ASCII bytes "FSVerity".
+- ``major_version`` is 1.
+- ``minor_version`` is 0.
+- ``log_data_blocksize`` and ``log_tree_blocksize`` are the log base 2
+  of the block size (in bytes) of data blocks and Merkle tree blocks,
+  respectively.  Currently, in both cases the kernel only supports
+  page-sized blocks, i.e. on most architectures, 4096-byte blocks.
+  Thus, usually both of these fields must be 12.
+- ``data_algorithm`` and ``tree_algorithm`` are the hash algorithms
+  used to hash data blocks and Merkle tree blocks, respectively.
+  Currently the kernel requires these to have the same value.  The
+  recommended value is FS_VERITY_ALG_SHA256.  See
+  ``include/uapi/linux/fsverity.h`` for the list of allowed values.
+- ``orig_file_size`` is the original size of the file in bytes.  This
+  means the size excluding the verity metadata and padding.
+- ``auth_ext_count`` is the number of authenticated extensions that
+  follow.
+- All other fields are zeroed.
+Following the ``struct fsverity_descriptor``, there is a list of
+"authenticated extensions".  Each extension is a variable-length
+structure that begins with the following header::
+    struct fsverity_extension {
+            __le32 length;
+            __le16 type;
+            __le16 reserved;
+    };
+This structure contains:
+- ``length`` is the length of this extension in bytes, including the
+  header.
+- ``type`` is the extension number.  See
+  ``include/uapi/linux/fsverity.h`` for the allowed values.
+- ``reserved`` must be 0.
+Each extension begins on an 8-byte aligned boundary.  When an
+extension's length is not a multiple of 8, it must be zero-padded to
+the next 8-byte boundary, even if it is the last extension.  This zero
+padding is not counted in the ``length`` field.
+This first list of extensions is "authenticated", meaning that they
+are included in the file measurement.  Currently, the following
+authenticated extensions are supported.  Except where otherwise
+indicated, extensions are optional and cannot be given multiple times:
+- FS_VERITY_EXT_ROOT_HASH:  This is mandatory.  It gives the root hash
+  of the Merkle tree, as a byte array.
+- FS_VERITY_EXT_SALT: A salt to salt the hashes with, given as a byte
+  array.  The salt is prepended to every block that is hashed.  Any
+  length salt is supported.  Note that using a unique salt for every
+  file should make it more difficult for fs-verity to be attacked
+  across many files.  However, in principle this is unnecessary since
+  simply choosing a strong cryptographic hash algorithm such as
+  SHA-256 or SHA-512 should be sufficient.
+Following the authenticated extensions, there is a list of
+unauthenticated extensions.  These are *not* included in the file
+measurement.  This list begins with a header::
+        __le16 unauth_ext_count;
+        __le16 padding[3];
+``unauth_ext_count`` is the number of unauthenticated extensions.
+This may be 0.
+Like authenticated extensions, each unauthenticated extension begins
+with the header ``struct fsverity_extension`` from above.
+The following types of unauthenticated extensions are supported:
+- FS_VERITY_EXT_PKCS7_SIGNATURE.  This is a DER-encoded PKCS#7 message
+  containing the signed file measurement.  See `Built-in signature
+  verification`_ for details.
+fsveritysetup format
+When enabling fs-verity on a file via the `FS_IOC_ENABLE_VERITY`_
+ioctl, the kernel requires that the verity metadata has been appended
+to the file contents.  Specifically, the file must be arranged as:
+#. Original file contents
+#. Zero-padding to next block boundary
+#. `Merkle tree`_
+#. `fs-verity descriptor`_
+#. fs-verity footer
+We call this file format the "fsveritysetup format".  It is not
+necessarily the on-disk format actually used by the filesystem, since
+the filesystem is free to move things around during the ioctl.
+However, the easiest way to implement fs-verity is to just keep this
+arrangement in-place, as ext4 and f2fs do; see `Filesystem support`_.
+Note that "block" here means the fs-verity block size, which is not
+necessarily the same as the filesystem's block size.  For example, on
+ext4, fs-verity can use 4K blocks on top of a filesystem formatted to
+use a 1K block size.
+The fs-verity footer is a structure of the following format::
+    struct fsverity_footer {
+            __le32 desc_reverse_offset;
+            __u8 magic[8];
+    };
+``desc_reverse_offset`` is the distance in bytes from the end of the
+fs-verity footer to the beginning of the fs-verity descriptor; this
+allows software to find the fs-verity descriptor.  ``magic`` is the
+ASCII bytes "FSVerity"; this allows software to quickly identify a
+file as being in the "fsveritysetup" format as well as find the
+fs-verity footer if zeroes have been appended.
+The kernel cannot handle fs-verity footers that cross a page boundary.
+Padding must be prepended as needed to meet this constaint.
+Filesystem support
+ext4 supports fs-verity since kernel version TODO.
+CONFIG_EXT4_FS_VERITY must be enabled in the kernel config.  Also, the
+filesystem must have been formatted with ``-O verity``, or had
+``tune2fs -O verity`` run on it.  These require e2fsprogs v1.44.4-2 or
+later.  This e2fsprogs version is also required for e2fsck to
+understand the verity feature.  Since "verity" is an RO_COMPAT
+feature, once enabled earlier kernels will be unable to mount the
+filesystem for writing, and earlier versions of e2fsck will be unable
+to check the filesystem.
+ext4 only allows fs-verity on extent-based files.
+The EXT4_VERITY_FL flag in the inode is used to indicate that the
+inode uses fs-verity.  This bit cannot be set directly; it can only be
+set indirectly via `FS_IOC_ENABLE_VERITY`_.
+When enabling verity on an inode, ext4 leaves the verity metadata
+in-place in the `fsveritysetup format`_.  However, it changes the
+on-disk i_size to the original file size, which allows the verity
+feature to be RO_COMPAT rather than INCOMPAT.  Later, the fs-verity
+footer is found by scanning backwards from the end of the last extent
+rather than from i_size.
+f2fs supports fs-verity since kernel version TODO.
+CONFIG_F2FS_FS_VERITY must be enabled in the kernel config.  Also, the
+filesystem must have been formatted with ``-O verity``.  This requires
+f2fs-tools v1.11.0 or later.
+The FADVISE_VERITY_BIT flag in the inode is used to indicate that the
+inode uses fs-verity.  This bit cannot be set directly; it can only be
+set indirectly via `FS_IOC_ENABLE_VERITY`_.
+When enabling verity on an inode, f2fs leaves the verity metadata
+in-place in the `fsveritysetup format`_.  It leaves the on-disk i_size
+as the full file size; however, the in-memory i_size is overridden
+with the original size.
+User API
+The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a regular file.
+Userspace must have already appended verity metadata to the file,
+using the file format described in `fsveritysetup format`_.
+Additionally, the filesystem must support fs-verity.
+The argument parameter for this ioctl is reserved and must be NULL.
+This ioctl checks for write access to the inode; no capability is
+required.  However, it must be executed on an O_RDONLY file
+descriptor, and no processes may have the file open for writing.
+(This is necessary to prevent various race conditions.)
+On success, this ioctl returns 0, and the file becomes a verity file.
+This means that:
+- The filesystem marks the file as a verity file both in-memory and
+  on-disk, e.g. by setting a bit in the inode.
+- All later reads from the file are verified against the Merkle tree.
+- The verity metadata at the end of the file is hidden or moved.
+- Opening the file for writing or truncating it is no longer allowed.
+- There is no way to disable verity on the file, other than by
+  deleting it and replacing it with a copy.
+If this ioctl fails, then no changes are made to the file.  The
+reasons it might fail include:
+- ``EACCES``: the process does not have write access to the file
+- ``EBADMSG``: the file's fs-verity metadata is invalid
+- ``EEXIST``: the file already has fs-verity enabled
+- ``EINVAL``: a value was specified for the reserved argument
+  parameter, or the file descriptor refers to neither a regular file
+  nor a directory
+- ``EIO``: an I/O error occurred
+- ``EISDIR``: the file descriptor refers to a directory, not a regular
+  file
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity support
+  for this filesystem, or the filesystem superblock has not had the
+  'verity' feature enabled on it.  (See `Filesystem support`_.)
+- ``EPERM``: the file is append-only
+- ``EROFS``: the filesystem is read-only
+- ``ETXTBSY``: the file is open for writing.  Note that this can be
+  the caller's file descriptor, or another open file descriptor, or
+  the file reference held by a writable memory map.
+The FS_IOC_MEASURE_VERITY ioctl retrieves the fs-verity measurement of
+a regular file.  This is a digest that cryptographically summarizes
+the file contents that are being enforced on reads.  The file must
+have fs-verity enabled.
+This ioctl takes in a pointer to a variable-length structure::
+    struct fsverity_digest {
+            __u16 digest_algorithm;
+            __u16 digest_size; /* input/output */
+            __u8 digest[];
+    };
+``digest_size`` is an input/output field.  On input, it must be
+initialized to the number of bytes allocated for the variable-length
+``digest`` field.
+On success, 0 is returned and the kernel fills in the structure as
+- ``digest_algorithm`` will be the hash algorithm used for the file
+  measurement.  It will match the algorithm used in the Merkle tree,
+  e.g. FS_VERITY_ALG_SHA256.  See ``include/uapi/linux/fsverity.h``
+  for the list of possible values.
+- ``digest_size`` will be the size of the digest in bytes, e.g. 32
+  for SHA-256.  (This can be redundant with ``digest_algorithm``.)
+- ``digest`` will be the actual bytes of the digest.
+This ioctl is guaranteed to be very fast.  Due to fs-verity's use of a
+Merkle tree, its running time is independent of the file size.
+This ioctl can fail with the following errors:
+- ``EFAULT``: invalid buffer was specified
+- ``ENODATA``: the file is not a verity file
+- ``ENOTTY``: this type of filesystem does not implement fs-verity
+- ``EOPNOTSUPP``: the kernel was not configured with fs-verity support
+  for this filesystem, or the filesystem superblock has not had the
+  'verity' feature enabled on it.  (See `Filesystem support`_.)
+- ``EOVERFLOW``: the file measurement is longer than the specified
+  ``digest_size`` bytes.  Try providing a larger buffer.
+Access semantics
+fs-verity only implements reads, not writes.  Therefore, after it is
+enabled on a given file, regardless of the mode bits filesystems will
+forbid opening the file for writing as well as changing the size of
+the file via truncate().  The error code received for this is EPERM.
+However, fs-verity does not measure metadata such as owner, mode,
+timestamps, and xattrs.  Therefore, changes to these are still
+For read-only access, fs-verity is intended to be transparent; no
+changes to userspace applications should be needed.  However, astute
+users may notice some slight differences in behavior:
+- Direct I/O is not supported on verity files.  Attempts to use direct
+  I/O on such files will fall back to buffered I/O.
+- DAX (Direct Access) is not supported on verity files.
+Note: read-only mmaps are supported, as is combining fs-verity and
+Verity files can be sparse; holes are still verified.
+In-kernel policies
+Built-in signature verification
+With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
+a portion of an authentication policy (see `Use cases`_) in the
+kernel.  Specifically, it adds support for:
+1. At fs-verity module initialization time, a keyring ".fs-verity" is
+   created.  The root user can add trusted X.509 certificates to this
+   keyring using the add_key() system call, then (when done)
+   optionally use keyctl_restrict_keyring() to prevent additional
+   certificates from being added.
+2. When a PKCS7_SIGNATURE extension containing a signed file
+   measurement is found in a file's verity metadata, the kernel will
+   verify this signature against the certificates in the ".fs-verity"
+   keyring, and verify that it matches the actual file measurement.
+   The extension must contain the PKCS#7 formatted signature in DER
+   format, where the signed data is the file measurement as a ``struct
+   fsverity_digest`` as described for `FS_IOC_MEASURE_VERITY`_ except
+   that all fields must be little-endian rather than native endian.
+3. A new sysctl "fs.verity.require_signatures" is made available.
+   When set to 1, the kernel requires that all fs-verity files have a
+   correctly signed file measurement as described in (2).
+This is meant as a relatively simple mechanism that can be used to
+provide some level of authenticity protection for fs-verity files, as
+an alternative to doing the signature verification in userspace or
+using IMA-appraisal.  However, with this mechanism, userspace programs
+still need to check that the fs-verity bit is set, and there is no
+protection against fs-verity files being swapped around.
+Implementation details
+I/O path design
+To support fs-verity, the filesystem's ``->readpage()`` and
+``->readpages()`` methods are modified to verify the data pages before
+they are marked Uptodate.  Merely hooking ``->read_iter()`` would be
+insufficient, since ``->read_iter()`` is not used for memory maps.
+fs-verity exposes functions to verify data:
+- ``fsverity_verify_page()`` verifies an individual page
+- ``fsverity_verify_bio()`` verifies all pages in a bio
+Currently, fs-verity only supports the case where data blocks, hash
+blocks, and pages all have the same size (usually 4096 bytes).
+Filesystems that use bios call ``fsverity_verify_bio()`` after each
+read bio completes.  To do this while also continuing to support
+encryption (fscrypt), filesystems allocate a "post-read context" for
+each bio and store it in ``->bi_private``::
+    struct bio_post_read_ctx {
+           struct bio *bio;
+           struct work_struct work;
+           unsigned int cur_step;
+           unsigned int enabled_steps;
+    };
+``enabled_steps`` is a bitmask of the post-read steps that are
+enabled.  The available steps are STEP_DECRYPT and STEP_VERITY.  These
+steps can be enabled together, independently, or not at all.  If both
+are enabled, then decryption is done first.  Since bio completion
+callbacks cannot sleep, each post-read step is done by enqueueing the
+struct on a workqueue, and then actual work happens in the work item.
+Different workqueues are needed for encryption and verity because
+verity work may require decrypting metadata pages from the file.
+The bio completion callback sets PG_error for each page if either
+decryption or verification failed.  Finally, after the work item(s)
+complete, pages without PG_error are set Uptodate, and all pages are
+A data page being set Uptodate and unlocked implies that it has been
+verified, and such pages become visible to userspace via read(),
+mmap(), etc.  Otherwise, the page is left in the PG_error && !Uptodate
+state which results in the read() family of syscalls failing with EIO,
+and accesses to the data via a memory map raising SIGBUS.  Note that
+even if some pages in a file fail verification, pages that pass
+verification can still be read.
+To verify a data page, fs-verity reads the required hash page(s)
+starting at the leaves and ascending to the root; then, the pages are
+verified descending from the root.  Filesystems that store the verity
+metadata past EOF implement reading hash pages using their usual
+``->readpage{,s}()`` methods, with modifications:
+- Verification is skipped for pages beyond ``i_size``.
+- When checking whether a page is in the implicit hole beyond EOF,
+  the full file size (including the verity metadata) is used rather
+  than the original data i_size.  Note that this does not allow
+  userspace to read or mmap the verity metadata.
+The hash pages are also cached in the inode's address_space, similar
+to data pages.  However, to simplify the verification logic, a hash
+page being Uptodate doesn't imply that it has been verified; instead,
+the PG_checked bit is used for this purpose.  Hash pages aren't locked
+while being verified, so multiple threads may race to set PG_checked,
+but this doesn't matter.
+Thus, when ascending the tree reading hash pages, fs-verity can stop
+as soon as it finds an already-checked hash page.  This optimization,
+which is also used by dm-verity, results in excellent sequential read
+performance since usually the deepest needed hash page will already be
+cached and checked.  However, random reads perform worse.
+Files may contain holes.  Normally, the filesystem's
+``->readpage{,s}()`` methods will zero pages in holes and set them
+Uptodate without issuing any bios.  To prevent this from being abused
+to bypass fs-verity, filesystems call ``fsverity_verify_page()`` on
+hole pages.
+Like fscrypt, filesystems also disable direct I/O on verity files,
+since direct I/O bypasses the normal read paths.
+Userspace utility
+This document focuses on the kernel, but a userspace utility for
+fs-verity can be found at:
+See the file in the fsverity-utils source tree for details,
+including examples of setting up fs-verity protected files.
+To test fs-verity, use xfstests.  For example, using `kvm-xfstests
+    kvm-xfstests -c ext4,f2fs -g verity
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 46d1b1be3a510..818390c32be63 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -359,3 +359,14 @@ encryption of files and directories.
     :maxdepth: 2
+Verity API
+A library which filesystems can hook into to support transparent
+authentication of read-only files.
+.. toctree::
+    :maxdepth: 2
+    fsverity

Powered by blists - more mailing lists