Message-ID: <20200922175001.GB7948@magnolia>
Date: Tue, 22 Sep 2020 10:50:01 -0700
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu
Subject: Re: [PATCH v9 1/9] doc: update ext4 and journalling docs to include
fast commit feature
On Fri, Sep 18, 2020 at 05:54:43PM -0700, Harshad Shirwadkar wrote:
> This patch adds necessary documentation for fast commits.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> ---
> Documentation/filesystems/ext4/journal.rst | 66 ++++++++++++++++++++++
> Documentation/filesystems/journalling.rst | 28 +++++++++
> 2 files changed, 94 insertions(+)
>
> diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
> index ea613ee701f5..c2e4d010a201 100644
> --- a/Documentation/filesystems/ext4/journal.rst
> +++ b/Documentation/filesystems/ext4/journal.rst
> @@ -28,6 +28,17 @@ metadata are written to disk through the journal. This is slower but
> safest. If ``data=writeback``, dirty data blocks are not flushed to the
> disk before the metadata are written to disk through the journal.
>
> +In ``data=ordered`` mode, Ext4 also supports fast commits, which
> +help reduce commit latency significantly. The default ``data=ordered``
> +mode works by logging metadata blocks tothe journal. In fast commit
"to the journal"
> +mode, Ext4 only stores the minimal delta needed to recreate the
> +affected metadata in a fast commit area that is shared with JBD2.
> +Once the fast commit area fills up, or if a fast commit is not possible,
> +or if the JBD2 commit timer goes off, Ext4 performs a traditional full commit.
> +A full commit invalidates all the fast commits that happened before
> +it and thus empties the fast commit area for further fast
> +commits. This feature needs to be enabled at compile time.
And mkfs time too, I would hope?
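A minimal userspace sketch of the commit policy described in this hunk, assuming the fast commit area can be modelled as a byte budget. struct fc_area, commit() and full_commit() are hypothetical names for illustration only; this is not ext4 or JBD2 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fast commit area; not ext4 or JBD2 code. */
struct fc_area {
	size_t capacity;	/* bytes reserved for fast commits */
	size_t used;		/* bytes used since the last full commit */
	unsigned int nr_fast;	/* fast commits since the last full commit */
	unsigned int nr_full;	/* full commits performed so far */
};

/*
 * Traditional full commit (also what the JBD2 commit timer would trigger).
 * It supersedes every earlier fast commit, so the area becomes empty again.
 */
static void full_commit(struct fc_area *a)
{
	a->nr_full++;
	a->used = 0;
	a->nr_fast = 0;
}

/*
 * Record a metadata delta of delta_len bytes. Falls back to a full commit
 * when a fast commit is not possible or the delta does not fit in the space
 * left. Returns true if a fast commit was used.
 */
static bool commit(struct fc_area *a, size_t delta_len, bool fc_possible)
{
	assert(a->used <= a->capacity);
	if (!fc_possible || delta_len > a->capacity - a->used) {
		full_commit(a);
		return false;
	}
	a->used += delta_len;
	a->nr_fast++;
	return true;
}
```

With a 100-byte area, two 40-byte deltas are fast-committed; a third one no longer fits, so a full commit runs and the area is reset to empty.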
> +
> The journal inode is typically inode 8. The first 68 bytes of the
> journal inode are replicated in the ext4 superblock. The journal itself
> is normal (but hidden) file within the filesystem. The file usually
> @@ -609,3 +620,58 @@ bytes long (but uses a full block):
> - h\_commit\_nsec
> - Nanoseconds component of the above timestamp.
>
> +Fast commits
> +~~~~~~~~~~~~
> +
> +Fast commit area is organized as a log of tag tag length values. Each TLV has
> +a ``struct ext4_fc_tl`` in the beginning which stores the tag and the length
> +of the entire field. It is followed by variable length tag specific value.
"The fast commit area is organized as a log of tagged variable-length
values. Each value begins with a ``struct ext4_fc_tl`` tag that
identifies the type of the value and its length, and is followed by the
value itself." ?
I would've called that struct "ext4_fc_tag" or something, since "tl"
isn't really a word... ah well.
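A rough sketch of how such a tag-plus-length log can be appended to and walked. struct fc_tl, fc_put() and fc_count() are hypothetical; the 16-bit field widths, host-endian storage and the assumption that the length covers only the value (not the header) are illustrative choices, not a statement of the on-disk format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag/length header; widths and endianness are assumptions. */
struct fc_tl {
	uint16_t tag;
	uint16_t len;	/* assumed: length of the value only, excluding header */
};

/* Append one TLV at *off. Returns 0 on success, -1 if it does not fit. */
static int fc_put(uint8_t *buf, size_t cap, size_t *off,
		  uint16_t tag, const void *val, uint16_t len)
{
	struct fc_tl tl = { .tag = tag, .len = len };

	assert(*off <= cap);
	if (cap - *off < sizeof(tl) + len)
		return -1;
	memcpy(buf + *off, &tl, sizeof(tl));
	if (len)
		memcpy(buf + *off + sizeof(tl), val, len);
	*off += sizeof(tl) + len;
	return 0;
}

/* Count complete TLVs in the first 'used' bytes; stops at a truncated one. */
static size_t fc_count(const uint8_t *buf, size_t used)
{
	size_t off = 0, n = 0;
	struct fc_tl tl;

	while (used - off >= sizeof(tl)) {
		memcpy(&tl, buf + off, sizeof(tl));
		if (used - off - sizeof(tl) < tl.len)
			break;	/* value runs past the end of the log */
		off += sizeof(tl) + tl.len;
		n++;
	}
	return n;
}
```

Because every entry carries its own length, a reader can skip tags it does not need to interpret without knowing their value layout.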
> +Here is the list of supported tags and their meanings:
> +
> +.. list-table::
> +   :widths: 8 20 20 32
> +   :header-rows: 1
> +
> +   * - Tag
> +     - Meaning
> +     - Value struct
> +     - Description
> +   * - EXT4_FC_TAG_HEAD
> +     - Fast commit area header
> +     - ``struct ext4_fc_head``
> +     - Stores the TID of the transaction after which these fast commits should
> +       be applied.
So I guess log recovery is supposed to apply the transaction TID, then
apply these fast commits, and then move on to the next transaction?
--D
> +   * - EXT4_FC_TAG_ADD_RANGE
> +     - Add extent to inode
> +     - ``struct ext4_fc_add_range``
> +     - Stores the inode number and extent to be added in this inode
> +   * - EXT4_FC_TAG_DEL_RANGE
> +     - Remove logical offsets from inode
> +     - ``struct ext4_fc_del_range``
> +     - Stores the inode number and the logical offset range that needs to be
> +       removed
> +   * - EXT4_FC_TAG_CREAT
> +     - Create directory entry for a newly created file
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry of the
> +       newly created file
> +   * - EXT4_FC_TAG_LINK
> +     - Link a directory entry to an inode
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry
> +   * - EXT4_FC_TAG_UNLINK
> +     - Unlink a directory entry of an inode
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry
> +
> +   * - EXT4_FC_TAG_PAD
> +     - Padding (unused area)
> +     - None
> +     - Unused bytes in the fast commit area.
> +
> +   * - EXT4_FC_TAG_TAIL
> +     - Mark the end of a fast commit
> +     - ``struct ext4_fc_tail``
> +     - Stores the TID of the commit and the CRC of the fast commit whose end
> +       this tag marks
> +
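One way to read the HEAD/TAIL semantics in the table, as a hedged sketch: operation tags only count once a TAIL closes the fast commit they belong to, and anything after the last TAIL is treated as incomplete. The enum values are placeholders, the "single HEAD at the start" rule is an assumption, and CRC/TID validation is deliberately omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Tag names from the table; the numeric values here are placeholders. */
enum fc_tag {
	FC_TAG_HEAD, FC_TAG_ADD_RANGE, FC_TAG_DEL_RANGE, FC_TAG_CREAT,
	FC_TAG_LINK, FC_TAG_UNLINK, FC_TAG_PAD, FC_TAG_TAIL,
};

/*
 * Return how many operation tags (ADD_RANGE .. UNLINK) belong to complete
 * fast commits, i.e. are followed by a TAIL, or -1 if the sequence does not
 * start with HEAD. CRC and TID checks on TAIL are not modelled.
 */
static int fc_replayable_ops(const enum fc_tag *tags, size_t n)
{
	int committed = 0, pending = 0;

	assert(tags != NULL || n == 0);
	if (n == 0 || tags[0] != FC_TAG_HEAD)
		return -1;
	for (size_t i = 1; i < n; i++) {
		switch (tags[i]) {
		case FC_TAG_PAD:
			break;			/* unused space, skip it */
		case FC_TAG_TAIL:
			committed += pending;	/* this fast commit is complete */
			pending = 0;
			break;
		case FC_TAG_HEAD:
			return -1;		/* assumed: one HEAD, at the start */
		default:
			pending++;		/* ADD_RANGE, DEL_RANGE, CREAT, LINK, UNLINK */
			break;
		}
	}
	return committed;	/* ops after the last TAIL are discarded */
}
```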
> diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst
> index 58ce6b395206..a9817220dc9b 100644
> --- a/Documentation/filesystems/journalling.rst
> +++ b/Documentation/filesystems/journalling.rst
> @@ -132,6 +132,34 @@ The opportunities for abuse and DOS attacks with this should be obvious,
> if you allow unprivileged userspace to trigger codepaths containing
> these calls.
>
> +Fast commits
> +~~~~~~~~~~~~
> +
> +JBD2 also allows you to perform file-system-specific delta commits known as
> +fast commits. In order to use fast commits, you first need to call
> +:c:func:`jbd2_fc_init` and specify how many blocks at the end of the journal
> +area should be reserved for fast commits. Along with that, you will also need
> +to set the following callbacks that perform the corresponding work:
> +
> +``journal->j_fc_cleanup_cb``: Cleanup function called after every full commit and
> +fast commit.
> +
> +``journal->j_fc_replay_cb``: Replay function called for replay of fast commit
> +blocks.
> +
> +The file system is free to perform fast commits whenever it wants, as long as it
> +gets permission from JBD2 to do so by calling the function
> +:c:func:`jbd2_fc_start()`. Once a fast commit is done, the client
> +file system should tell JBD2 about it by calling :c:func:`jbd2_fc_stop()`.
> +If the file system wants JBD2 to perform a full commit immediately after stopping
> +the fast commit, it can do so by calling :c:func:`jbd2_fc_stop_do_commit()`.
> +This is useful if a fast commit operation fails for some reason and the only way
> +to guarantee consistency is for JBD2 to perform the full traditional commit.
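A hedged model of the start/stop handshake: mock_fc_start(), mock_fc_stop() and mock_fc_stop_do_commit() stand in for jbd2_fc_start(), jbd2_fc_stop() and jbd2_fc_stop_do_commit(). The return codes, the "refuse while a full commit runs" rule and the fallback when permission is denied are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handshake state; not JBD2 code. */
struct fc_state {
	bool full_commit_running;	/* assumed: fast commits refused meanwhile */
	bool fc_ongoing;
	unsigned int full_commits;
};

static int mock_fc_start(struct fc_state *s)
{
	if (s->full_commit_running || s->fc_ongoing)
		return -1;	/* no permission: caller must not fast commit */
	s->fc_ongoing = true;
	return 0;
}

static void mock_fc_stop(struct fc_state *s)
{
	assert(s->fc_ongoing);
	s->fc_ongoing = false;
}

static void mock_fc_stop_do_commit(struct fc_state *s)
{
	mock_fc_stop(s);
	s->full_commits++;	/* JBD2 performs a traditional full commit */
}

/* File system side: returns 0 if a fast commit was used, 1 otherwise. */
static int myfs_fsync(struct fc_state *s, bool write_ok)
{
	if (mock_fc_start(s) != 0) {
		s->full_commits++;	/* assumed fallback: full commit */
		return 1;
	}
	if (!write_ok) {
		mock_fc_stop_do_commit(s);	/* fast commit failed */
		return 1;
	}
	mock_fc_stop(s);
	return 0;
}
```

Every successful mock_fc_start() is paired with exactly one stop call, so the ongoing flag can never leak into the next commit.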
> +
> +JBD2 also provides helper functions to manage fast commit buffers. The file
> +system can use :c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()` to
> +allocate fast commit buffers and to wait for their I/O to complete.
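A sketch of the block bookkeeping a buffer allocator like this implies, assuming the fast commit area is the last fc_len blocks of a j_len-block journal and that a full commit makes it reusable. struct fc_bufs, mock_fc_get_buf() and mock_full_commit_reset() are hypothetical; waiting for I/O (the jbd2_fc_wait_bufs() side) is not modelled:

```c
#include <assert.h>

/* Hypothetical allocator state for the fast commit area. */
struct fc_bufs {
	unsigned long j_len;	/* journal length in blocks */
	unsigned long fc_len;	/* blocks reserved at the end for fast commits */
	unsigned long next;	/* next free block index within the fc area */
};

/* Store the next journal block number in *blocknr; -1 if the area is full. */
static int mock_fc_get_buf(struct fc_bufs *b, unsigned long *blocknr)
{
	assert(b->fc_len <= b->j_len);
	if (b->next >= b->fc_len)
		return -1;	/* caller should fall back to a full commit */
	*blocknr = b->j_len - b->fc_len + b->next++;
	return 0;
}

/* A full commit invalidates earlier fast commits, so reuse the area. */
static void mock_full_commit_reset(struct fc_bufs *b)
{
	b->next = 0;
}
```

For a 1024-block journal with 2 fast commit blocks, the allocator hands out blocks 1022 and 1023, then reports the area as full until a full commit resets it.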
> +
> Summary
> ~~~~~~~
>
> --
> 2.28.0.681.g6f77f65b4e-goog
>