Message-ID: <YMN10sXgoTR/IPxr@mit.edu>
Date: Fri, 11 Jun 2021 10:40:18 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Ritesh Harjani <riteshh@...ux.ibm.com>
Cc: linux-ext4@...r.kernel.org
Subject: Parallel fsck current status
Parallel FSCK Project current status
Written by harshads@ and further updated by tytso@
Background
==========
Ext4 fsck has traditionally been a single-threaded program. On large
(and especially fragmented) disks, this single-threaded fsck takes a
long time to complete.
Fortunately, upstream has seen some action on parallelizing fsck
[1]. However, as you can see, the patchset is very long (around 50
patches) and it didn’t completely make it through to e2fsck. Ted
added threading support to e2fsprogs [3], which added the following
features:
* The patchset made libext2fs thread-aware
* The patchset added parallel bitmap loading
However, the upstream changes added by Ted only parallelize bitmap
loading. File system checking is still single-threaded. Reviewing and
merging a massive patchset is extremely hard, and that’s why Ted
suggested on the mailing list [4] that we first add support for
multithreading to libext2fs. This will allow us to add unit tests for
parallelizing libext2fs independently of parallel e2fsck. Once that
goes in, we can rebase the rest of the patches on top of the libext2fs
changes.
Saranya spent some effort cleaning up Wang Shilong's patches, and
there is a working version of those patches, based on a recent
version of e2fsprogs (just before fast_commit support was
integrated), at [2]. However, when we looked more closely at that
patchset, we found a fundamental issue: the changes to e2fsck that
enable multithreaded access to the internal data structures of the
libext2fs library make the patches extremely fragile, since they
expose the internal data abstractions of libext2fs to e2fsck.
Problem Definition
==================
The top-level object holding critical information in e2fsprogs is
called ext2_filsys. Every application that links against libext2fs
allocates, updates, and frees this struct using the libext2fs API
[5]. To make any libext2fs application thread-aware, we first need to
add the ability in libext2fs to clone this structure so that multiple
threads can make progress in parallel. Once all the threads finish,
we’ll need the ability to merge these structures back. In other
words, we’ll need to add the following APIs to libext2fs:
/* Clone fs object into dest based on flags */
errcode_t ext2fs_clone_fs(ext2_filsys fs, ext2_filsys *dest, int flags);
/* Try to free the FS object. If this object is a clone, merge it with the parent. */
errcode_t ext2fs_free_fs(ext2_filsys fs);
Saranya was working on this project; the commit [6] is a work in
progress to implement this design. We can either take that code and
modify it, or start from scratch and use that code as a reference.
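To make the clone/merge life cycle concrete, here is a minimal sketch
of how a caller might use a pair of calls shaped like the two proposed
above. It does not use libext2fs: struct toy_fs, toy_clone_fs(),
toy_free_fs(), and toy_check_parallel() are all hypothetical
stand-ins, the flags argument is omitted, and "merging" is assumed to
mean OR-ing a per-group flag array into the parent. Each worker writes
only to its private clone, and the merge is done by the calling thread
after pthread_join(), so the toy needs no locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NGROUPS 8	/* block groups in the toy file system */

/* Hypothetical stand-in for ext2_filsys; NOT the real libext2fs struct. */
struct toy_fs {
	struct toy_fs *parent;			/* NULL for the original */
	unsigned char checked[TOY_NGROUPS];	/* per-group "checked" flag */
};

/* Same shape as the proposed ext2fs_clone_fs(), minus the flags. */
static int toy_clone_fs(struct toy_fs *fs, struct toy_fs **dest)
{
	struct toy_fs *clone = malloc(sizeof(*clone));

	if (!clone)
		return -1;
	memcpy(clone, fs, sizeof(*clone));
	clone->parent = fs;
	*dest = clone;
	return 0;
}

/* Same shape as the proposed ext2fs_free_fs(): a clone is merged
 * into its parent before it is freed. */
static int toy_free_fs(struct toy_fs *fs)
{
	if (fs->parent)
		for (int g = 0; g < TOY_NGROUPS; g++)
			fs->parent->checked[g] |= fs->checked[g];
	free(fs);
	return 0;
}

struct toy_job {
	struct toy_fs *fs;	/* private clone owned by one thread */
	int first, last;	/* group range [first, last) */
};

static void *toy_worker(void *arg)
{
	struct toy_job *job = arg;

	for (int g = job->first; g < job->last; g++)
		job->fs->checked[g] = 1;	/* pretend to check group g */
	return NULL;
}

/* Returns the number of groups marked checked in the parent after
 * all clones are merged, or -1 on error. */
static int toy_check_parallel(int nthreads)
{
	pthread_t tid[TOY_NGROUPS];
	struct toy_job job[TOY_NGROUPS];
	struct toy_fs *parent;
	int started = 0, count = 0, err = 0;

	if (nthreads < 1 || nthreads > TOY_NGROUPS)
		return -1;
	parent = calloc(1, sizeof(*parent));
	if (!parent)
		return -1;

	for (int t = 0; t < nthreads; t++) {
		job[t].first = t * TOY_NGROUPS / nthreads;
		job[t].last = (t + 1) * TOY_NGROUPS / nthreads;
		if (toy_clone_fs(parent, &job[t].fs)) {
			err = 1;
			break;
		}
		if (pthread_create(&tid[t], NULL, toy_worker, &job[t])) {
			free(job[t].fs);	/* discard the unused clone */
			err = 1;
			break;
		}
		started++;
	}

	/* Join each started thread, then merge its clone from here. */
	for (int t = 0; t < started; t++) {
		pthread_join(tid[t], NULL);
		toy_free_fs(job[t].fs);
	}

	for (int g = 0; g < TOY_NGROUPS; g++)
		count += parent->checked[g];
	toy_free_fs(parent);
	return err ? -1 : count;
}
```

With any thread count from 1 to TOY_NGROUPS, toy_check_parallel()
should report every group as checked once the clones are merged. A
real merge would have to reconcile much more state than one flag
array, which is what the work-in-progress commit [6] explores.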
Outcome and Future Direction
============================
At the end of this project, we’ll have an upstream-ready
patchset. Once these changes are in, the next step would be to drop
some patches from Wang’s original e2fsck patchset [1] and rebase the
rest of the series on top of this patchset.
REFERENCES
==========
[1] Wang Shilong’s original parallel e2fsck patchset:
http://patchwork.ozlabs.org/project/linux-ext4/list/?series=169193
[2] Wang Shilong's patches rebased and cleaned up versus a relatively
recent version of e2fsprogs:
https://github.com/tytso/e2fsprogs/tree/pfsck
git fetch https://github.com/tytso/e2fsprogs.git pfsck
[3] Patches sent by Ted that add parallel bitmap support:
https://www.spinics.net/lists/linux-ext4/msg75716.html
[4] Ted’s suggested next steps:
http://patchwork.ozlabs.org/project/linux-ext4/patch/20201118153947.3394530-11-saranyamohan@google.com/#2584340
[5] libext2fs API:
https://github.com/tytso/e2fsprogs/blob/master/lib/ext2fs/ext2fs.h
[6] Saranya’s WIP commit that adds clonefs support:
https://github.com/srnym/e2fsprogs/commit/3007ba6c47a5caf2e2346d4eb2e05f1333663c2f