linux-ext4 - Re: [PATCH] [RFC] ext2fs: parallel bitmap loading

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D9D82475-7B39-4D79-84CA-C246130AD3B1@dilger.ca>
Date:   Wed, 16 Sep 2020 19:34:22 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Wang Shilong <wangshilong1991@...il.com>,
        saranyamohan@...gle.com, harshads@...gle.com
Subject: Re: [PATCH] [RFC] ext2fs: parallel bitmap loading

On Sep 16, 2020, at 3:03 PM, Theodore Y. Ts'o <tytso@....edu> wrote:
> 
> On Fri, Sep 04, 2020 at 03:34:26PM -0600, Andreas Dilger wrote:
>> This is a patch that is part of the parallel e2fsck series that Shilong
>> is working on, and does not work by itself, but was requested during
>> discussion on the ext4 concall today.
> 
> Andreas, thanks for sending this patch.  (Also available at[1].)
> 
> [1] https://lore.kernel.org/linux-ext4/132401FE-6D25-41B3-99D1-50E7BC746237@dilger.ca/
> 
> I took look at it, and there are a number of issues with it.  First of
> all, there seems to be an assumption that (a) the number of threads is
> less than the number of block groups, and (b) the number of threads
> can evenly divide the number of block groups.  So for example, if the
> number of block groups is prime, or if you are trying to use say, 8 or
> 16 threads, and the number of block groups is odd, the code in
> question will not do the right thing.

Yes, the thread count is checked earlier in the parallel e2fsck patch
series to be <= number of block groups.  However, I wasn't aware of any
requirement for groups = N * threads.  It may be coincidental that we
have never tested that case.

In any case, the patch was never really intended to be used by itself,
only for review and discussion of the general approach.

> (a) meant that attempting to run the e2fsprogs regression test suite
> caused most of the test cases to fail with e2fsck crashing due to
> buffer overruns.  I fixed this by changing the number of threads to be
> 16, or if 16 was greater than the number of block groups, to be the
> number of block groups, just for debugging purposes.  However, there
> were still a few regression test failures.
> 
> I also then tried to use a file system that we had been using for
> testing fragmentation issues.  The file system was creating a 10GB
> virtual disk, and then running these commands:
> 
>   DEV=/dev/sdc
>   mke2fs -t ext4 $DEV 10G
>   mount $DEV /mnt
>   pushd /mnt
>   for t in $(seq 1 6144) ; do
>       for i in $(seq 1 25) ; do
>           fallocate tb$t-8mb-$i -l 8M
>       done
>       for i in $(seq 1 2) ; do
>           fallocate tb$t-400mb-$i -l 400M
>       done
>   done
>   popd
>   umount /mnt
> 
> With the patch applied, all of the threads failed with error code 22
> (EINVAL), except for one which failed with a bad block group checksum
> error.  I haven't had a chance to dig into further; but I was hoping
> that Shilong and/or Saranya might be able to take closer look at that.

There may very well be other issues with the patch that make it not
useful as-is in isolation.  I'd have to let Shilong comment on that.

> But the other thing that we might want to consider is to add
> demand-loading of the block (or inode) bitmap.  We got a complaint
> that "e2fsck -E journal_only" was super-slow whereas running the
> journal by mounting and unmounting the file system was much faster.
> The reason, of course, was because the kernel was only reading those
> bitmap blocks that are needed to be modified by the orphaned inode
> processing, whereas with e2fsprogs, we have to read in all of the
> bitmap blocks whether this is necessary or not.

Forking threads to do on-demand loading may have a high overhead, so
it would be interesting to investigate a libext2fs IO engine that is
using libaio.  That would allow O_DIRECT reading of filesystem metadata
without double caching, as well as avoid blocking threads.  Alternately,
there is already a "readahead" method exported that could be used to
avoid changing the code too much, using posix_fadvise(WILLNEED), but I
have no idea on how that would perform.

> So another idea that we've talked about is teaching libext2fs to be
> able to demand load the bitmap, and then when we write out the block
> bitmap, we only need to write out those blocks that were loaded.  This
> would also speed up running debugfs to examine the file system, as
> well as running fuse2fs.  Fortunately, we have abstractions in front
> of all of the bitmap accessor functions, and the code paths that would
> need to be changed to add demand-loading of bitmaps should be mostly
> exclusive of the changes needed for parallel bitmap loading.  So if
> Shilong has time to look at making the parallel bitmap loader more
> robust, perhaps Saranya could work on the demand-loading idea.
> 
> Or if Shilong doesn't have time to try to polish this parallel bitmap
> loading changes, we could have Saranya look at clean it up --- since
> regardless of whether we implement demand-loading or not, parallel
> bitmap reading is going to be useful for some use cases (e.g., a full
> fsck, dumpe2fs, or e2image).

I don't think Shilong will have time to work on major code changes for
the next few weeks at least, due to internal deadlines, after which we
can finish cleaning up and submitting the pfsck patch series upstream.
If you are interested in the whole 59-patch series, it is available via:

git pull https://review.whamcloud.com/tools/e2fsprogs refs/changes/14/39914/1

or viewable online via Gerrit at:

https://review.whamcloud.com/39914

Getting some high-level review/feedback of that patch series would avoid
spending time to rework/rebase it and finding it isn't in the form that
you would prefer, or if it needs major architectural changes.

Note that this is currently based on top of the Lustre e2fsprogs branch.
While these shouldn't cause any problems with non-Lustre filesystems,
there are other patches in the series that are not necessarily ready
for submission (e.g. dirdata, Lustre xattr decoding, inode badness, etc).

Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)