Message-ID: <YHCCc5UKANOd2VbQ@mit.edu>
Date:   Fri, 9 Apr 2021 12:36:03 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 0/7] Block Allocator Improvements

Thanks, I've applied this patch series into the ext4 git tree.

	     	     	  	- Ted

On Thu, Apr 01, 2021 at 10:21:22AM -0700, Harshad Shirwadkar wrote:
> This patch series significantly improves the cr 0 and cr 1 passes of
> the allocator. Currently, at cr 0 and cr 1, we perform linear lookups
> to find matching groups. That's very inefficient for large file
> systems with millions of block groups. At cr 0, we only care about
> groups whose largest free order is >= the request's order, and at
> cr 1 we only care about groups whose average fragment size is > the
> request size. So, this patch series introduces new data structures
> that allow us to perform the cr 0 lookup in constant time and the
> cr 1 lookup in log(number of groups) time instead of linear time.
> 
> For cr 0, we add a list for each order; each group is enqueued on the
> list corresponding to the largest free order in its buddy bitmap.
> This allows us to look up a match at cr 0 in constant time.
> 
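> To illustrate the cr 0 structure, here is a minimal userspace C
> sketch (not the patch code; MB_NUM_ORDERS is a macro added by this
> series, but its value and every other name below are illustrative):
>
> #include <stddef.h>
>
> #define MB_NUM_ORDERS 14  /* illustrative order count */
>
> struct group {
>         int group_no;
>         int largest_free_order;  /* from the group's buddy bitmap */
>         struct group *next;      /* link within its order list */
> };
>
> /* One list head per order; a group sits on the list matching its
>    largest free order and moves when that order changes. */
> static struct group *order_list[MB_NUM_ORDERS];
>
> static void enqueue(struct group *g)
> {
>         g->next = order_list[g->largest_free_order];
>         order_list[g->largest_free_order] = g;
> }
>
> /* cr 0 lookup: any group on a list of order >= the request's order
>    can satisfy the request, so at most MB_NUM_ORDERS list heads are
>    probed -- constant time in the number of groups. */
> static struct group *cr0_lookup(int request_order)
> {
>         for (int o = request_order; o < MB_NUM_ORDERS; o++)
>                 if (order_list[o])
>                         return order_list[o];
>         return NULL;
> }
>
> int main(void)
> {
>         struct group g = { .group_no = 7, .largest_free_order = 5 };
>
>         enqueue(&g);
>         /* An order-3 request is satisfied by the order-5 group. */
>         return cr0_lookup(3) == &g ? 0 : 1;
> }
>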
> For cr 1, we add a new rb tree of groups sorted by average fragment
> size. This allows us to look up a match for cr 1 in
> log(number of groups) time.
> 
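> The cr 1 side can be sketched the same way (again illustrative names,
> not the patch code; the patch uses an rb tree, but a sorted array
> demonstrates the same O(log n) ordered lookup more compactly):
>
> #include <stddef.h>
>
> struct frag_entry {
>         unsigned int group_no;
>         unsigned int avg_fragment_size;  /* free blocks / free extents */
> };
>
> /* Lower-bound binary search over entries sorted by average fragment
>    size: returns the first group whose average fragment size exceeds
>    the request size, in O(log num_groups) steps. */
> static struct frag_entry *cr1_lookup(struct frag_entry *sorted, int n,
>                                      unsigned int request_size)
> {
>         int lo = 0, hi = n;  /* the answer, if any, is in [lo, hi] */
>
>         while (lo < hi) {
>                 int mid = lo + (hi - lo) / 2;
>
>                 if (sorted[mid].avg_fragment_size > request_size)
>                         hi = mid;      /* mid is a candidate; keep it */
>                 else
>                         lo = mid + 1;  /* too small; go right */
>         }
>         return lo < n ? &sorted[lo] : NULL;
> }
>
> int main(void)
> {
>         struct frag_entry groups[] = {
>                 { 3, 8 }, { 1, 32 }, { 9, 512 },  /* sorted by size */
>         };
>         struct frag_entry *hit = cr1_lookup(groups, 3, 100);
>
>         /* A 100-block request lands on group 9 (avg size 512). */
>         return hit && hit->group_no == 9 ? 0 : 1;
> }
>
> (An rb tree additionally keeps insertions and removals at O(log n) as
> groups' average fragment sizes change, which is presumably why the
> series uses one instead of a sorted array.)
>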
> These optimizations can be enabled by passing the "mb_optimize_scan"
> mount option.
> 
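> For example, with this series applied, a filesystem could be mounted
> like so (assuming the option takes no argument, as worded above;
> device and mount point hypothetical):
>
> mount -o mb_optimize_scan /dev/sdb1 /mnt
>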
> These changes may cause allocations to be spread across the block
> device. While that does not matter for some block devices (such as
> flash), it may be a concern for block devices that benefit from
> storing related content together, such as disks. However, it can be
> argued that in high-fragmentation scenarios, especially on large
> disks, optimizing the scanning is still worthwhile, since in such
> cases we become CPU bound on group scanning instead of IO bound.
> Perhaps, in the future, we could turn this new optimization on
> dynamically for such devices based on their fragmentation levels.
> 
> Verified that there are no regressions in smoke tests (-g quick -c 4k).
> 
> Also, to demonstrate the effectiveness of the patch series, the
> following experiment was performed:
> 
> Created a highly fragmented disk of size 65TB. The disk had no
> contiguous 2M regions. The following command was run three times in
> a row:
> 
> time dd if=/dev/urandom of=file bs=2M count=10
> 
> Here are the results with and without cr 0/1 optimizations:
> 
> |---------+------------------------------+---------------------------|
> |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
> |---------+------------------------------+---------------------------|
> | 1st run | 5m1.871s                     | 2m47.642s                 |
> | 2nd run | 2m28.390s                    | 0m0.611s                  |
> | 3rd run | 2m26.530s                    | 0m1.255s                  |
> |---------+------------------------------+---------------------------|
> 
> The patch [3/7] "ext4: add mballoc stats proc file" is a modified
> version of a patch originally written by Artem Blagodarenko
> (artem.blagodarenko@...il.com). With that patch applied, I ran the
> following command with and without the optimizations:
> 
> dd if=/dev/zero of=/mnt/file bs=2M count=2 conv=fsync
> 
> Without optimizations:
> 
> useless_c0_loops: 3
> useless_c1_loops: 39
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> With optimizations:
> 
> useless_c0_loops: 0
> useless_c1_loops: 0
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> This shows that the cr 0 and cr 1 optimizations get rid of the
> useless cr 0 and cr 1 loops altogether, thereby significantly
> reducing the number of groups that get considered.
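>
> (For reference: these counters come from the mballoc stats proc file
> added by patch 3. Assuming a per-device proc file, they can be read
> with something like "cat /proc/fs/ext4/sdb1/mb_stats"; the exact path
> and device name here are illustrative.)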
> 
> Changes from V5:
> ----------------
> - Turned block bitmap prefetching on by default
> - Fixed a bug where, for cr >= 2, we were skipping the first group
>   without searching in it
> - Renamed mb_linear_limit to mb_max_linear_groups
> 
> Harshad Shirwadkar (7):
>   ext4: drop s_mb_bal_lock and convert protected fields to atomic
>   ext4: add ability to return parsed options from parse_options
>   ext4: add mballoc stats proc file
>   ext4: add MB_NUM_ORDERS macro
>   ext4: improve cr 0 / cr 1 group scanning
>   ext4: add proc files to monitor new structures
>   ext4: make prefetch_block_bitmaps default
> 
>  fs/ext4/ext4.h    |  34 ++-
>  fs/ext4/mballoc.c | 590 +++++++++++++++++++++++++++++++++++++++++++---
>  fs/ext4/mballoc.h |  22 +-
>  fs/ext4/super.c   |  92 +++++---
>  fs/ext4/sysfs.c   |   6 +
>  5 files changed, 680 insertions(+), 64 deletions(-)
> 
> -- 
> 2.31.0.291.g576ba9dcdaf-goog
> 
