lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220420195425.34911-1-logang@deltatee.com>
Date:   Wed, 20 Apr 2022 13:54:13 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
        Song Liu <song@...nel.org>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Guoqing Jiang <guoqing.jiang@...ux.dev>,
        Stephen Bates <sbates@...thlin.com>,
        Martin Oliveira <Martin.Oliveira@...eticom.com>,
        David Sloan <David.Sloan@...eticom.com>,
        Logan Gunthorpe <logang@...tatee.com>
Subject: [PATCH v2 00/12] Improve Raid5 Lock Contention

Hi,

This is v2 of this series which addresses Christoph's feedback and
fixes some bugs. The first posting is at [1]. A git branch is
available at [2].

--

I've been doing some work trying to improve the bulk write performance
of raid5 on large systems with fast NVMe drives. The bottleneck appears
largely to be lock contention on the hash_lock and device_lock. This
series improves the situation slightly by addressing a couple of low
hanging fruit ways to take the lock fewer times in the request path.

Patch 9 adjusts how batching works by keeping a reference to the
previous stripe_head in raid5_make_request(). Under most situtations,
this removes the need to take the hash_lock in stripe_add_to_batch_list()
which should reduce the number of times the lock is taken by a factor of
about 2.

Patch 12 pivots the way raid5_make_request() works. Before the patch, the
code must find the stripe_head for every 4KB page in the request, so each
stripe head must be found once for every data disk. The patch changes this
so that all the data disks can be added to a stripe_head at once and the
number of times the stripe_head must be found (and thus the number of
times the hash_lock is taken) should be reduced by a factor roughly equal
to the number of data disks.

The remaining patches are just cleanup and prep patches for those two
patches.

Doing apples to apples testing this series on a small VM with 5 ram
disks, I saw a bandwidth increase of roughly 14% and lock contentions
on the hash_lock (as reported by lock stat) reduced by more than a factor
of 5 (though it is still significantly contended).

Testing on larger systems with NVMe drives saw similar small bandwidth
increases from 3% to 20% depending on the parameters. Oddly small arrays
had larger gains, likely due to them having lower starting bandwidths; I
would have expected larger gains with larger arrays (seeing there
should have been even fewer locks taken in raid5_make_request()).

Logan

[1] https://lkml.kernel.org/r/20220407164511.8472-1-logang@deltatee.com
[2] https://github.com/sbates130272/linux-p2pmem raid5_lock_cont_v2

--

Changes since v1:
  - Rebased on current md-next branch (190a901246c69d79)
  - Added patch to create a helper for checking if a sector
    is ahead of the reshape (per Christoph)
  - Reworked the __find_stripe() patch to create a find_get_stripe()
    helper (per Christoph)
  - Added more patches to further refactor raid5_make_request() and
    pull most of the loop body into a helper function (per Christoph)
  - A few other minor cleanups (boolean return, droping casting when
    printing sectors, commit message grammar) as suggested by Christoph.
  - Fixed two uncommon but bad data corruption bugs in that were found.

--

Logan Gunthorpe (12):
  md/raid5: Factor out ahead_of_reshape() function
  md/raid5: Refactor raid5_make_request loop
  md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()
  md/raid5: Move common stripe count increment code into __find_stripe()
  md/raid5: Factor out helper from raid5_make_request() loop
  md/raid5: Drop the do_prepare flag in raid5_make_request()
  md/raid5: Move read_seqcount_begin() into make_stripe_request()
  md/raid5: Refactor for loop in raid5_make_request() into while loop
  md/raid5: Keep a reference to last stripe_head for batch
  md/raid5: Refactor add_stripe_bio()
  md/raid5: Check all disks in a stripe_head for reshape progress
  md/raid5: Pivot raid5_make_request()

 drivers/md/raid5.c | 632 +++++++++++++++++++++++++++++----------------
 drivers/md/raid5.h |   1 +
 2 files changed, 410 insertions(+), 223 deletions(-)


base-commit: 190a901246c69d79dadd1ab574022548da612724
--
2.30.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ