Message-ID: <4f0b44aa-77a4-9896-b780-eb52241954ae@deltatee.com>
Date: Thu, 28 Apr 2022 15:22:06 -0600
From: Logan Gunthorpe <logang@...tatee.com>
To: Xiao Ni <xni@...hat.com>
Cc: Guoqing Jiang <guoqing.jiang@...ux.dev>,
open list <linux-kernel@...r.kernel.org>,
linux-raid <linux-raid@...r.kernel.org>,
Song Liu <song@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Stephen Bates <sbates@...thlin.com>,
Martin Oliveira <Martin.Oliveira@...eticom.com>,
David Sloan <David.Sloan@...eticom.com>
Subject: Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
On 2022-04-25 10:12, Xiao Ni wrote:
>> I do know that lkp-tests has run it on this series as I did get an error
>> from it. But while I'm pretty sure that error has been resolved, I was
>> never able to figure out how to run them locally.
>>
>
> Hi Logan
>
> You can clone the mdadm repo at
> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> Then you can find there is a script test under the directory. It's not
> under the tests directory.
> The test cases are under tests directory.

So I've been fighting with this and it seems there are just a ton of
failures in these tests even without my changes. Running the latest
master (52c67fcdd6dad) with stock v5.17.5, I see major brokenness: about
17 of the 44 tests that ran failed. I had to run with --disable-integrity
because those tests seem to hang in an infinite loop waiting for the md
array to go into the U state (even though it appears idle).

Although I ran the tests with '--keep-going', testing stopped after
07revert-grow reported errors in dmesg, and the only errors printed to
dmesg were from mdadm segfaulting.
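
For anyone trying to reproduce this, here is a minimal sketch of an
invocation using the two flags mentioned above. It assumes a built mdadm
checkout and root privileges. MDADM_DIR and DRY_RUN are hypothetical names
introduced only for this sketch, and this is not necessarily the exact
command line used for the runs described here.

```shell
#!/bin/sh
# Sketch only: run mdadm's top-level ./test script with the two flags
# named in this mail (--keep-going and --disable-integrity).
# MDADM_DIR (hypothetical) points at a built mdadm checkout.
# DRY_RUN (hypothetical) defaults to 1, so the command is only printed;
# set DRY_RUN=0 and run as root to actually execute the tests.
MDADM_DIR="${MDADM_DIR:-./mdadm}"
DRY_RUN="${DRY_RUN:-1}"

run_md_tests() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of touching any block devices.
        echo "cd $MDADM_DIR && ./test --keep-going --disable-integrity"
    else
        # Subshell so the caller's working directory is left unchanged.
        (cd "$MDADM_DIR" && ./test --keep-going --disable-integrity)
    fi
}

run_md_tests
```

The dry-run default is deliberate: the test script creates and destroys md
arrays, so printing the command first is the safer starting point.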

Running on md/md-next seems to get a bit further (to
10ddf-create-fail-rebuild) before stopping with the same segfault issue
(or perhaps the 07 test only fails randomly and happened to fail first;
I haven't run it that many times). Most of the tests between these
points fail anyway, though.

My upcoming v3 patches cause no failures beyond those already seen on
the md/md-next branch. But it seems these tests have rotted to the point
that they aren't all that useful; or maybe there are a ton of
regressions in the kernel already and nobody was paying much attention.

I have also tried to test certain cases that turn out to be broken in
recent kernels anyway (for example, reducing the number of disks in a
raid5 array hangs on the first stripe to reshape).

In any case, I have a very rough ad-hoc test suite, targeted at my
specific changes, that I've been expanding. Testing these changes has
definitely been challenging. I've published my tests here:

https://github.com/Eideticom/raid5-tests

Logan