Message-ID: <8603B8C7-DE79-4362-BE60-DE95ABE015BE@nvidia.com>
Date: Wed, 16 Apr 2025 11:19:22 +0000
From: Joel Fernandes <joelagnelf@...dia.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Frederic
Weisbecker <frederic@...nel.org>, Neeraj Upadhyay
<neeraj.upadhyay@...nel.org>, Joel Fernandes <joel@...lfernandes.org>, Josh
Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Lai Jiangshan
<jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>, Davidlohr Bueso
<dave@...olabs.net>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH v3 1/2] rcutorture: Perform more frequent testing of
->gpwrap
> On Apr 15, 2025, at 8:19 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Mon, Apr 14, 2025 at 11:05:45AM -0400, Joel Fernandes wrote:
>> On 4/10/2025 2:29 PM, Paul E. McKenney wrote:
>>>> +static int rcu_gpwrap_lag_init(void)
>>>> +{
>>>> +	if (gpwrap_lag_cycle_mins <= 0 || gpwrap_lag_active_mins <= 0) {
>>>> +		pr_alert("rcu-torture: lag timing parameters must be positive\n");
>>>> +		return -EINVAL;
>>> When rcutorture is initiated by modprobe, this makes perfect sense.
>>>
>>> But if rcutorture is built in, we have other choices: (1) Disable gpwrap
>>> testing and do other testing but splat so that the bogus scripting can
>>> be fixed, (2) Force default values and splat as before, (3) Splat and
>>> halt the system.
>>>
>>> The usual approach has been #1, but what makes sense in this case?
>>
>> If the user deliberately tries to prevent the test, I am Ok with #3 which I
>> believe is the current behavior. But otherwise #1 is also Ok with me but I don't
>> feel strongly about doing that.
>>
>> If we want to do #3, it will just involve changing the "return -EINVAL" to
>> "return 0" but also may need to be doing so only if RCU torture is a built-in.
>>
>> IMO the current behavior is more reasonable than adding more complexity for
>> an unusual case for a built-in?
>
> The danger is that someone adjusts a scenario, accidentally disables
> *all* ->gpwrap testing during built-in tests (kvm.sh, kvm-remote.sh,
> and torture.sh), and nobody notices. This has tripped me up in the
> past, hence the existing splats in rcutorture, but only for runs with
> built-in rcutorture.
But in the code we are discussing, we will splat with an error if either parameter is set to 0? Sorry if I missed something.
>
>> On the other hand if the issue is with providing the user with a way to disable
>> gpwrap testing, that should IMO be another parameter than setting the _mins
>> parameters to be 0. But I think we may not want this testing disabled since it
>> is already "self-disabled" for the first 25 minutes.
>
> We do need a way of disabling the testing on long runs for fault-isolation
> purposes.
Thanks, I will add an option for this.
>
> For example, rcutorture.n_up_down=0 disables SRCU up/down testing.
> Speaking of which, I am adding a section on that topic to this document:
>
> https://docs.google.com/document/d/1RoYRrTsabdeTXcldzpoMnpmmCjGbJNWtDXN6ZNr_4H8/edit?usp=sharing
Nice, thanks,
- Joel
>
> Thanx, Paul
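
For reference, option #1 from the discussion above (complain, skip only the
->gpwrap lag test when rcutorture is built in, and fail with -EINVAL when it
is loaded as a module) can be sketched in plain userspace C. This is only an
illustration, not code from the patch: the enum, the function name
gpwrap_lag_check(), and the "builtin" flag are hypothetical, and fprintf()
stands in for pr_alert(). In the kernel the flag would presumably come from
something like IS_BUILTIN(CONFIG_RCU_TORTURE_TEST), but that is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of parameter validation; not part of the patch. */
enum gpwrap_lag_action {
	GPWRAP_LAG_RUN,		/* parameters valid: run the lag test */
	GPWRAP_LAG_DISABLED,	/* built-in: complain, skip only this test */
	GPWRAP_LAG_FAIL,	/* module: refuse to initialize */
};

/*
 * Decide what to do with the two lag timing parameters.
 * *err is set to 0 unless initialization should fail, in which
 * case it is set to -EINVAL.
 */
static enum gpwrap_lag_action
gpwrap_lag_check(int cycle_mins, int active_mins, bool builtin, int *err)
{
	*err = 0;
	if (cycle_mins > 0 && active_mins > 0)
		return GPWRAP_LAG_RUN;

	/* Stand-in for the pr_alert() in the quoted patch. */
	fprintf(stderr, "rcu-torture: lag timing parameters must be positive\n");

	/* Built-in run: keep the other tests going (option #1). */
	if (builtin)
		return GPWRAP_LAG_DISABLED;

	/* modprobe case: reject the bad parameters outright. */
	*err = -EINVAL;
	return GPWRAP_LAG_FAIL;
}
```

A caller would check the returned action and, for GPWRAP_LAG_FAIL, propagate
*err as the init return value. Keeping the complaint in both failure paths
matches the concern raised above that a silently disabled test is easy to
miss.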