[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Y8Az3_gi-UX-KCfQ1dxARJtL1NhB1AGLv9o_5gNtkWOg@mail.gmail.com>
Date: Tue, 19 Apr 2022 16:11:36 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: paulmck@...nel.org
Cc: Hillf Danton <hdanton@...a.com>, linux-kernel@...r.kernel.org,
syzkaller <syzkaller@...glegroups.com>
Subject: Re: [PATCH rcu 04/11] kernel/smp: Provide boot-time timeout for CSD
lock diagnostics
On Tue, 19 Apr 2022 at 15:46, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Tue, Apr 19, 2022 at 04:56:07PM +0800, Hillf Danton wrote:
> > On Mon, 18 Apr 2022 15:53:52 -0700 Paul E. McKenney wrote:
> > > Debugging of problems involving insanely long-running SMI handlers
> > > proceeds better if the CSD-lock timeout can be adjusted. This commit
> > > therefore provides a new smp.csd_lock_timeout kernel boot parameter
> > > that specifies the timeout in milliseconds. The default remains at the
> > > previously hard-coded value of five seconds.
> > >
> > > Cc: Rik van Riel <riel@...riel.com>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Cc: Ingo Molnar <mingo@...nel.org>
> > > Cc: Thomas Gleixner <tglx@...utronix.de>
> > > Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> > > Cc: Juergen Gross <jgross@...e.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > > ---
> > > Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++
> > > kernel/smp.c | 7 +++++--
> > > 2 files changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 3f1cc5e317ed..645c4c001b16 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -5377,6 +5377,17 @@
> > > smart2= [HW]
> > > Format: <io1>[,<io2>[,...,<io8>]]
> > >
> > > + smp.csd_lock_timeout= [KNL]
> > > + Specify the period of time in milliseconds
> > > + that smp_call_function() and friends will wait
> > > + for a CPU to release the CSD lock. This is
> > > + useful when diagnosing bugs involving CPUs
> > > + disabling interrupts for extended periods
> > > + of time. Defaults to 5,000 milliseconds, and
> > > + setting a value of zero disables this feature.
> > > + This feature may be more efficiently disabled
> > > + using the csdlock_debug- kernel parameter.
> > > +
> >
> > Can non-responsive CSD lock detected trigger syzbot (warning) report?
>
> If they enable it by building with CONFIG_CSD_LOCK_WAIT_DEBUG=y, yes.
+syzkaller mailing list
Currently we don't enable CONFIG_CSD_LOCK_WAIT_DEBUG in syzbot configs.
Is it a generally useful debugging feature recommended to be enabled
in kernel testing systems?
If we enabled it, we also need to figure out where it fits into the
timeout hierarchy and adjust smp.csd_lock_timeout:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/bits/x86_64.yml#L15-L40
> Thanx, Paul
>
> > Hillf
> > > smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
> > > smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
> > > smsc-ircc2.ircc_sir= [HW] SIR base I/O port
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index 01a7c1706a58..d82439bac401 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -183,7 +183,9 @@ static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> > > static DEFINE_PER_CPU(void *, cur_csd_info);
> > > static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
> > >
> > > -#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
> > > +static ulong csd_lock_timeout = 5000; /* CSD lock timeout in milliseconds. */
> > > +module_param(csd_lock_timeout, ulong, 0444);
> > > +
> > > static atomic_t csd_bug_count = ATOMIC_INIT(0);
> > > static u64 cfd_seq;
> > >
> > > @@ -329,6 +331,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > > u64 ts2, ts_delta;
> > > call_single_data_t *cpu_cur_csd;
> > > unsigned int flags = READ_ONCE(csd->node.u_flags);
> > > + unsigned long long csd_lock_timeout_ns = csd_lock_timeout * NSEC_PER_MSEC;
> > >
> > > if (!(flags & CSD_FLAG_LOCK)) {
> > > if (!unlikely(*bug_id))
> > > @@ -341,7 +344,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> > >
> > > ts2 = sched_clock();
> > > ts_delta = ts2 - *ts1;
> > > - if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
> > > + if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns <= 0))
> > > return false;
> > >
> > > firsttime = !*bug_id;
> > > --
> > > 2.31.1.189.g2e36527f23
Powered by blists - more mailing lists