Message-ID: <Y9p3HjW3yzn0UYrZ@lothringen>
Date: Wed, 1 Feb 2023 15:28:46 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Hillf Danton <hdanton@...a.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Yu Liao <liaoyu15@...wei.com>, fweisbec@...il.com,
mingo@...nel.org, liwei391@...wei.com, adobriyan@...il.com,
mirsad.todorovac@....unizg.hr, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH RFC] tick/nohz: fix data races in get_cpu_idle_time_us()

On Wed, Feb 01, 2023 at 10:01:17PM +0800, Hillf Danton wrote:
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -640,13 +640,26 @@ static void tick_nohz_update_jiffies(kti
> > >  /*
> > >   * Updates the per-CPU time idle statistics counters
> > >   */
> > > -static void
> > > -update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> > > +static u64 update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now,
> > > +				int io, u64 *last_update_time)
> > >  {
> > >  	ktime_t delta;
> > >
> > > +	if (last_update_time)
> > > +		*last_update_time = ktime_to_us(now);
> > > +
> > >  	if (ts->idle_active) {
> > >  		delta = ktime_sub(now, ts->idle_entrytime);
> > > +
> > > +		/* update is only expected on the local CPU */
> > > +		if (cpu != smp_processor_id()) {
> >
> > Why not just update it only on idle exit then?
>
> This aligns to idle exit as much as it can by disallowing remote update.

I mean why bother updating if idle does it for us already?

One possibility is that we get some more precise values if we read during
long idle periods with nr_iowait_cpu() changes in the middle.

> >
> > > +			if (io)
> >
> > I fear it's not up to the caller to decide if the idle time is IO or not.
>
> Could you specify a bit on your concern, given the callers of this function?

You are arbitrarily stating whether the elapsing idle time is IO or not depending
on the caller, without verifying nr_iowait_cpu(). Or am I missing something?

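To make the concern concrete, here is a minimal userspace sketch, not kernel
code. The struct idle_stats type and both read_*() helpers are hypothetical
names invented for illustration; the first helper loosely models the
remote-reader branch of the quoted hunk, the second consults a stand-in for
nr_iowait_cpu() before attributing the pending idle delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the tick_sched fields discussed here. */
struct idle_stats {
	int idle_active;              /* CPU is currently idle */
	uint64_t idle_entrytime_us;   /* start of the current idle period */
	uint64_t idle_sleeptime_us;   /* accumulated non-IO idle time */
	uint64_t iowait_sleeptime_us; /* accumulated IO-wait idle time */
	unsigned int nr_iowait;       /* stand-in for nr_iowait_cpu(cpu) */
};

/*
 * Caller-decides variant (loosely models the quoted hunk): the pending
 * delta is added to whichever counter the caller asked for.
 */
static uint64_t read_caller_decides(const struct idle_stats *s, int io,
				    uint64_t now_us)
{
	uint64_t total = io ? s->iowait_sleeptime_us : s->idle_sleeptime_us;

	if (s->idle_active) {
		assert(now_us >= s->idle_entrytime_us);
		total += now_us - s->idle_entrytime_us;
	}
	return total;
}

/*
 * Counter-decides variant: the pending delta only counts as IO wait if
 * tasks are actually blocked on IO for this CPU at read time.
 */
static uint64_t read_counter_decides(const struct idle_stats *s, int io,
				     uint64_t now_us)
{
	uint64_t total = io ? s->iowait_sleeptime_us : s->idle_sleeptime_us;
	int pending_is_io = s->nr_iowait > 0;

	if (s->idle_active && pending_is_io == !!io) {
		assert(now_us >= s->idle_entrytime_us);
		total += now_us - s->idle_entrytime_us;
	}
	return total;
}
```

With a CPU that has been idle for 50us and no task in iowait, the first
helper reports those 50us to both the idle and the iowait reader, while the
second reports them to the idle reader only.
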
> >
> > > +				delta = ktime_add(ts->iowait_sleeptime, delta);
> > > +			else
> > > +				delta = ktime_add(ts->idle_sleeptime, delta);
> > > +			return ktime_to_us(delta);
>
> Based on the above comments, I guess you missed this line which prevents
> get_cpu_idle_time_us() and get_cpu_iowait_time_us() from updating ts.

Right...

> > But then you may race with the local updater, risking to return
> > the delta added twice. So you need at least a seqcount.
>
> Add seqcount if needed. No problem.
> >
> > But in the end, nr_iowait_cpu() is broken because that counter can be
> > decremented remotely and so the whole thing is beyond repair:
> >
> > CPU 0                        CPU 1                        CPU 2
> > -----                        -----                        -----
> > //io_schedule() TASK A
> > current->in_iowait = 1
> > rq(0)->nr_iowait++
> > //switch to idle
> >                              // READ /proc/stat
> >                              // See nr_iowait_cpu(0) == 1
> >                              return ts->iowait_sleeptime + ktime_sub(ktime_get(), ts->idle_entrytime)
> >
> >                                                           //try_to_wake_up(TASK A)
> >                                                           rq(0)->nr_iowait--
> > //idle exit
> > // See nr_iowait_cpu(0) == 0
> > ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
>
> Ah see your point.
>
> The diff disallows remotely updating ts, and it is updated in idle exit
> after my proposal, so what nr_iowait_cpu() breaks is mitigated.

Only halfway mitigated. This doesn't prevent backward or forward jumps at all
when non-updating readers are involved.

Thanks.

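For reference, the interleaving quoted in this thread can be replayed
deterministically in userspace. This is only a sketch under stated
assumptions: struct cpu_stats, the helper names and the timestamps (0, 100,
150, 200, 300 us) are all made up for illustration, and a single thread
stands in for the three CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CPU 0's tick_sched and rq->nr_iowait. */
struct cpu_stats {
	int idle_active;
	uint64_t idle_entrytime_us;
	uint64_t idle_sleeptime_us;
	uint64_t iowait_sleeptime_us;
	unsigned int nr_iowait;
};

/* Non-updating remote reader: classifies the pending delta at read time. */
static uint64_t remote_read_iowait_us(const struct cpu_stats *s, uint64_t now_us)
{
	uint64_t t = s->iowait_sleeptime_us;

	if (s->idle_active && s->nr_iowait > 0)
		t += now_us - s->idle_entrytime_us;
	return t;
}

/* Local idle exit: classifies the whole idle period at exit time. */
static void idle_exit(struct cpu_stats *s, uint64_t now_us)
{
	uint64_t delta = now_us - s->idle_entrytime_us;

	assert(s->idle_active);
	if (s->nr_iowait > 0)
		s->iowait_sleeptime_us += delta;
	else
		s->idle_sleeptime_us += delta;
	s->idle_active = 0;
}

struct replay_result {
	uint64_t first_read_us;  /* iowait seen while CPU 0 is still idle */
	uint64_t second_read_us; /* iowait seen after CPU 0 left idle */
};

static struct replay_result replay_race(void)
{
	struct cpu_stats cpu0 = { 0 };
	struct replay_result r;

	/* t=0, CPU 0: task A calls io_schedule(), then CPU 0 goes idle */
	cpu0.nr_iowait++;
	cpu0.idle_active = 1;
	cpu0.idle_entrytime_us = 0;

	/* t=100, CPU 1: reads /proc/stat, sees nr_iowait == 1 */
	r.first_read_us = remote_read_iowait_us(&cpu0, 100);

	/* t=150, CPU 2: wakes task A, decrementing CPU 0's counter remotely */
	cpu0.nr_iowait--;

	/* t=200, CPU 0: idle exit sees nr_iowait == 0, so all 200us count as idle */
	idle_exit(&cpu0, 200);

	/* t=300, CPU 1: reads again */
	r.second_read_us = remote_read_iowait_us(&cpu0, 300);
	return r;
}
```

In this replay the first read reports 100us of iowait and the second reports
0us, so the value seen by the reader goes backward even though only the local
CPU ever updates the counters.
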
>
> Thanks for taking a look, particularly the race linked to nr_iowait_cpu().
>
> Hillf