[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <12d6e6ef-2b8a-7ed2-e386-60c0f4d76683@linutronix.de>
Date: Mon, 12 Jun 2023 17:57:00 +0200 (CEST)
From: Anna-Maria Behnsen <anna-maria@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Eric Dumazet <edumazet@...gle.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Arjan van de Ven <arjan@...radead.org>,
"Paul E . McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Rik van Riel <riel@...riel.com>,
Steven Rostedt <rostedt@...dmis.org>,
Sebastian Siewior <bigeasy@...utronix.de>,
Giovanni Gherdovich <ggherdovich@...e.cz>,
Lukasz Luba <lukasz.luba@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [PATCH v7 19/21] timer: Implement the hierarchical pull model
On Mon, 12 Jun 2023, Frederic Weisbecker wrote:
> Le Wed, May 24, 2023 at 09:06:27AM +0200, Anna-Maria Behnsen a écrit :
> > +void tmigr_handle_remote(void)
> > +{
> > + struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> > + struct tmigr_remote_data data;
> > +
> > + if (!is_tmigr_enabled() || !tmc->tmgroup || !tmc->online)
> > + return;
> > +
> > + /*
> > + * NOTE: This is a doubled check because migrator test will be done
> > + * in tmigr_handle_remote_up() anyway. Keep this check to fasten
> > + * the return when nothing has to be done.
> > + */
> > + if (!tmigr_check_migrator(tmc->tmgroup, tmc->childmask))
> > + return;
> > +
> > + data.now = get_jiffies_update(&data.basej);
> > + data.childmask = tmc->childmask;
> > + data.wakeup = KTIME_MAX;
> > +
> > + __walk_groups(&tmigr_handle_remote_up, &data, tmc);
> > +
> > + raw_spin_lock_irq(&tmc->lock);
> > + if (tmc->idle)
> > + tmc->wakeup = data.wakeup;
>
> So this assumes there will be a timer re-eavluation right after
> in the idle path but this is not true in the case of nohz_full.
>
> If the softirq happens at the tail of the timer interrupt, you're
> good because the dynticks is re-evaluated right after. But if the
> softirq happens in ksoftirqd, you'll have no timer re-evaluation.
Ok. This problem is around only for nohz_full and not for nohz idle?
> The crude trick used by nohz_full in order to re-evaluate the dynticks when
> a timer is queued from the timer softirq is to launch a self IPI (from
> trigger_dyntick_cpu()). Here you would need something like that but
> that's not something we wish either.
>
> In fact we don't want any nohz_full CPU to perform an idle migrator job.
> And the problem here is that whenever a timer interrupt occurs while
> tmc->idle == 1 (and that _might_ happen in nohz_full), it will go up the
> hierarchy as long as there is no active migrator on a given level and
> check for remote expiry. And if something expired it will not only perform
> the remote callbacks handling but also reprogram the next tick to fire in
> the next expiry. That's a potential disturbance on a nohz_full CPU.
>
> There is always an active CPU in nohz_full so there is always an active
> migrator at least at the top level. Therefore you can spare concurrent
> idle migrators if they are nohz_full.
>
As long as the top level group is not/never idle, the wakeup value will be
KTIME_MAX and so it is no problem for nohz_full cpus - or? The check for
handling remote expiry is still a problem but please read my proposal for
this below.
> My suggestion is to make tmigr_requires_handle_remote() return 0 if
> tick_nohz_full_cpu(smp_processor_id()), so that we don't even bother
> raising the softirq. But if the timer softirq happens nevertheless, due
> to some local timer to process, also make tmigr_handle_remote() to
> return early.
Regarding this problem and also the two things you mentioned in the two
earlier review remarks (timer IRQ which fires too early, IPI when CPU goes
offline), I would propose to use the tmc->wakeup value slightly different
as it is used right now:
- Whenever a wakeup value is required, because top level group is
completely idle, the value is set in per CPU tmc struct. It could be
then reevaluated in idle code in irq exit path.
- For checking whether remote expiry is required, the wakeup value could
also be used.
- For nohz_full CPUs wakeup will be always KTIME_MAX - under the
assumption that there is alwasy an active CPU in top level group.
What do you think about this? Do I miss something else here?
Thanks,
Anna-Maria
Powered by blists - more mailing lists