Message-ID: <xhsmhpm2prnd1.mognet@vschneid.remote.csb>
Date: Mon, 11 Sep 2023 12:54:50 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask
Ok, back to this :)
On 15/08/23 16:21, Sebastian Andrzej Siewior wrote:
> What I still observe is:
> - CPU0 is idle. CPU0 gets a task assigned from CPU1. That task receives
> a wakeup. CPU0 returns from idle and schedules the task.
> pull_rt_task() on CPU1, and sometimes on other CPUs, observes this too.
> CPU1 sends irq_work to CPU0 while at the time rto_next_cpu() sees that
> has_pushable_tasks() return 0. That bit was cleared earlier (as per
> tracing).
>
> - CPU0 is idle. CPU0 gets a task assigned from CPU1. The task on CPU0 is
> woken up without an IPI (yay). But then pull_rt_task() decides to
> send irq_work, and has_pushable_tasks() said that it has tasks left
> so….
> Now: rto_push_irq_work_func() runs once on CPU0, does nothing,
> rto_next_cpu() returns CPU0 again and enqueues itself again on CPU0.
> Usually after the second or third round the scheduler on CPU0 makes
> enough progress to remove the task / clear the CPU from the mask.
>
If CPU0 is selected for the push IPI, then we should have

  rd->rto_cpu == CPU0

So per the

  cpumask_next(rd->rto_cpu, rd->rto_mask);

in rto_next_cpu(), it shouldn't be able to re-select itself.
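
To make that argument concrete, here is a minimal userspace sketch of the "next set bit strictly after n" semantics that cpumask_next() provides. model_cpumask_next() is a hypothetical stand-in written for this illustration, not kernel code, and it models a single scan only; it ignores everything else rto_next_cpu() does around that call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's cpumask_next():
 * return the lowest set bit in 'mask' that is strictly greater than 'n',
 * or 'nbits' if there is none. Passing n == -1 starts the scan at bit 0.
 */
static int model_cpumask_next(int n, uint64_t mask, int nbits)
{
	assert(nbits > 0 && nbits <= 64);
	assert(n >= -1);

	for (int cpu = n + 1; cpu < nbits; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}

	/* No further bit set: mirrors the ">= nr_cpu_ids" result. */
	return nbits;
}
```

With only bit 0 set and the scan starting from 0, the model returns nbits rather than 0, which is the "can't re-select itself" property within one scan.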
Do you have a simple enough reproducer I could use to poke at this?
> I understand that there is a race and the CPU is cleared from rto_mask
> shortly after checking. Therefore I would suggest to look at
> has_pushable_tasks() before returning a CPU in rto_next_cpu() as I did
> just to avoid the interruption which does nothing.
>
> For the second case the irq_work seems to make no progress. I don't see
> any trace_events in hardirq, the mask is cleared outside hardirq (idle
> code). The NEED_RESCHED bit is set for current therefore it doesn't make
> sense to send irq_work to reschedule if the current already has this on
> its agenda.
>
> So what about something like:
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e50741153..d963408855e25 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2247,8 +2247,23 @@ static int rto_next_cpu(struct root_domain *rd)
>
> rd->rto_cpu = cpu;
>
> - if (cpu < nr_cpu_ids)
> + if (cpu < nr_cpu_ids) {
> + struct task_struct *t;
> +
> + if (!has_pushable_tasks(cpu_rq(cpu)))
> + continue;
> +
IIUC that's just to plug the race between the CPU emptying its
pushable_tasks list and it removing itself from the rto_mask - that looks
fine to me.
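
As a rough userspace model of that check (hypothetical names throughout, not the kernel implementation): a CPU whose bit is still set in the mask but whose pushable list is already empty gets skipped instead of being returned as an IPI target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8	/* hypothetical CPU count for this model */

/*
 * Hypothetical model of the proposed check: scan 'rto_mask' after 'prev'
 * and skip CPUs whose bit is stale, i.e. still set although the CPU no
 * longer has pushable tasks. Returns -1 when no candidate is left.
 */
static int model_next_pushable_cpu(int prev, uint64_t rto_mask,
				   const bool has_pushable[MODEL_NR_CPUS])
{
	assert(prev >= -1 && prev < MODEL_NR_CPUS);

	for (int cpu = prev + 1; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(rto_mask & (UINT64_C(1) << cpu)))
			continue;	/* not overloaded at all */
		if (!has_pushable[cpu])
			continue;	/* stale bit: an IPI would do nothing */
		return cpu;
	}

	return -1;
}
```

This only models the window between the list going empty and the mask bit being cleared; it says nothing about locking or memory ordering in the real code.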
> + rcu_read_lock();
> + t = rcu_dereference(rq->curr);
> + /* if (test_preempt_need_resched_cpu(cpu_rq(cpu))) */
> + if (test_tsk_need_resched(t)) {
We need to make sure this doesn't cause us to lose IPIs we actually need.
We do have a call to put_prev_task_balance() when entering __schedule()
if the previous task is RT/DL, and balance_rt() can issue a push
IPI, but AFAICT only if the previous task was the last DL task. So I don't
think we can do this.
> + rcu_read_unlock();
> + continue;
> + }
> + rcu_read_unlock();
> +
> return cpu;
> + }
>
> rd->rto_cpu = -1;
>
> Sebastian