Message-ID: <jhjsg6z4i2w.mognet@arm.com>
Date: Sun, 17 Jan 2021 16:57:27 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org,
tglx@...utronix.de
Cc: linux-kernel@...r.kernel.org, jiangshanlai@...il.com,
cai@...hat.com, vincent.donnefort@....com, decui@...rosoft.com,
paulmck@...nel.org, vincent.guittot@...aro.org,
rostedt@...dmis.org, tj@...nel.org, peterz@...radead.org
Subject: Re: [PATCH 7/8] sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
On 16/01/21 12:30, Peter Zijlstra wrote:
> @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
> */
> static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> {
> + /* When not in the task's cpumask, no point in looking further. */
> if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> return false;
>
> - if (is_per_cpu_kthread(p) || is_migration_disabled(p))
> + /* migrate_disabled() must be allowed to finish. */
> + if (is_migration_disabled(p))
> return cpu_online(cpu);
>
> - return cpu_active(cpu);
> + /* Non kernel threads are not allowed during either online or offline. */
> + if (!(p->flags & PF_KTHREAD))
> + return cpu_active(cpu);
> +
> + /* KTHREAD_IS_PER_CPU is always allowed. */
> + if (kthread_is_per_cpu(p))
> + return cpu_online(cpu);
> +
> + /* Regular kernel threads don't get to stay during offline. */
> + if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> + return cpu_active(cpu);

is_cpu_allowed(, cpu) isn't guaranteed to have cpu_rq(cpu)'s rq_lock
held, so this can race with balance_push_set(, true). This shouldn't
matter under normal circumstances as we'll have sched_cpu_wait_empty()
further down the line.

This might get ugly with the rollback faff - this is jumping the gun a
bit, but that's something we'll have to address, and I think what I'm
concerned about is close to what you mentioned in

  http://lore.kernel.org/r/YAM1t2Qzr7Rib3bN@hirez.programming.kicks-ass.net

Here's what I'm thinking of:

  _cpu_up()                              ttwu()
                                           select_task_rq()
                                             is_cpu_allowed()
                                               rq->balance_callback != balance_push_callback
    smpboot_unpark_threads() // FAIL
    (now going down, set push here)
    sched_cpu_wait_empty()
    ...                                    ttwu_queue()
    sched_cpu_dying()
    *ARGH*
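
To make the ordering concrete, here is a userspace sketch that replays the
interleaving above one step at a time. It is not kernel code: the model_*
types and replay_rollback_race() are made-up names for illustration, the
two booleans stand in for cpu_online()/cpu_active(), and push_set stands in
for rq->balance_callback == &balance_push_callback. It assumes the waking
task is a regular (non per-CPU) kthread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rq and hotplug state; not kernel types. */
struct model_rq {
	bool push_set;   /* models balance_callback == &balance_push_callback */
	int nr_queued;   /* tasks enqueued on this rq */
};

struct model_cpu {
	bool online;
	bool active;
	struct model_rq rq;
};

/* Tail of the patched is_cpu_allowed() for a regular kthread. */
static bool model_is_cpu_allowed(const struct model_cpu *c)
{
	if (c->rq.push_set)
		return c->active;   /* going down: must leave */
	return c->online;           /* coming up: may stay */
}

/* Replays the race; returns how many tasks sit on the rq at dying time. */
int replay_rollback_race(void)
{
	/* Mid-bringup: online, not yet active, push not set. */
	struct model_cpu cpu = {
		.online = true,
		.active = false,
		.rq = { .push_set = false, .nr_queued = 0 },
	};

	/* ttwu(): select_task_rq() -> is_cpu_allowed(), no rq lock held. */
	bool allowed = model_is_cpu_allowed(&cpu);

	/* _cpu_up(): smpboot_unpark_threads() fails, rollback sets push. */
	cpu.rq.push_set = true;

	/* sched_cpu_wait_empty(): the rq is still empty, so it returns. */
	assert(cpu.rq.nr_queued == 0);

	/* ttwu_queue(): acts on the decision taken before push was set. */
	if (allowed)
		cpu.rq.nr_queued++;

	/* sched_cpu_dying() would now find this many tasks on the rq. */
	return cpu.rq.nr_queued;
}
```

In this model replay_rollback_race() returns 1, i.e. a task lands on the rq
after sched_cpu_wait_empty() has already returned. Re-evaluating
model_is_cpu_allowed() after push_set is written would reject the CPU, which
is why the unlocked read is the interesting part.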

I've written some horrors on top of this series here:

  https://gitlab.arm.com/linux-arm/linux-vs/-/commits/mainline/migrate_disable/stragglers/

Also, my TX2 is again in need of CPR, so in the meantime I'm running
tests on a (much) smaller machine...

> +
> + /* But are allowed during online. */
> + return cpu_online(cpu);
> }