Date:   Mon, 18 Jan 2021 10:30:21 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     mingo@...nel.org, tglx@...utronix.de, linux-kernel@...r.kernel.org,
        jiangshanlai@...il.com, cai@...hat.com, vincent.donnefort@....com,
        decui@...rosoft.com, paulmck@...nel.org,
        vincent.guittot@...aro.org, rostedt@...dmis.org, tj@...nel.org
Subject: Re: [PATCH 7/8] sched: Fix CPU hotplug / tighten is_per_cpu_kthread()

On Sun, Jan 17, 2021 at 04:57:27PM +0000, Valentin Schneider wrote:
> On 16/01/21 12:30, Peter Zijlstra wrote:
> > @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
> >   */
> >  static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> >  {
> > +	/* When not in the task's cpumask, no point in looking further. */
> >       if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >               return false;
> >
> > +	/* migrate_disabled() must be allowed to finish. */
> > +	if (is_migration_disabled(p))
> >               return cpu_online(cpu);
> >
> > +	/* Non kernel threads are not allowed during either online or offline. */
> > +	if (!(p->flags & PF_KTHREAD))
> > +		return cpu_active(cpu);
> > +
> > +	/* KTHREAD_IS_PER_CPU is always allowed. */
> > +	if (kthread_is_per_cpu(p))
> > +		return cpu_online(cpu);
> > +
> > +	/* Regular kernel threads don't get to stay during offline. */
> > +	if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> > +		return cpu_active(cpu);
> 
> is_cpu_allowed(, cpu) isn't guaranteed to have cpu_rq(cpu)'s rq_lock
> held, so this can race with balance_push_set(, true). This shouldn't
> matter under normal circumstances as we'll have sched_cpu_wait_empty()
> further down the line.
> 
> This might get ugly with the rollback faff - this is jumping the gun a
> bit, but that's something we'll have to address, and I think what I'm
> concerned about is close to what you mentioned in
> 
>   http://lore.kernel.org/r/YAM1t2Qzr7Rib3bN@hirez.programming.kicks-ass.net
> 
> Here's what I'm thinking of:
> 
> _cpu_up()                            ttwu()
>                                        select_task_rq()
>                                          is_cpu_allowed()
>                                            rq->balance_callback != balance_push_callback
>   smpboot_unpark_threads() // FAIL
>   (now going down, set push here)
>   sched_cpu_wait_empty()
>   ...                                  ttwu_queue()
>   sched_cpu_dying()
>   *ARGH*
> 
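Right; balance_push_set() publishes the callback under the rq lock, while
is_cpu_allowed() reads rq->balance_callback without holding it, so nothing
orders the two. Roughly this shape (paraphrased sketch, the exact body in
this tree may differ):

	static void balance_push_set(int cpu, bool on)
	{
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		rq_lock_irqsave(rq, &rf);
		/* paraphrased: publish/clear the push callback under the rq lock */
		rq->balance_callback = on ? &balance_push_callback : NULL;
		rq_unlock_irqrestore(rq, &rf);
	}
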

Let me try this then...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5057054b1cff..9b045296d646 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7495,6 +7495,8 @@ int sched_cpu_activate(unsigned int cpu)
 	return 0;
 }
 
+unsigned long sched_cpu_rcu_state;
+
 int sched_cpu_deactivate(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7519,6 +7521,11 @@ int sched_cpu_deactivate(unsigned int cpu)
 	 */
 	balance_push_set(cpu, true);
 
+	/*
+	 * See sched_cpu_wait_empty().
+	 */
+	sched_cpu_rcu_state = get_state_synchronize_rcu();
+
 	rq_lock_irqsave(rq, &rf);
 	if (rq->rd) {
 		update_rq_clock(rq);
@@ -7578,6 +7585,12 @@ int sched_cpu_starting(unsigned int cpu)
  */
 int sched_cpu_wait_empty(unsigned int cpu)
 {
+	/*
+	 * Guarantee that TTWU will observe balance_push_set(true),
+	 * such that all wakeups will refuse this CPU.
+	 */
+	cond_synchronize_rcu(sched_cpu_rcu_state);
+
 	balance_hotplug_wait();
 	return 0;
 }
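
That is the usual cookie pairing; as a minimal sketch (illustrative only,
'push' stands in for rq->balance_callback == &balance_push_callback):

	unsigned long cookie;

	/* CPU-down side */
	WRITE_ONCE(push, true);			/* balance_push_set(cpu, true)   */
	cookie = get_state_synchronize_rcu();	/* snapshot the RCU state        */
	...
	cond_synchronize_rcu(cookie);		/* no-op if a grace period has
						   already elapsed, else wait    */

	/* wakeup side; ttwu() runs with IRQs/preemption disabled, which is
	   an RCU read-side section since the flavour consolidation */
	if (READ_ONCE(push))
		return false;			/* refuse this CPU               */

So by the time balance_hotplug_wait() runs, every wakeup that could have
missed the new callback has completed, and anything later is guaranteed to
observe it and refuse the CPU, which should close the window before
sched_cpu_dying().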
