lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YHQ3Iy7QfL+0UoM0@hirez.programming.kicks-ass.net>
Date:   Mon, 12 Apr 2021 14:03:47 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     tglx@...utronix.de, mingo@...nel.org, bigeasy@...utronix.de,
        swood@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vincent.donnefort@....com, qais.yousef@....com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs
 hotplug-rollback

On Thu, Mar 11, 2021 at 03:13:04PM +0000, Valentin Schneider wrote:
> Peter Zijlstra <peterz@...radead.org> writes:
> > @@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
> >  	set_cpu_active(cpu, false);
> >  
> >  	/*
> > -	 * From this point forward, this CPU will refuse to run any task that
> > -	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > -	 * push those tasks away until this gets cleared, see
> > -	 * sched_cpu_dying().
> > -	 */
> > -	balance_push_set(cpu, true);
> > -
> > -	/*
> >  	 * We've cleared cpu_active_mask / set balance_push, wait for all
> >  	 * preempt-disabled and RCU users of this state to go away such that
> >  	 * all new such users will observe it.
> > @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> >  	}
> >  	rq_unlock_irqrestore(rq, &rf);
> >  
> > +	/*
> > +	 * From this point forward, this CPU will refuse to run any task that
> > +	 * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> > +	 * push those tasks away until this gets cleared, see
> > +	 * sched_cpu_dying().
> > +	 */
> > +	balance_push_set(cpu, true);
> > +
> 
> AIUI with cpu_dying_mask being flipped before even entering
> sched_cpu_deactivate(), we don't need this to be before the
> synchronize_rcu() anymore; is there more than that to why you're punting it
> back this side of it?

I think it does does need to be like this, we need to clearly separate
the active=true and balance_push_set(). If we were to somehow observe
both balance_push_set() and active==false, we'd be in trouble.

> >  #ifdef CONFIG_SCHED_SMT
> >  	/*
> >  	 * When going down, decrement the number of cores with SMT present.
> 
> > @@ -8206,7 +8212,7 @@ void __init sched_init(void)
> >  		rq->sd = NULL;
> >  		rq->rd = NULL;
> >  		rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> > -		rq->balance_callback = NULL;
> > +		rq->balance_callback = &balance_push_callback;
> >  		rq->active_balance = 0;
> >  		rq->next_balance = jiffies;
> >  		rq->push_cpu = 0;
> > @@ -8253,6 +8259,7 @@ void __init sched_init(void)
> >  
> >  #ifdef CONFIG_SMP
> >  	idle_thread_set_boot_cpu();
> > +	balance_push_set(smp_processor_id(), false);
> >  #endif
> >  	init_sched_fair_class();
> >
> 
> I don't get what these two changes do - the end result is the same as
> before, no?

Not quite; we have to make sure the state of the offline CPUs matches
that of a CPU that's been offlined. For consistency if nothing else, but
it's actually important for a point below.

> Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
> since balance_push gets numbed down by !cpu_dying. What about the other way
> around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
> now-dying CPU, and we'd need to re-install the balance_push callback. 

This is in fact handled. Note how the previous point initialized the
offline CPU to have the push_callback installed.

Also note how balance_push() re-instates itself unconditionally.

So the thing is, we install the push callback on deactivate() and leave
it in place until activate, it is always there, regardless what way
we're moving.

However, it is only effective whild going down, see the early exit.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ