[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871rclu3jz.mognet@e113632-lin.i-did-not-set--mail-host-address--so-tickle-me>
Date: Thu, 11 Mar 2021 15:13:04 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>, tglx@...utronix.de,
mingo@...nel.org, bigeasy@...utronix.de, swood@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vincent.donnefort@....com,
qais.yousef@....com
Cc: linux-kernel@...r.kernel.org, peterz@...radead.org
Subject: Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
Peter Zijlstra <peterz@...radead.org> writes:
> @@ -7883,14 +7889,6 @@ int sched_cpu_deactivate(unsigned int cp
> set_cpu_active(cpu, false);
>
> /*
> - * From this point forward, this CPU will refuse to run any task that
> - * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> - * push those tasks away until this gets cleared, see
> - * sched_cpu_dying().
> - */
> - balance_push_set(cpu, true);
> -
> - /*
> * We've cleared cpu_active_mask / set balance_push, wait for all
> * preempt-disabled and RCU users of this state to go away such that
> * all new such users will observe it.
> @@ -7910,6 +7908,14 @@ int sched_cpu_deactivate(unsigned int cp
> }
> rq_unlock_irqrestore(rq, &rf);
>
> + /*
> + * From this point forward, this CPU will refuse to run any task that
> + * is not: migrate_disable() or KTHREAD_IS_PER_CPU, and will actively
> + * push those tasks away until this gets cleared, see
> + * sched_cpu_dying().
> + */
> + balance_push_set(cpu, true);
> +
AIUI with cpu_dying_mask being flipped before even entering
sched_cpu_deactivate(), we don't need this to be before the
synchronize_rcu() anymore; is there more than that to why you're punting it
back this side of it?
> #ifdef CONFIG_SCHED_SMT
> /*
> * When going down, decrement the number of cores with SMT present.
> @@ -8206,7 +8212,7 @@ void __init sched_init(void)
> rq->sd = NULL;
> rq->rd = NULL;
> rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE;
> - rq->balance_callback = NULL;
> + rq->balance_callback = &balance_push_callback;
> rq->active_balance = 0;
> rq->next_balance = jiffies;
> rq->push_cpu = 0;
> @@ -8253,6 +8259,7 @@ void __init sched_init(void)
>
> #ifdef CONFIG_SMP
> idle_thread_set_boot_cpu();
> + balance_push_set(smp_processor_id(), false);
> #endif
> init_sched_fair_class();
>
I don't get what these two changes do - the end result is the same as
before, no?
Also, AIUI this patch covers the cpu_dying -> !cpu_dying rollback case
since balance_push gets numbed down by !cpu_dying. What about the other way
around (hot-plug failure + rollback)? We may have allowed !pcpu tasks on the
now-dying CPU, and we'd need to re-install the balance_push callback.
I'm starting to think we'd need to have
rq->balance_callback = &balance_push_callback
for any CPU with hotplug state < CPUHP_AP_ACTIVE. Thus we would
need:
balance_push_set(cpu, true) in sched_init() and sched_cpu_deactivate()
balance_push_set(cpu, false) in sched_cpu_activate()
and the rest would be driven by the cpu_dying_mask.
Powered by blists - more mailing lists