Message-ID: <jhjimamz1dv.mognet@arm.com>
Date:   Tue, 03 Nov 2020 19:27:56 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Aubrey Li <aubrey.li@...ux.intel.com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        tim.c.chen@...ux.intel.com, linux-kernel@...r.kernel.org,
        Aubrey Li <aubrey.li@...el.com>,
        Qais Yousef <qais.yousef@....com>,
        Jiang Biao <benbjiang@...il.com>
Subject: Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup

Hi,

On 21/10/20 16:03, Aubrey Li wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6b3b59cc51d6..088d1995594f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6023,6 +6023,38 @@ void __update_idle_core(struct rq *rq)
>       rcu_read_unlock();
>  }
>
> +static DEFINE_PER_CPU(bool, cpu_idle_state);

I would've expected this to be far less compact than a cpumask, but that's
not the story readelf is telling me. Objdump tells me this is recouping
some of the padding in .data..percpu, at least with the arm64 defconfig.

In any case this ought to be better wrt cacheline bouncing, which I suppose
is what we ultimately want here.

Also, see rambling about init value below.

> @@ -10070,6 +10107,12 @@ static void nohz_balancer_kick(struct rq *rq)
>       if (unlikely(rq->idle_balance))
>               return;
>
> +	/* The CPU is not in idle, update idle cpumask */
> +	if (unlikely(sched_idle_cpu(cpu))) {
> +		/* Allow SCHED_IDLE cpu as a wakeup target */
> +		update_idle_cpumask(rq, true);
> +	} else
> +		update_idle_cpumask(rq, false);

This means that without CONFIG_NO_HZ_COMMON, a CPU going into idle will
never be accounted as going out of it, right? Eventually the cpumask
should end up full, which conceptually implements the previous behaviour of
select_idle_cpu() but in a fairly roundabout way...
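
To make the saturation argument concrete, here is a rough userspace model,
not kernel code. It assumes the hunk above is the only place that clears a
CPU's bit, so the clear is skipped entirely when NO_HZ_COMMON is off. The
model_* names and the 8-CPU size are made up for illustration.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8u

/* Hypothetical stand-in for the LLC-wide sds_idle_cpus mask. */
struct model_llc {
	unsigned int idle_mask;	/* bit n set => CPU n is a wakeup candidate */
};

/* Set or clear one CPU's bit in the modelled idle mask. */
static void model_set_idle(struct model_llc *llc, unsigned int cpu, bool idle)
{
	assert(cpu < MODEL_NR_CPUS);
	if (idle)
		llc->idle_mask |= 1u << cpu;
	else
		llc->idle_mask &= ~(1u << cpu);
}

/*
 * Run every CPU through one idle -> busy cycle and return the final mask.
 * Assumption: the "busy" update is only issued from the
 * nohz_balancer_kick() path, so it is skipped when have_nohz is false.
 */
static unsigned int model_idle_busy_cycle(struct model_llc *llc, bool have_nohz)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_set_idle(llc, cpu, true);		/* CPU enters idle */
		if (have_nohz)
			model_set_idle(llc, cpu, false);	/* CPU is busy again */
	}
	return llc->idle_mask;
}
```

Starting from an empty mask, the !have_nohz case ends with every bit set
even though every CPU is busy again, i.e. the scan degenerates to looking at
the whole LLC span.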

> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9079d865a935..f14a6ef4de57 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1407,6 +1407,7 @@ sd_init(struct sched_domain_topology_level *tl,
>               sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
>               atomic_inc(&sd->shared->ref);
>               atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
> +		cpumask_copy(sds_idle_cpus(sd->shared), sched_domain_span(sd));

So at init you would have (single LLC for sake of simplicity):

  \forall cpu : cpu_idle_state[cpu] == false
  cpumask_full(sds_idle_cpus)       == true

IOW it'll require all CPUs to go idle at some point for these two states to
be properly aligned. Should cpu_idle_state not then be init'd to 1?

This then happens again for hotplug, except that cpu_idle_state[cpu] may be
either true or false when the sds_idle_cpus mask is reset to 1's.
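
A small userspace sketch of the mismatch, again not kernel code. It assumes
update_idle_cpumask() returns early when the per-CPU state already matches
the requested one (that part of the patch is not quoted above), and all
sketch_* names are hypothetical.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4u

struct sketch_llc {
	bool cpu_idle_state[SKETCH_NR_CPUS];	/* models the per-CPU bool */
	unsigned int idle_mask;			/* models sds_idle_cpus */
};

/* Model sd_init(): mask starts full, per-CPU state starts at state_init. */
static void sketch_init(struct sketch_llc *llc, bool state_init)
{
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		llc->cpu_idle_state[cpu] = state_init;
	llc->idle_mask = (1u << SKETCH_NR_CPUS) - 1u;
}

/* Assumed behaviour: only touch the shared mask when the state changes. */
static void sketch_update(struct sketch_llc *llc, unsigned int cpu, bool idle)
{
	assert(cpu < SKETCH_NR_CPUS);
	if (llc->cpu_idle_state[cpu] == idle)
		return;		/* filtered out, mask is left untouched */

	llc->cpu_idle_state[cpu] = idle;
	if (idle)
		llc->idle_mask |= 1u << cpu;
	else
		llc->idle_mask &= ~(1u << cpu);
}

/* Is this CPU currently advertised as idle in the shared mask? */
static bool sketch_in_mask(const struct sketch_llc *llc, unsigned int cpu)
{
	return llc->idle_mask & (1u << cpu);
}
```

With state_init == false, a busy CPU calling sketch_update(llc, cpu, false)
is filtered out and its bit stays set until it has gone through idle once.
With state_init == true, the same call clears the bit straight away.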