Message-ID: <YRqz93crZIS1Mvmy@hirez.programming.kicks-ass.net>
Date:   Mon, 16 Aug 2021 20:52:39 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Tao Zhou <tao.zhou@...ux.dev>
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de,
        joel@...lfernandes.org, chris.hyser@...cle.com, joshdon@...gle.com,
        mingo@...nel.org, vincent.guittot@...aro.org,
        valentin.schneider@....com, mgorman@...e.de
Subject: Re: [PATCH] sched/core: An optimization of pick_next_task() not sure

On Mon, Aug 16, 2021 at 11:44:01PM +0800, Tao Zhou wrote:
> When find a new candidate max, wipe the stale and start over.
> Goto again: and use the new max to loop to pick the task.
> 
> Here first want to get the max of the core and use this new
> max to loop once to pick the task on each thread.
> 
> Not sure this is an optimization and just stop here a little
> and move on..
> 

Did you find this retry was an issue on your workload? Or was this from
reading the source?

> ---
>  kernel/sched/core.c | 52 +++++++++++++++++----------------------------
>  1 file changed, 20 insertions(+), 32 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 20ffcc044134..bddcd328df96 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5403,7 +5403,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  	const struct sched_class *class;
>  	const struct cpumask *smt_mask;
>  	bool fi_before = false;
> -	int i, j, cpu, occ = 0;
> +	int i, cpu, occ = 0;
>  	bool need_sync;
>  
>  	if (!sched_core_enabled(rq))
> @@ -5508,11 +5508,27 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  	 * order.
>  	 */
>  	for_each_class(class) {
> -again:
> +		struct rq *rq_i;
> +		struct task_struct *p;
> +
>  		for_each_cpu_wrap(i, smt_mask, cpu) {
> -			struct rq *rq_i = cpu_rq(i);
> -			struct task_struct *p;
> +			rq_i = cpu_rq(i);
> +			p = pick_task(rq_i, class, max, fi_before);
> +			/*
> +			 * If this new candidate is of higher priority than the
> +			 * previous; and they're incompatible; pick_task makes
> +			 * sure that p's priority is more than max if it doesn't
> +			 * match max's cookie. Update max.
> +			 *
> +			 * NOTE: this is a linear max-filter and is thus bounded
> +			 * in execution time.
> +			 */
> +			if (!max || !cookie_match(max, p))
> +				max = p;
> +		}
>  
> +		for_each_cpu_wrap(i, smt_mask, cpu) {
> +			rq_i = cpu_rq(i);
>  			if (rq_i->core_pick)
>  				continue;
>  

This now calls pick_task() twice for each CPU, which seems unfortunate;
perhaps add rq->core_temp storage to cache that result. Also, since the
first iteration is now explicitly about the max filter, perhaps we
should move that part of pick_task() into the loop and simplify things
further?
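
Something like the below is roughly what I mean; a userspace toy, not
kernel code. The struct layouts, the core_temp field, the "higher prio
value wins" rule and core_pick_all() are all made up for illustration,
and a mismatched sibling is simply forced idle here, where the real code
would go look for a cookie-matching task instead. The only point is that
pick_task() runs once per sibling and the second loop reuses the cached
result.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; not the kernel's definitions. */
struct task {
	int prio;             /* assumption: larger value means higher priority */
	unsigned long cookie; /* core-scheduling cookie */
};

struct rq {
	struct task *candidate; /* what the toy pick_task() returns */
	struct task *core_temp; /* hypothetical cache of the first-pass pick */
	struct task *core_pick; /* final selection; NULL models forced idle */
	int pick_calls;         /* counts pick_task() invocations */
};

/* Toy pick: returns the runqueue's only candidate and counts the call. */
static struct task *pick_task(struct rq *rq)
{
	rq->pick_calls++;
	return rq->candidate;
}

/* Select a task for every sibling; returns the core-wide max. */
static struct task *core_pick_all(struct rq rqs[], int n)
{
	struct task *max = NULL;
	int i;

	/* Pass 1: pick once per sibling, cache it, run the max filter inline. */
	for (i = 0; i < n; i++) {
		struct task *p = rqs[i].core_temp = pick_task(&rqs[i]);

		if (p && (!max || p->prio > max->prio))
			max = p;
	}

	/* Pass 2: reuse the cached pick; no second pick_task() call. */
	for (i = 0; i < n; i++) {
		struct task *p = rqs[i].core_temp;

		/* Simplification: an incompatible cookie means forced idle. */
		if (p && max && p->cookie != max->cookie)
			p = NULL;

		rqs[i].core_pick = p;
	}

	return max;
}
```

With three siblings holding tasks of (prio, cookie) = (10, 1), (20, 2)
and (5, 2), the second task becomes max, the first sibling goes idle and
the third keeps its task, with one pick_task() call per sibling.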