linux-kernel - Re: [tip:sched/eevdf] [sched/fair] e0c2ff903c: phoronix-test-suite.blogbench.Write.final

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZN7EyZ+BLfjEJTWJ@chenyu5-mobl2>
Date:   Fri, 18 Aug 2023 09:09:29 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Mike Galbraith <umgwanakikbuti@...il.com>,
        kernel test robot <oliver.sang@...el.com>,
        <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [tip:sched/eevdf] [sched/fair]  e0c2ff903c:
 phoronix-test-suite.blogbench.Write.final_score -34.8% regression

On 2023-08-16 at 15:40:59 +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2023 at 02:37:16PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 14, 2023 at 08:32:55PM +0200, Mike Galbraith wrote:
> > 
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -875,6 +875,12 @@ static struct sched_entity *pick_eevdf(s
> > >  	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> > >  		curr = NULL;
> > >  
> > > +	/*
> > > +	 * Once selected, run the task to parity to avoid overscheduling.
> > > +	 */
> > > +	if (sched_feat(RUN_TO_PARITY) && curr)
> > > +		return curr;
> > > +
> > >  	while (node) {
> > >  		struct sched_entity *se = __node_2_se(node);
> > >  
> > 
> > So I read it wrong last night... but I rather like this idea. But
> > there's something missing. When curr starts a new slice it should
> > probably do a full repick and not stick with it.
> > 
> > Let me poke at this a bit.. nice
> 
> Something like so.. it shouldn't matter much now, but might make a
> difference once we start mixing different slice lengths.
> 
> ---
>  kernel/sched/fair.c     | 12 ++++++++++++
>  kernel/sched/features.h |  1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fe5be91c71c7..128a78f3f264 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -873,6 +873,13 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
>  	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>  		curr = NULL;
>  
> +	/*
> +	 * Once selected, run a task until it either becomes non-eligible or
> +	 * until it gets a new slice. See the HACK in set_next_entity().
> +	 */
> +	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
> +		return curr;
> +
>  	while (node) {
>  		struct sched_entity *se = __node_2_se(node);
>  
> @@ -5168,6 +5175,11 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  		update_stats_wait_end_fair(cfs_rq, se);
>  		__dequeue_entity(cfs_rq, se);
>  		update_load_avg(cfs_rq, se, UPDATE_TG);
> +		/*
> +		 * HACK, stash a copy of deadline at the point of pick in vlag,
> +		 * which isn't used until dequeue.
> +		 */
> +		se->vlag = se->deadline;
>  	}
>  
>  	update_stats_curr_start(cfs_rq, se);
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 61bcbf5e46a4..f770168230ae 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -6,6 +6,7 @@
>   */
>  SCHED_FEAT(PLACE_LAG, true)
>  SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
> +SCHED_FEAT(RUN_TO_PARITY, true)
>  
>  /*
>   * Prefer to schedule the task we woke last (assuming it failed

I had a test on top of this, and it restored some performance:

uname -r
6.5.0-rc2-mixed-slice-run-parity

echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features 
hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 118.220  <---


echo RUN_TO_PARITY > /sys/kernel/debug/sched/features 
hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 100.873  <---

Then I switched to the first commit of EEVDF, that is to
introduce the avg_runtime for cfs_rq(but not use it yet)
It seems that there is still 50% performance to restored:

uname -r
6.5.0-rc2-add-cfs-avruntime

hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 52.569  <---

I'll collect the statistic to check what's remainning there.

[Besides, with the $subject benchmark, I tested it with PLACE_DEADLINE_INITIAL
diabled, and it did not show much difference. I'll work with lkp
to test it with RUN_TO_PARITY enabled.]

thanks,
Chenyu

thanks,
Chenyu