[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZN7EyZ+BLfjEJTWJ@chenyu5-mobl2>
Date: Fri, 18 Aug 2023 09:09:29 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Mike Galbraith <umgwanakikbuti@...il.com>,
kernel test robot <oliver.sang@...el.com>,
<oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [tip:sched/eevdf] [sched/fair] e0c2ff903c:
phoronix-test-suite.blogbench.Write.final_score -34.8% regression
On 2023-08-16 at 15:40:59 +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2023 at 02:37:16PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 14, 2023 at 08:32:55PM +0200, Mike Galbraith wrote:
> >
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -875,6 +875,12 @@ static struct sched_entity *pick_eevdf(s
> > > if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> > > curr = NULL;
> > >
> > > + /*
> > > + * Once selected, run the task to parity to avoid overscheduling.
> > > + */
> > > + if (sched_feat(RUN_TO_PARITY) && curr)
> > > + return curr;
> > > +
> > > while (node) {
> > > struct sched_entity *se = __node_2_se(node);
> > >
> >
> > So I read it wrong last night... but I rather like this idea. But
> > there's something missing. When curr starts a new slice it should
> > probably do a full repick and not stick with it.
> >
> > Let me poke at this a bit.. nice
>
> Something like so.. it shouldn't matter much now, but might make a
> difference once we start mixing different slice lengths.
>
> ---
> kernel/sched/fair.c | 12 ++++++++++++
> kernel/sched/features.h | 1 +
> 2 files changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fe5be91c71c7..128a78f3f264 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -873,6 +873,13 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
> if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> curr = NULL;
>
> + /*
> + * Once selected, run a task until it either becomes non-eligible or
> + * until it gets a new slice. See the HACK in set_next_entity().
> + */
> + if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
> + return curr;
> +
> while (node) {
> struct sched_entity *se = __node_2_se(node);
>
> @@ -5168,6 +5175,11 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
> update_stats_wait_end_fair(cfs_rq, se);
> __dequeue_entity(cfs_rq, se);
> update_load_avg(cfs_rq, se, UPDATE_TG);
> + /*
> + * HACK, stash a copy of deadline at the point of pick in vlag,
> + * which isn't used until dequeue.
> + */
> + se->vlag = se->deadline;
> }
>
> update_stats_curr_start(cfs_rq, se);
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 61bcbf5e46a4..f770168230ae 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -6,6 +6,7 @@
> */
> SCHED_FEAT(PLACE_LAG, true)
> SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
> +SCHED_FEAT(RUN_TO_PARITY, true)
>
> /*
> * Prefer to schedule the task we woke last (assuming it failed
I had a test on top of this, and it restored some performance:
uname -r
6.5.0-rc2-mixed-slice-run-parity
echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features
hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 118.220 <---
echo RUN_TO_PARITY > /sys/kernel/debug/sched/features
hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 100.873 <---
Then I switched to the first commit of EEVDF, that is to
introduce the avg_runtime for cfs_rq(but not use it yet)
It seems that there is still 50% performance to restored:
uname -r
6.5.0-rc2-add-cfs-avruntime
hackbench -g 112 -f 20 --threads -l 30000 -s 100
Running in threaded mode with 112 groups using 40 file descriptors each (== 4480 tasks)
Each sender will pass 30000 messages of 100 bytes
Time: 52.569 <---
I'll collect the statistic to check what's remainning there.
[Besides, with the $subject benchmark, I tested it with PLACE_DEADLINE_INITIAL
diabled, and it did not show much difference. I'll work with lkp
to test it with RUN_TO_PARITY enabled.]
thanks,
Chenyu
thanks,
Chenyu
Powered by blists - more mailing lists