[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADjb_WRBu8+zhO3ynHE4gferJhvZ=o9Gq1_rUisz143ivjkS0g@mail.gmail.com>
Date: Tue, 24 Sep 2024 21:10:17 +0800
From: Chen Yu <yu.chen.surf@...il.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Chen Yu <yu.c.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Valentin Schneider <vschneid@...hat.com>, Chunxin Zang <zangchunxin@...iang.com>,
linux-kernel@...r.kernel.org, Oliver Sang <oliver.sang@...el.com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
Hello Prateek,
On Tue, Sep 24, 2024 at 6:28 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Chenyu,
>
> On 9/23/2024 12:51 PM, Chen Yu wrote:
> > Commit 85e511df3cec ("sched/eevdf: Allow shorter slices to wakeup-preempt")
> > introduced a mechanism that a wakee with shorter slice could preempt
> > the current running task. It also lower the bar for the current task
> > to be preempted, by checking the rq->nr_running instead of cfs_rq->nr_running
> > when the current task has ran out of time slice. Say, if there is 1 cfs
> > task and 1 rt task, before 85e511df3cec, update_deadline() will
> > not trigger a reschedule, and after 85e511df3cec, since rq->nr_running
> > is 2 and resched is true, a resched_curr() would happen.
> >
> > Some workloads (like the hackbench reported by lkp) do not like
> > over-scheduling. We can see that the preemption rate has been
> > increased by 2.2%:
> >
> > 1.654e+08 +2.2% 1.69e+08 hackbench.time.involuntary_context_switches
> >
> > Restore its previous check criterion.
> >
> > Fixes: 85e511df3cec ("sched/eevdf: Allow shorter slices to wakeup-preempt")
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/oe-lkp/202409231416.9403c2e9-oliver.sang@intel.com
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
>
> Gave it a spin on my dual socket 3rd Generation EPYC System and I do not
> as big a jump in hackbench numbers as Oliver reported, most likely
> because I couldn't emulate the exact scenario where a fair task is
> running in presence of an RT task queued.
Actually I did not dig into Oliver's test scenario. While looking at the commit
I thought that there could be scenario where a rt and a cfs task together might
make a difference in preemption which caused this problem.
> Following are numbers from my
> testing:
>
> ==================================================================
> Test : hackbench
> Units : Normalized time in seconds
> Interpretation: Lower is better
> Statistic : AMean
> ==================================================================
> Case: tip[pct imp](CV) preempt-fix[pct imp](CV)
> 1-groups 1.00 [ -0.00]( 2.60) 1.00 [ 0.17]( 2.12)
> 2-groups 1.00 [ -0.00]( 1.21) 0.98 [ 2.05]( 0.95)
> 4-groups 1.00 [ -0.00]( 1.63) 0.97 [ 2.65]( 1.53)
> 8-groups 1.00 [ -0.00]( 1.34) 0.99 [ 0.81]( 1.33)
> 16-groups 1.00 [ -0.00]( 2.07) 0.98 [ 2.31]( 1.09)
> --
>
> Feel free to include:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
Thanks for the test!
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 225b31aaee55..2859fc7e2da2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1025,7 +1025,7 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > /*
> > * The task has consumed its request, reschedule.
> > */
> > - return true;
> > + return (cfs_rq->nr_running > 1);
>
> Was there a strong reason why Peter decided to use "rq->nr_running"
> instead of "cfs_rq->nr_running" with PREEMPT_SHORT in update_curr()?
>
> I wonder if it was to force a pick_next_task() cycle to dequeue a
> possibly delayed entity
> but AFAICT, "cfs_rq->nr_running" should
> account for the delayed entity still on the cfs_rq and perhaps the
> early return in update_curr() can just be changed to use
> "cfs_rq->nr_running". Not sure if I'm missing something trivial.
>
85e511df3cec changes
if (cfs_rq->nr_running > 1) resched
to
if (rq->nr_running == 1) not_resched
which does lower the bar to trigger resched
Yes, I think your proposal make sense, the resched should only
be triggered between 2 cfs tasks,
and the restore to update_deadline() is not needed, something like below
in update_curr() could also work:
if (cfs_rq->nr_running == 1)
return;
thanks,
Chenyu
Powered by blists - more mailing lists