[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c15e2f07-5a0d-4e48-b7f4-83e4689f9299@126.com>
Date: Fri, 27 Sep 2024 21:38:53 +0800
From: Honglei Wang <jameshongleiwang@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Chunxin Zang <zangchunxin@...iang.com>, linux-kernel@...r.kernel.org,
Chen Yu <yu.chen.surf@...il.com>, kernel test robot <oliver.sang@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH v2] sched/eevdf: Fix wakeup-preempt by checking
cfs_rq->nr_running
On 2024/9/27 14:56, Chen Yu wrote:
> Hello Honglei,
>
> On 2024-09-27 at 09:54:28 +0800, Honglei Wang wrote:
>>
>>
>> On 2024/9/25 16:54, Chen Yu wrote:
>>> Commit 85e511df3cec ("sched/eevdf: Allow shorter slices to wakeup-preempt")
>>> introduced a mechanism that a wakee with shorter slice could preempt
>>> the current running task. It also lower the bar for the current task
>>> to be preempted, by checking the rq->nr_running instead of cfs_rq->nr_running
>>> when the current task has ran out of time slice. But there is a scenario
>>> that is problematic. Say, if there is 1 cfs task and 1 rt task, before
>>> 85e511df3cec, update_deadline() will not trigger a reschedule, and after
>>> 85e511df3cec, since rq->nr_running is 2 and resched is true, a resched_curr()
>>> would happen.
>>>
>>> Some workloads (like the hackbench reported by lkp) do not like
>>> over-scheduling. We can see that the preemption rate has been
>>> increased by 2.2%:
>>>
>>> 1.654e+08 +2.2% 1.69e+08 hackbench.time.involuntary_context_switches
>>>
>>> Restore its previous check criterion.
>>>
>>> Fixes: 85e511df3cec ("sched/eevdf: Allow shorter slices to wakeup-preempt")
>>> Reported-by: kernel test robot <oliver.sang@...el.com>
>>> Closes: https://lore.kernel.org/oe-lkp/202409231416.9403c2e9-oliver.sang@intel.com
>>> Suggested-by: K Prateek Nayak <kprateek.nayak@....com>
>>> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>>> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
>>> ---
>>> v1->v2:
>>> Check cfs_rq->nr_running instead of rq->nr_running(K Prateek Nayak)
>>> ---
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 225b31aaee55..53a351b18740 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1247,7 +1247,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
>>> account_cfs_rq_runtime(cfs_rq, delta_exec);
>>> - if (rq->nr_running == 1)
>>> + if (cfs_rq->nr_running == 1)
>>> return;
>> Hi Yu,
>>
>> I'm wondering if commit 85e511df3cec wants to give more chances to do
>> resched just in case there are 'short slices' tasks ready in the other cfs
>> hierarchy.
>> Does something like rq->cfs->nr_running == 1 make more sense? But
>> maybe it helps less than 'cfs_rq->nr_running == 1' in this hackbench case.
>>
>
> Thanks for taking a look.
>
> It could be possible that Peter wanted the short tasks to preempt other quickly.
> If I understand correctly, when we say preemption, we usually consider two
> tasks which are in the same cfs_rq(level). For example, check_preempt_wakeup_fair()
> iterates the hierarchy from down-up until the current task and the wakee are in the
> same level via find_matching_se(&se, &pse), then check if the wakee can preempt the
> current. This should be consistent with the tick preemption in update_curr(). And
> whether the short task should preempt the current is checked by
> update_curr() -> did_preempt_short(), rather than checking the cfs_rq->nr_running/nr_h_running
> I suppose.
>
Hi Yu,
Yep, I understand the preemption should happen in the same cfs level.
Just not sure the purpose of the 'nr_running check' stuff. Perhaps its
role is just to judge whether it’s necessary to do the preemption check.
If there is at least one more ready (cfs) task in the rq and current is
not eligible, we take care of the waiting tasks. Thoughts?
Thanks,
Honglei
> Thanks,
> Chenyu
Powered by blists - more mailing lists