Message-ID: <ZmJylkNFg7EFgPmZ@chenyu5-mobl2>
Date: Fri, 7 Jun 2024 10:38:14 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Chunxin Zang <spring.cxz@...il.com>
CC: K Prateek Nayak <kprateek.nayak@....com>, <mingo@...hat.com>,
	"Peter Zijlstra" <peterz@...radead.org>, <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
	<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
	<bristot@...hat.com>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<yangchen11@...iang.com>, Jerry Zhou <zhouchunhua@...iang.com>,
	Chunxin Zang <zangchunxin@...iang.com>, Balakumaran Kannan <kumaran.4353@...il.com>,
	"Mike Galbraith" <efault@....de>
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
>
>
> > On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@...el.com> wrote:
> >
> >
> > Sorry for the late reply, and thanks for helping to clarify this. Yes, this is
> > what my previous concern was:
> > 1. It does not consider the cgroup and does not check preemption at the same
> > level, which is covered by find_matching_se().
> > 2. The if (!entity_eligible(cfs_rq, se)) check for current is redundant, because
> > pick_eevdf() will check the eligibility of current later anyway. But
> > as pointed out by Chunxin, his concern is the double traversal of the rb-tree,
> > so I wonder if we could leverage cfs_rq->next to store the next
> > candidate, so that it can be picked directly in the 2nd pick as a fast path?
> > Something like the below, untested:
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..f716646d595e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> > static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> > {
> > struct task_struct *curr = rq->curr;
> > - struct sched_entity *se = &curr->se, *pse = &p->se;
> > + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> > struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> > int cse_is_idle, pse_is_idle;
> >
> > @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> > /*
> > * XXX pick_eevdf(cfs_rq) != se ?
> > */
> > - if (pick_eevdf(cfs_rq) == pse)
> > + next = pick_eevdf(cfs_rq);
> > + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> > + set_next_buddy(next);
> > +
> > + if (next == pse)
> > goto preempt;
> >
> > return;
> >
> >
> > thanks,
> > Chenyu
>
> Hi Chen
>
> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> the RB-tree twice, I initially had two methods in mind:
> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
> This idea is similar to the one you proposed this time.
> 2. Avoid the pick_eevdf operation as much as possible within check_preempt_wakeup_fair(),
> because I believe that 'checking whether preemption is necessary' and 'finding the optimal
> process to schedule' are two different things.
I agree, and it seems that in the current EEVDF implementation the former relies on the latter.
> check_preempt_wakeup_fair() is not just there to check whether the newly woken process
> should preempt the current one; it can also serve as an opportunity to check whether any
> other process should preempt the current one, thereby improving the responsiveness of the
> scheduler. Although pick_eevdf() now also evaluates the eligibility of 'curr', if the
> entity it returns is not the newly woken process, the current process will still not be
> preempted.
I think Mike has proposed a patch to deal with the scenario you mentioned above:
https://lore.kernel.org/lkml/e17d3d90440997b970067fe9eaf088903c65f41d.camel@gmx.de/
And I suppose you are referring to increasing the chance of preempting current, rather than
reducing the number of pick_eevdf() invocations in check_preempt_wakeup_fair().
> Therefore, I posted the v2 PATCH; its implementation might express this point more clearly:
> https://lore.kernel.org/lkml/20240529141806.16029-1-spring.cxz@gmail.com/T/
>
Let me take a look at it and do some tests.
> I previously implemented and tested both of these methods, and the results showed that
> method 2 had somewhat clearer benefits, so that is the one I submitted. Now that I
> think about it, perhaps method 1 could also be viable at the same time. :)
>
Actually I found that, even without any changes, if we enable the sched feature NEXT_BUDDY,
both the wakeup latency and the request latency are reduced. The following is the schbench
result on a 240-CPU system:
NO_NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1698990 total samples)
        50.0th: 6 (429125 samples)
        90.0th: 14 (682355 samples)
        * 99.0th: 29 (126695 samples)
        99.9th: 529 (14603 samples)
        min=1, max=4741
Request Latencies percentiles (usec) runtime 100 (s) (1702523 total samples)
        50.0th: 14992 (550939 samples)
        90.0th: 15376 (668687 samples)
        * 99.0th: 15600 (128111 samples)
        99.9th: 15888 (11238 samples)
        min=3528, max=31677
RPS percentiles (requests) runtime 100 (s) (101 total samples)
        20.0th: 16864 (31 samples)
        * 50.0th: 16928 (26 samples)
        90.0th: 17248 (36 samples)
        min=16615, max=20041
average rps: 17025.23

NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1653564 total samples)
        50.0th: 5 (376845 samples)
        90.0th: 12 (632075 samples)
        * 99.0th: 24 (114398 samples)
        99.9th: 105 (13737 samples)
        min=1, max=7428
Request Latencies percentiles (usec) runtime 100 (s) (1657268 total samples)
        50.0th: 14480 (524763 samples)
        90.0th: 15216 (647982 samples)
        * 99.0th: 15472 (130730 samples)
        99.9th: 15728 (13980 samples)
        min=3542, max=34805
RPS percentiles (requests) runtime 100 (s) (101 total samples)
        20.0th: 16544 (62 samples)
        * 50.0th: 16544 (0 samples)
        90.0th: 16608 (37 samples)
        min=16470, max=16648
average rps: 16572.68
So I think NEXT_BUDDY has, to some extent, reduced the rb-tree scans.
thanks,
Chenyu