Message-Id: <2E6EB0D6-D913-4205-B7DD-35EF4FA25667@gmail.com>
Date: Wed, 12 Jun 2024 18:39:11 +0800
From: Chunxin Zang <spring.cxz@...il.com>
To: Chen Yu <yu.c.chen@...el.com>,
 Honglei Wang <jameshongleiwang@....com>
Cc: mingo@...hat.com,
 Peter Zijlstra <peterz@...radead.org>,
 juri.lelli@...hat.com,
 vincent.guittot@...aro.org,
 dietmar.eggemann@....com,
 rostedt@...dmis.org,
 bsegall@...gle.com,
 mgorman@...e.de,
 bristot@...hat.com,
 vschneid@...hat.com,
 Honglei Wang <jameshongleiwang@....com>,
 Mike Galbraith <efault@....de>,
 K Prateek Nayak <kprateek.nayak@....com>,
 linux-kernel@...r.kernel.org,
 yangchen11@...iang.com,
 Jerry Zhou <zhouchunhua@...iang.com>,
 Chunxin Zang <zangchunxin@...iang.com>
Subject: Re: [PATCH v2] sched/fair: Reschedule the cfs_rq when current is
 ineligible



> On Jun 7, 2024, at 13:07, Chen Yu <yu.c.chen@...el.com> wrote:
> 
> On 2024-05-29 at 22:18:06 +0800, Chunxin Zang wrote:
>> I found that some tasks have been running long enough to become
>> ineligible, but they still do not release the CPU. This increases
>> the scheduling delay of other processes. Therefore, I tried checking
>> the current process in wakeup_preempt and entity_tick, and if it is
>> ineligible, rescheduling that cfs queue.
>> 
>> When RUN_TO_PARITY is enabled, its behavior essentially remains
>> consistent with the original process. When NO_RUN_TO_PARITY is enabled,
>> some additional preemptions will be introduced, but not too many.
>> 
>> I have pasted some test results below.
>> I isolated four cores for testing and ran hackbench in the background,
>> and observed the test results of cyclictest.
>> 
>> hackbench -g 4 -l 100000000 &
>> cyclictest --mlockall -D 5m -q
>> 
>>                                 EEVDF      PATCH      EEVDF-NO_PARITY  PATCH-NO_PARITY
>> 
>>                # Min Latencies: 00006      00006      00006            00006
>>  LNICE(-19)    # Avg Latencies: 00191      00133      00089            00066
>>                # Max Latencies: 15442      08466      14133            07713
>> 
>>                # Min Latencies: 00006      00010      00006            00006
>>  LNICE(0)      # Avg Latencies: 00466      00326      00289            00257
>>                # Max Latencies: 38917      13945      32665            17710
>> 
>>                # Min Latencies: 00019      00053      00010            00013
>>  LNICE(19)     # Avg Latencies: 37151      25852      18293            23035
>>                # Max Latencies: 2688299    4643635    426196           425708
>> 
>> I captured and compared the number of preempt occurrences in wakeup_preempt
>> to see if it introduced any additional overhead.
>> 
>> As before, hackbench is used to drive the four cores to 100%
>> utilization, and the method for counting PREEMPT occurrences is
>> taken from [1].
>> 
>> schedstats                     EEVDF    PATCH    EEVDF-NO_PARITY  PATCH-NO_PARITY  CFS(6.5)
>> .stats.check_preempt_count     5053054  5045388  5018589          5029585
>> .stats.patch_preempt_count     -------  0020495  -------          0700670          -------
>> .stats.need_preempt_count      0570520  0458947  3380513          3116966          1140821
>> 
>> From the above test results, there is a slight increase in the number of
>> preempt occurrences in wakeup_preempt. However, the results vary with each
>> test, and sometimes the difference is not that significant.
>> 
>> [1]: https://lore.kernel.org/all/20230816134059.GC982867@hirez.programming.kicks-ass.net/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
>> 
>> Signed-off-by: Chunxin Zang <zangchunxin@...iang.com>
>> Reviewed-by: Chen Yang <yangchen11@...iang.com>
>> 
>> ------
>> Changes in v2:
>> - Make the logic that determines the current process as ineligible and
>>  triggers preemption effective only when NO_RUN_TO_PARITY is enabled.
>> - Update the commit message
>> ---
>> kernel/sched/fair.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..fa2c512139e5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -745,6 +745,17 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> return vruntime_eligible(cfs_rq, se->vruntime);
>> }
>> 
>> +static bool check_entity_need_preempt(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +{
>> + if (sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline)
>> + return true;
> 
> If I understand correctly, this is intended to check whether the current se
> has consumed its first slice after being picked at set_next_entity(), and if so to reschedule.
> check_entity_need_preempt() is added at the end of entity_tick(), which could override
> the existing policy for rescheduling current (entity_tick()->update_curr()->update_deadline()):
> current is preempted only when there is more than one runnable task, even if it has used up
> its first requested slice.
> 

The purpose of the modification is to increase preemption opportunities without breaking the
RUN_TO_PARITY rule. However, it clearly introduces some additional preemptions, so perhaps it
should also check whether the se is eligible. Also, to avoid overriding the scheduling
policy in entity_tick, would a modification like the following be more appropriate?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..5e49a15bbdd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -745,6 +745,21 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
        return vruntime_eligible(cfs_rq, se->vruntime);
 }

+static bool check_entity_need_preempt(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+       if (cfs_rq->nr_running <= 1)
+               return false;
+
+       if (sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline
+                                     && !entity_eligible(cfs_rq, se))
+               return true;
+
+       if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
+               return true;
+
+       return false;
+}
+
 static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 {
        u64 min_vruntime = cfs_rq->min_vruntime;
@@ -974,11 +989,13 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
 /*
  * XXX: strictly: vd_i += N*r_i/w_i such that: vd_i > ve_i
  * this is probably good enough.
+ *
+ * return true if se need preempt
  */
-static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
        if ((s64)(se->vruntime - se->deadline) < 0)
-               return;
+               return false;

        /*
         * For EEVDF the virtual time slope is determined by w_i (iow.
@@ -995,10 +1012,7 @@ static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
        /*
         * The task has consumed its request, reschedule.
         */
-       if (cfs_rq->nr_running > 1) {
-               resched_curr(rq_of(cfs_rq));
-               clear_buddies(cfs_rq, se);
-       }
+       return true;
 }

 #include "pelt.h"
@@ -1157,6 +1171,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 {
        struct sched_entity *curr = cfs_rq->curr;
        s64 delta_exec;
+       bool need_preempt = false;

        if (unlikely(!curr))
                return;
@@ -1166,12 +1181,17 @@ static void update_curr(struct cfs_rq *cfs_rq)
                return;

        curr->vruntime += calc_delta_fair(delta_exec, curr);
-       update_deadline(cfs_rq, curr);
+       need_preempt = update_deadline(cfs_rq, curr);
        update_min_vruntime(cfs_rq);

        if (entity_is_task(curr))
                update_curr_task(task_of(curr), delta_exec);

+       if (need_preempt || check_entity_need_preempt(cfs_rq, curr)) {
+               resched_curr(rq_of(cfs_rq));
+               clear_buddies(cfs_rq, curr);
+       }
+
        account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
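
For readers following along, the decision made by check_entity_need_preempt() in the diff above can be sketched in userspace C. This is only a model under stated assumptions: the model_* names are hypothetical, eligibility is assumed to mean "vruntime does not exceed the load-weighted average vruntime of the queue", and deadline_moved stands in for the se->vlag != se->deadline test, which the review comment reads as "the first slice since set_next_entity() has been consumed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed eligibility model: entity idx is eligible when its vruntime does
 * not exceed the load-weighted average vruntime of the queue. The comparison
 * is cross-multiplied to avoid a division.
 */
static bool model_eligible(const int64_t *vruntime, const int64_t *weight,
                           size_t n, size_t idx)
{
    int64_t v0 = vruntime[0];   /* reference point keeps the products small */
    int64_t sum = 0, load = 0;

    for (size_t i = 0; i < n; i++) {
        sum += (vruntime[i] - v0) * weight[i];
        load += weight[i];
    }
    return sum >= (vruntime[idx] - v0) * load;
}

/*
 * Mirrors the branch structure of check_entity_need_preempt() in the diff.
 * deadline_moved is a stand-in for (se->vlag != se->deadline).
 */
static bool model_need_preempt(bool run_to_parity, unsigned int nr_running,
                               bool deadline_moved, bool eligible)
{
    if (nr_running <= 1)
        return false;           /* nothing else to run */

    if (run_to_parity && deadline_moved && !eligible)
        return true;            /* parity reached and current is ineligible */

    if (!run_to_parity && !eligible)
        return true;            /* no parity protection: ineligible is enough */

    return false;
}

/* Small self-check with three equally weighted entities. */
static void model_self_check(void)
{
    const int64_t vruntime[] = { 100, 200, 300 };
    const int64_t weight[]   = { 1, 1, 1 };

    assert(model_eligible(vruntime, weight, 3, 0));
    assert(model_eligible(vruntime, weight, 3, 1));   /* exactly at the average */
    assert(!model_eligible(vruntime, weight, 3, 2));

    bool eligible = model_eligible(vruntime, weight, 3, 2);
    assert(model_need_preempt(false, 3, false, eligible));
    assert(!model_need_preempt(true, 3, false, eligible));
    assert(model_need_preempt(true, 3, true, eligible));
    assert(!model_need_preempt(false, 1, false, eligible));
}
```

Calling model_self_check() from a test program exercises both helpers. In this model an ineligible current is preempted immediately under NO_RUN_TO_PARITY, but under RUN_TO_PARITY only once its deadline has moved.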



>> +
>> + if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
>> {
>> u64 min_vruntime = cfs_rq->min_vruntime;
>> @@ -5523,6 +5534,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>> return;
>> #endif
>> +
>> + if (check_entity_need_preempt(cfs_rq, curr))
>> + resched_curr(rq_of(cfs_rq));
>> }
>> 
>> 
>> @@ -8343,6 +8357,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> cfs_rq = cfs_rq_of(se);
>> update_curr(cfs_rq);
>> 
>> + if (check_entity_need_preempt(cfs_rq, se))
>> + goto preempt;
>> +
> 
> As we change the preemption policy for current in two places, the tick preemption and the wakeup preemption,
> do you have statistics that show which one brings the most benefit?

The revised modification no longer touches the wakeup and tick paths separately. The check is consolidated
in 'update_curr', which makes the preemption decision together with 'update_deadline'. This approach seems
more elegant and achieves the same performance benefits as before.
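
The consolidated flow can be sketched in userspace C as follows. It is a simplified model, not the kernel code: the sketch_* names are hypothetical, vslice is assumed to be the request size already scaled to virtual time, and ineligible_hit stands in for the result of check_entity_need_preempt(). The sketch also keeps the nr_running > 1 guard that the original update_deadline() applied before calling resched_curr(); that placement is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; field names only mirror the kernel's. */
struct sketch_se {
    int64_t vruntime;   /* virtual runtime consumed so far */
    int64_t deadline;   /* virtual deadline of the current request */
    int64_t vslice;     /* request size, assumed already in virtual time */
};

struct sketch_rq {
    struct sketch_se *curr;
    unsigned int nr_running;
    bool resched;       /* stands in for resched_curr() having been called */
};

/* Returns true when the request was consumed and a new deadline was set. */
static bool sketch_update_deadline(struct sketch_se *se)
{
    if (se->vruntime - se->deadline < 0)
        return false;

    se->deadline = se->vruntime + se->vslice;
    return true;
}

/*
 * One accounting step: charge delta_v of virtual time to current, then take
 * a single preemption decision from both sources.
 */
static void sketch_update_curr(struct sketch_rq *rq, int64_t delta_v,
                               bool ineligible_hit)
{
    struct sketch_se *curr = rq->curr;
    bool need_preempt;

    if (curr == NULL || delta_v <= 0)
        return;

    curr->vruntime += delta_v;
    need_preempt = sketch_update_deadline(curr);

    /* Assumption: keep the nr_running > 1 guard of the original code. */
    if (rq->nr_running > 1 && (need_preempt || ineligible_hit))
        rq->resched = true;
}

static void sketch_self_check(void)
{
    struct sketch_se se = { .vruntime = 0, .deadline = 30, .vslice = 30 };
    struct sketch_rq rq = { .curr = &se, .nr_running = 2, .resched = false };

    sketch_update_curr(&rq, 10, false);   /* vruntime 10 < deadline 30 */
    assert(!rq.resched && se.deadline == 30);

    sketch_update_curr(&rq, 25, false);   /* vruntime 35 >= deadline 30 */
    assert(rq.resched && se.deadline == 65);
}
```

Having the deadline helper return a flag instead of rescheduling directly leaves one place where the two conditions are combined, which is the point of moving the check into 'update_curr'.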

thanks 
Chunxin

> 
> thanks,
> Chenyu