lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <CAHb70uz8SZ49MSxyXvZfd-5AF7Y9KuUYbcKaZ_mbXD2zgu_0Yw@mail.gmail.com>
Date: Tue, 10 Dec 2024 17:22:46 +0800
From: zihan zhou <15645113830zzh@...il.com>
To: peterz@...radead.org
Cc: bsegall@...gle.com, dietmar.eggemann@....com, juri.lelli@...hat.com, 
	linux-kernel@...r.kernel.org, mgorman@...e.de, mingo@...hat.com, 
	rostedt@...dmis.org, vincent.guittot@...aro.org, vschneid@...hat.com, 
	yaozhenguo@...com, zhouzihan30@...com
Subject: Re: [PATCH] sched: Forward deadline for early tick

Hello, may I ask what you think of my latest patch
(https://lore.kernel.org/all/20241127070534.62903-1-zhouzihan30@jd.com/)?
If there are any shortcomings, I would be happy to keep revising it.
(Apologies: there was a formatting issue with the email I sent earlier,
so I am resending it.)

zihan zhou <15645113830zzh@...il.com> wrote on Tue, Dec 10, 2024 at 16:58:
>
> Hello, may I ask what you think of my latest patch (https://lore.kernel.org/all/20241127070534.62903-1-zhouzihan30@jd.com/)? If there are any shortcomings, I would be happy to keep revising it.
>
> zhouzihan30 <15645113830zzh@...il.com> wrote on Wed, Nov 27, 2024 at 15:06:
>>
>> From: zhouzihan <zhouzihan30@...com>
>>
>> Thank you very much for your reply, which has given me a lot to
>>  think about.
>>
>> I have reconsidered this issue and believe the root cause is that an
>>  ideal EEVDF is both difficult and unnecessary for the kernel to
>>  implement on real hardware: for example, an ideal EEVDF may cause
>>  frequent context switches, whose cost the kernel cannot really
>>  afford.
>>
>> So I see that the kernel has a very concise and clever implementation,
>>  update_deadline, which lets each task use up its request size in
>>  one go as far as possible.
>>
>> Here the kernel actually deviates slightly from EEVDF: we no longer
>>  check, for the time being, whether a task is eligible.
>>
>> The previous patch mentioned that, due to tick error, some tasks run
>>  longer than requested. So we can do the following: when a task's
>>  vruntime approaches its deadline, we check whether the task is
>>  eligible. If it is eligible, we follow the previous logic and do not
>>  reschedule it. If it is ineligible, we reschedule it, because EEVDF
>>  is responsible for not running ineligible tasks.
>>
>> In other words, the kernel granted the task a "benefit" based on the
>>  actual situation, and we still have the right to revoke that benefit.
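The check described above can be sketched as a toy model in plain C (names such as `should_resched` and `toy_se` are illustrative, not the kernel's; eligibility is passed in as a flag rather than computed from the run queue):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the decision: well before the deadline, keep running;
 * near the deadline (within tolerance), an eligible entity keeps its
 * remaining time, while an ineligible one has its deadline forwarded. */
struct toy_se { int64_t vruntime, deadline; };

static bool should_resched(struct toy_se *se, int64_t tolerance,
                           int64_t vslice, bool eligible)
{
    if (se->vruntime + tolerance - se->deadline < 0)
        return false;       /* well before the deadline: keep running */
    if (se->vruntime - se->deadline < 0 && eligible)
        return false;       /* close to deadline but eligible: let it run */
    /* ineligible near the deadline: push from the old deadline;
     * past the deadline: push from vruntime as before */
    se->deadline = (se->vruntime - se->deadline < 0 ? se->deadline
                                                    : se->vruntime) + vslice;
    return true;
}
```

This is only a sketch of the control flow under the stated assumptions; the real update_deadline operates on struct sched_entity and calls entity_eligible() against the cfs_rq.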
>>
>> This actually brings the kernel closer to an ideal EEVDF. At the same
>>  time, your reply made me realize my mistake: the deadline should be
>>  updated with the following formula, which is more reasonable:
>>     vd_i += r_i / w_i
>> With it, the scheduler remains fair, and each task obtains time in
>>  proportion to its weight.
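As a sketch, the vd_i += r_i / w_i update can be modeled with NICE_0-style weight scaling (the constant and the helper names here are illustrative stand-ins for the kernel's calc_delta_fair(), not the real implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's NICE_0 load unit. */
#define NICE_0_WEIGHT 1024ULL

/* Scale a wall-clock slice r_i into virtual time by the inverse of the
 * entity's weight w_i, so the deadline advances by r_i / w_i. */
static uint64_t vslice_of(uint64_t slice_ns, uint64_t weight)
{
    return slice_ns * NICE_0_WEIGHT / weight;
}

/* vd_i += r_i / w_i: a heavier entity gets a smaller virtual slice,
 * so in virtual time its deadlines recur more often, which is exactly
 * what gives it a proportionally larger wall-clock share. */
static uint64_t forwarded_deadline(uint64_t vruntime, uint64_t slice_ns,
                                   uint64_t weight)
{
    return vruntime + vslice_of(slice_ns, weight);
}
```

For example, a task at twice NICE_0 weight sees half the virtual slice of a default-weight task for the same 3 ms request.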
>>
>> About the tolerance, I think min(vslice/128, tick/2) is better: as
>>  your reply pointed out, vslice may be too big, so bound it as well.
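A minimal sketch of that tolerance bound, assuming a 250 Hz tick for TICK_NSEC (the real value depends on CONFIG_HZ):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 ms tick (250 Hz) for illustration only. */
#define TICK_NSEC 4000000ULL

/* tolerance = min(vslice/128, tick/2): bounded by a small fraction of
 * the entity's own virtual slice and, when vslice is large, capped at
 * half a tick so the tolerance never exceeds the tick error itself. */
static uint64_t tick_tolerance(uint64_t vslice)
{
    uint64_t t = vslice >> 7;            /* vslice / 128 */
    return t < TICK_NSEC / 2 ? t : TICK_NSEC / 2;
}
```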
>>
>> However, there is still a new issue: if an entity se1 has its deadline
>>  pushed early because it is ineligible, and another entity se2 then
>>  runs, se1 may become eligible after se2 finishes, but its deadline
>>  has already been pushed back, so it is no longer the earliest
>>  eligible entity and cannot run. This seems not to comply with the
>>  EEVDF specification.
>>
>> But I think this is acceptable. Previously, the logic caused a task to
>>  run an extra tick (milliseconds), which made other tasks wait an
>>  extra tick. Now a task may run slightly less time (microseconds), but
>>  it is compensated at the next scheduling. In terms of overall
>>  accuracy, I think this is an improvement.
>>
>> By the way, we could try to solve this by delaying the deadline
>>  update: schedule the task away but do not update its deadline until
>>  it next runs. I am not sure such complex logic is necessary here; I
>>  think it is not, because an ideal EEVDF may not suit the kernel, and
>>  it is not worth this much effort to correct such a small error. A
>>  truly accurate system would have no tick error and could simply
>>  disable the FORWARD_DEADLINE feature. Of course, if you think it is
>>  necessary, I am willing to try to implement it.
>>
>> Signed-off-by: zhouzihan30 <zhouzihan30@...com>
>> ---
>>  kernel/sched/fair.c     | 41 +++++++++++++++++++++++++++++++++++++----
>>  kernel/sched/features.h |  7 +++++++
>>  2 files changed, 44 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 2d16c8545c71..e6e58c7d6d4c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1006,8 +1006,10 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
>>   */
>>  static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>  {
>> -       if ((s64)(se->vruntime - se->deadline) < 0)
>> -               return false;
>> +
>> +       u64 vslice;
>> +       u64 tolerance = 0;
>> +       u64 next_deadline;
>>
>>         /*
>>          * For EEVDF the virtual time slope is determined by w_i (iow.
>> @@ -1016,11 +1018,42 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>          */
>>         if (!se->custom_slice)
>>                 se->slice = sysctl_sched_base_slice;
>> +       vslice = calc_delta_fair(se->slice, se);
>> +
>> +       next_deadline = se->vruntime + vslice;
>> +
>> +       if (sched_feat(FORWARD_DEADLINE))
>> +               tolerance = min(vslice>>7, TICK_NSEC/2);
>> +
>> +       if ((s64)(se->vruntime + tolerance - se->deadline) < 0)
>> +               return false;
>>
>>         /*
>> -        * EEVDF: vd_i = ve_i + r_i / w_i
>> +        * When se->vruntime + tolerance - se->deadline >= 0
>> +        * but se->vruntime - se->deadline < 0, there are two cases,
>> +        * depending on whether the entity is eligible:
>> +        * if it is not eligible, we need not wait for the deadline,
>> +        * because EEVDF does not guarantee that an ineligible entity
>> +        * can execute its requested time in one go;
>> +        * if it is eligible, just let it run, which is the same
>> +        * processing logic as before.
>>          */
>> -       se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
>> +
>> +       if (sched_feat(FORWARD_DEADLINE) && (s64)(se->vruntime - se->deadline) < 0) {
>> +               if (entity_eligible(cfs_rq, se)) {
>> +                       return false;
>> +               } else {
>> +                       /*
>> +                        * In this case the entity's request is not used
>> +                        * up, but since it is ineligible, we need not
>> +                        * run it; let vd_i += r_i / w_i to keep fairness.
>> +                        */
>> +                       next_deadline = se->deadline + vslice;
>> +               }
>> +       }
>> +
>> +
>> +       se->deadline = next_deadline;
>>
>>         /*
>>          * The task has consumed its request, reschedule.
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index 290874079f60..5c74deec7209 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -24,6 +24,13 @@ SCHED_FEAT(RUN_TO_PARITY, true)
>>   */
>>  SCHED_FEAT(PREEMPT_SHORT, true)
>>
>> +/*
>> + * For some cases where the tick is faster than expected,
>> + * move the deadline forward
>> + */
>> +SCHED_FEAT(FORWARD_DEADLINE, true)
>> +
>> +
>>  /*
>>   * Prefer to schedule the task we woke last (assuming it failed
>>   * wakeup-preemption), since its likely going to consume data we
>> --
>> 2.39.3 (Apple Git-146)
>>
