Message-ID: <db8b8b80-2e40-4753-ae6f-244cd3ba2312@linux.ibm.com>
Date: Wed, 30 Oct 2024 00:27:26 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, ankur.a.arora@...cle.com,
efault@....de, tglx@...utronix.de, mingo@...nel.org
Subject: Re: [PATCH 2/5] sched: Add Lazy preemption model
Hi Sebastian.
On 10/25/24 18:49, Sebastian Andrzej Siewior wrote:
> On 2024-10-22 22:14:41 [+0530], Shrikanth Hegde wrote:
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1251,7 +1251,7 @@ static void update_curr(struct cfs_rq *c
>>> return;
>>> if (resched || did_preempt_short(cfs_rq, curr)) {
>>
>>
>>
>> If there is a long-running task, LAZY would be set only after it is no longer eligible, and
>> a subsequent tick would upgrade it to NR. If one sets sysctl_sched_base_slice to a large
>> value (max 4 seconds), LAZY would be set only after that (up to 4 seconds) if there is no
>> wakeup on that CPU.
>>
>> If I set sysctl_sched_base_slice=300ms and spawn 2 stress-ng tasks on one CPU, the LAZY bit
>> is usually set ~300ms after the sched_switch if there are no wakeups, and NR is set on the
>> subsequent tick. Initially I was thinking that for a long-running process LAZY would be set
>> after one tick and NR on the subsequent tick. I was wrong. It might take a long time for LAZY
>> to be set, and NR would be set on the subsequent tick.
>>
>> Is that the expected behavior, since whoever sets sysctl_sched_base_slice knows what to expect?
>
> I guess so. Once the slice is up then the NEED_RESCHED bit is replaced
> with the LAZY bit. That means a return-to-userland (from a syscall) or
> the following tick will lead to a scheduling event.
ok.
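
Just to note down my mental model of that flow (a rough pseudo-C sketch of my
understanding, not the actual patch code):

	/* update_curr(): slice expiry only requests a lazy reschedule */
	if (resched || did_preempt_short(cfs_rq, curr)) {
		resched_curr_lazy(rq);	/* sets the LAZY bit, not NEED_RESCHED */
		clear_buddies(cfs_rq, curr);
	}

	/*
	 * The LAZY bit is acted upon on the next return to user space. If the
	 * task keeps running instead, a later tick that still finds LAZY set
	 * upgrades it to a real NEED_RESCHED, roughly:
	 */
	if (test_tsk_thread_flag(rq->curr, TIF_NEED_RESCHED_LAZY))
		resched_curr(rq);	/* LAZY -> NR */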
>
>>> - resched_curr(rq);
>>> + resched_curr_lazy(rq);
>>> clear_buddies(cfs_rq, curr);
>>> }
>>> }
>>> @@ -5677,7 +5677,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc
>>> * validating it and just reschedule.
>>> */
>>> if (queued) {
>>
>> What's this queued used for? hrtick seems to set it. I haven't understood how it works.
>
> from 20241009074631.GH17263@...sy.programming.kicks-ass.net:
> | hrtick is disabled by default (because expensive) and so it doesn't
> | matter much, but it's purpose is to increase accuracy and hence I left
> | it untouched for now.
>
> This sets up a hrtimer for the (remaining) time slice and invokes the
> task_tick from there (instead of the regular tick).
Thanks. Will take a look and try to understand.
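
For the record, my reading of the hrtick path so far is roughly this
(simplified sketch from my understanding of core.c/fair.c; locking and clock
update omitted, details may be off):

/* fair.c: arm a hrtimer for the remainder of the current slice */
static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
{
	struct sched_entity *se = &p->se;
	u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
	s64 delta = se->slice - ran;

	if (delta < 0) {
		if (task_current(rq, p))
			resched_curr(rq);
		return;
	}
	hrtick_start(rq, delta);
}

/*
 * core.c: the timer fires and calls task_tick() with queued == 1, which is
 * why entity_tick() just reschedules in the queued case instead of
 * re-validating the slice.
 */
static enum hrtimer_restart hrtick(struct hrtimer *timer)
{
	struct rq *rq = container_of(timer, struct rq, hrtick_timer);

	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
	return HRTIMER_NORESTART;
}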
>
>>> - resched_curr(rq_of(cfs_rq));
>>> + resched_curr_lazy(rq_of(cfs_rq));
>>> return;
>>> }
>>> /*
>>> @@ -8832,7 +8832,7 @@ static void check_preempt_wakeup_fair(st
>>> return;
>>> preempt:
>>> - resched_curr(rq);
>>
>> Is it better to call resched_curr here? When the code arrives here, it wants to
>> run pse as soon as possible, right?
>
> But wouldn't every try_to_wake_up()/wake_up() then result in immediate
> preemption? Letting the task run and waiting for it to give up on its own,
> with the preemption happening on return to userland, usually results in
> better performance.
> At least this is what I observed while playing with this.
>
Yes, I agree that preemption at every ttwu is bad. But that may not
happen with the latest code, i.e. if RUN_TO_PARITY is enabled or pick_eevdf
doesn't pick the waiting task as the best candidate.
My concern was also about this code in check_preempt_wakeup_fair:
	/*
	 * Preempt an idle entity in favor of a non-idle entity (and don't preempt
	 * in the inverse case).
	 */
	if (cse_is_idle && !pse_is_idle)
		goto preempt;
	if (cse_is_idle != pse_is_idle)
		return;
What I was thinking is that if the current task is idle and the waking task
is not idle, we should set NR instead of LAZY. I am not sure whether such a
pattern happens in the exit-to-kernel path, since exit to user is already
taken care of by setting the LAZY bit.
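
Something like the below (untested, just to illustrate what I mean, on top of
this patch) is what I was wondering about:

	if (cse_is_idle && !pse_is_idle) {
		/*
		 * Current is a SCHED_IDLE entity and the waking one is not:
		 * ask for an immediate reschedule (NR) instead of only
		 * setting the LAZY bit via the preempt label.
		 */
		resched_curr(rq);
		return;
	}
	if (cse_is_idle != pse_is_idle)
		return;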
>>> + resched_curr_lazy(rq);
>>> }
>>> static struct task_struct *pick_task_fair(struct rq *rq)
>
> Sebastian