Message-ID: <db8b8b80-2e40-4753-ae6f-244cd3ba2312@linux.ibm.com>
Date: Wed, 30 Oct 2024 00:27:26 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, ankur.a.arora@...cle.com,
efault@....de, tglx@...utronix.de, mingo@...nel.org
Subject: Re: [PATCH 2/5] sched: Add Lazy preemption model
Hi Sebastian.
On 10/25/24 18:49, Sebastian Andrzej Siewior wrote:
> On 2024-10-22 22:14:41 [+0530], Shrikanth Hegde wrote:
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1251,7 +1251,7 @@ static void update_curr(struct cfs_rq *c
>>> return;
>>> if (resched || did_preempt_short(cfs_rq, curr)) {
>>
>>
>>
>> If there is a long-running task, LAZY would be set only after it is no longer eligible, and
>> a subsequent tick would upgrade it to NR. If one sets sysctl_sched_base_slice to a large
>> value (max 4 seconds), LAZY would be set only after that (up to 4 seconds) if there is no
>> wakeup on that CPU.
>>
>> If I set sysctl_sched_base_slice=300ms and spawn 2 stress-ng tasks on one CPU, the LAZY bit
>> is usually set ~300ms after the sched_switch if there are no wakeups, and NR is set on the
>> subsequent tick. Initially I was thinking that for a long-running process LAZY would be set
>> after one tick and NR on the subsequent tick. I was wrong. It might take a long time for LAZY
>> to be set, and NR would be set on the subsequent tick.
>>
>> Is that the expected behavior, since whoever sets sysctl_sched_base_slice knows what to expect?
>
> I guess so. Once the slice is up then the NEED_RESCHED bit is replaced
> with the LAZY bit. That means a return-to-userland (from a syscall) or
> the following tick will lead to a scheduling event.
ok.
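
Just to note down my mental model of that flow (a rough pseudo-C sketch of my
understanding, not the actual patch code):

	/* update_curr(): slice expiry only requests a lazy reschedule */
	if (resched || did_preempt_short(cfs_rq, curr)) {
		resched_curr_lazy(rq);	/* sets the LAZY bit, not NEED_RESCHED */
		clear_buddies(cfs_rq, curr);
	}

	/*
	 * The LAZY bit is acted upon on the next return to user space. If the
	 * task keeps running instead, a later tick that still finds LAZY set
	 * upgrades it to a real NEED_RESCHED, roughly:
	 */
	if (test_tsk_thread_flag(rq->curr, TIF_NEED_RESCHED_LAZY))
		resched_curr(rq);	/* LAZY -> NR */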
>
>>> - resched_curr(rq);
>>> + resched_curr_lazy(rq);
>>> clear_buddies(cfs_rq, curr);
>>> }
>>> }
>>> @@ -5677,7 +5677,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc
>>> * validating it and just reschedule.
>>> */
>>> if (queued) {
>>
>> What's this queued used for? hrtick seems to set it. I haven't understood how it works.
>
> from 20241009074631.GH17263@...sy.programming.kicks-ass.net:
> | hrtick is disabled by default (because expensive) and so it doesn't
> | matter much, but it's purpose is to increase accuracy and hence I left
> | it untouched for now.
>
> This sets up a hrtimer for the (remaining) time slice and invokes the
> task_tick from there (instead of the regular tick).
Thanks. Will take a look and try to understand.
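
For the record, my reading of the hrtick path so far is roughly this
(simplified sketch from my understanding of core.c/fair.c; locking and clock
update omitted, details may be off):

/* fair.c: arm a hrtimer for the remainder of the current slice */
static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
{
	struct sched_entity *se = &p->se;
	u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
	s64 delta = se->slice - ran;

	if (delta < 0) {
		if (task_current(rq, p))
			resched_curr(rq);
		return;
	}
	hrtick_start(rq, delta);
}

/*
 * core.c: the timer fires and calls task_tick() with queued == 1, which is
 * why entity_tick() just reschedules in the queued case instead of
 * re-validating the slice.
 */
static enum hrtimer_restart hrtick(struct hrtimer *timer)
{
	struct rq *rq = container_of(timer, struct rq, hrtick_timer);

	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
	return HRTIMER_NORESTART;
}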
>
>>> - resched_curr(rq_of(cfs_rq));
>>> + resched_curr_lazy(rq_of(cfs_rq));
>>> return;
>>> }
>>> /*
>>> @@ -8832,7 +8832,7 @@ static void check_preempt_wakeup_fair(st
>>> return;
>>> preempt:
>>> - resched_curr(rq);
>>
>> Is it better to call resched_curr here? When the code arrives here, it wants to
>> run pse as soon as possible, right?
>
> But wouldn't every try_to_wake_up()/wake_up() then result in immediate
> preemption? Letting the task run and waiting for it to give up on its own,
> with the preemption happening on return to userland, usually results in
> better performance.
> At least this is what I observed while playing with this.
>
Yes, I agree that preemption at every ttwu is bad. But that may not
happen with the latest code, i.e. if RUN_TO_PARITY is enabled or pick_eevdf
doesn't pick the waiting task as the best candidate.
My concern was also about this code in check_preempt_wakeup_fair:
	/*
	 * Preempt an idle entity in favor of a non-idle entity (and don't preempt
	 * in the inverse case).
	 */
	if (cse_is_idle && !pse_is_idle)
		goto preempt;
	if (cse_is_idle != pse_is_idle)
		return;
What I was thinking is that if the current task is idle and the waking task
is not idle, we should set NR instead of LAZY. I am not sure whether such a
pattern happens in the exit-to-kernel path, since exit to user is already
taken care of by setting the LAZY bit.
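
Something like the below (untested, just to illustrate what I mean, on top of
this patch) is what I was wondering about:

	if (cse_is_idle && !pse_is_idle) {
		/*
		 * Current is a SCHED_IDLE entity and the waking one is not:
		 * ask for an immediate reschedule (NR) instead of only
		 * setting the LAZY bit via the preempt label.
		 */
		resched_curr(rq);
		return;
	}
	if (cse_is_idle != pse_is_idle)
		return;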
>>> + resched_curr_lazy(rq);
>>> }
>>> static struct task_struct *pick_task_fair(struct rq *rq)
>
> Sebastian