Message-ID: <20250212121113.3nJ-kf-6@linutronix.de>
Date: Wed, 12 Feb 2025 13:11:13 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Joel Fernandes <joel@...lfernandes.org>,
Prakash Sangappa <prakash.sangappa@...cle.com>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ankur Arora <ankur.a.arora@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
x86@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
bharata@....com, raghavendra.kt@....com,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Konrad Wilk <konrad.wilk@...cle.com>, jgross@...e.com,
Andrew.Cooper3@...rix.com, Vineeth Pillai <vineethrp@...gle.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Ingo Molnar <mingo@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Clark Williams <clark.williams@...il.com>, daniel.wagner@...e.com,
Joseph Salisbury <joseph.salisbury@...cle.com>, broonie@...il.com
Subject: Re: [RFC][PATCH 1/2] sched: Extended scheduler time slice
On 2025-02-11 10:28:01 [-0500], Steven Rostedt wrote:
> On Tue, 11 Feb 2025 09:21:38 +0100
> Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
>
> > We don't follow this behaviour exactly today.
> >
> > Adding this behaviour back vs the behaviour we have now, doesn't seem to
> > improve anything at visible levels. We don't have a counter but we can
> > look at the RCU nesting counter which should be zero once locks have
> > been dropped. So this can be used for testing.
> >
> > But as I said: using "run to completion" and preempting on the return
> > to userland, rather than once the lazy flag is seen and all locks have
> > been released, appears to be better.
> >
> > It is (now) possible that you run for a long time and get preempted
> > while holding a spinlock_t. It is however more likely that you release
> > all locks and get preempted while returning to userland.
>
> IIUC, today, LAZY causes all SCHED_OTHER tasks to act more like
> PREEMPT_NONE. Is that correct?
Well. The first sched-tick sets the LAZY bit, the second sched-tick
forces a resched.
On PREEMPT_NONE the sched-tick would set NEED_RESCHED, but nothing
would force a resched until the task decides to call schedule() on its
own. So it is slightly different for kernel threads.
Userland is another matter: there we get a resched on the return to
userland after the sched-tick, so LAZY or NONE does not matter.
> Now that PREEMPT_RT is not one of the preemption selections, when you
> select PREEMPT_RT, you can pick between CONFIG_PREEMPT and
> CONFIG_PREEMPT_LAZY. Where CONFIG_PREEMPT will preempt the kernel at the
> scheduler tick if preemption is enabled, and CONFIG_PREEMPT_LAZY will
> not preempt the kernel on a scheduler tick and waits for exit to user space.
This is not specific to RT but FULL vs LAZY. But yes. However the second
sched-tick will force a preemption point even without the
exit to userland.
> Sebastian,
>
> It appears you only tested the CONFIG_PREEMPT_LAZY selection. Have you
> tested the difference of how CONFIG_PREEMPT behaves between PREEMPT_RT and
> no PREEMPT_RT? I think that will show a difference like we had in the past.
Not that I remember testing PREEMPT vs PREEMPT_RT. I remember people
complained about high networking load on RT, which became visible due to
threaded interrupts (as in top), while for non-RT it was more or less
hidden and not clearly visible due to the selected accounting. The
network performance was mostly the same as far as I remember (that is,
gbit).
> I can see people picking both PREEMPT_RT and CONFIG_PREEMPT (Full), but
> then wondering why their non RT tasks are suffering from a performance
> penalty from that.
We might want to opt in to lazy by default on RT. That was the case in
the RT queue until it was replaced with PREEMPT_AUTO.
But then why not use LAZY in favour of PREEMPT? Mike had numbers
https://lore.kernel.org/all/9df22ebbc2e6d426099bf380477a0ed885068896.camel@gmx.de/
where LAZY had mostly the VOLUNTARY performance with fewer context
switches than PREEMPT. Which also means no need for cond_resched() and
friends.
> -- Steve
Sebastian