linux-kernel - Re: [RFC][PATCH 1/2] sched: Extended scheduler time slice


Message-ID: <CABCjUKA2w9Xip2QDjMRDCWnvmZc52SWbn74-57q52gmpXcT+EA@mail.gmail.com>
Date: Tue, 4 Feb 2025 12:28:41 +0900
From: Suleiman Souhlal <suleiman@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, 
	Ankur Arora <ankur.a.arora@...cle.com>, Linus Torvalds <torvalds@...ux-foundation.org>, 
	linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org, 
	luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, 
	juri.lelli@...hat.com, vincent.guittot@...aro.org, willy@...radead.org, 
	mgorman@...e.de, jon.grimm@....com, bharata@....com, raghavendra.kt@....com, 
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com, jgross@...e.com, 
	andrew.cooper3@...rix.com, Joel Fernandes <joel@...lfernandes.org>, 
	Vineeth Pillai <vineethrp@...gle.com>, Ingo Molnar <mingo@...nel.org>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Clark Williams <clark.williams@...il.com>, 
	bigeasy@...utronix.de, daniel.wagner@...e.com, joseph.salisbury@...cle.com, 
	broonie@...il.com
Subject: Re: [RFC][PATCH 1/2] sched: Extended scheduler time slice

On Tue, Feb 4, 2025 at 1:45 AM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Mon, 3 Feb 2025 09:43:06 +0100
> Peter Zijlstra <peterz@...radead.org> wrote:
>
> > Lazy is not the default, nor even the recommended preemption method at
> > this time.
>
> That's OK. If it is considered to be the default in the future, this can
> wait.
>
> >
> > Lazy will not ever be the only preemption method, full isn't going
> > anywhere.
>
> That's fine too, as full preemption has the same issue of preempting
> kernel mutexes. Full preemption is for something that likely doesn't want
> this feature anyway.
>
> >
> > Lazy only applies to fair (and whatever bpf things end up using
> > resched_curr_lazy()).
>
> Is that a problem? User spin locks for RT tasks are very dangerous. If an
> RT task preempts the owner that is of lower priority, it can cause a
> deadlock (if the two tasks are pinned to the same CPU). Which BTW,
> Sebastian mentioned in the Stable RT meeting that glibc supplies a
> pthread_spin_lock() and doesn't have in the man page anything about this
> possible scenario.
>
> >
> > Lazy works on tick granularity, which is variable per the HZ config, and
> > way too long for any of this nonsense.
>
> Patch 2 changes that to do what you wrote the last time. It has a max wait
> time of 50us.
>
> >
> > So by tying this to lazy, you get something that doesn't actually work
> > most of the time, and when it works, it has variable and bad behaviour.
>
> Um no. If we wait for lazy to become the default behavior, it will work
> most of the time. And when it does work, it has strict behavior of 50us.
>
> >
> > So yeah, crap.
>
> As your rationale was not correct, I will disagree with this being crap.
>
>
> >
> > This really isn't difficult to understand, and I've told you this
> > before.
>
> And I listened to what you told me before. Patch 2 implements the 50us max
> that you suggested. I separated it out because it made the code simpler to
> understand and debug. The change log even mentioned:
>
>      For the moment, it lets it run for one more tick (which will be
>      changed later).
>
> That "changed later" is the second patch in this series.
>
> With the "this can wait until lazy is default", is because we have an
> "upstream first" policy. As long as there is some buy-in to the changes, we
> can go ahead and implement it on our devices. We do not have to wait for it
> to be accepted. But if there's a strong NAK to the idea, it is much harder
> to get it implemented internally.

Can you explain why this approach requires PREEMPT_LAZY?

Could exit_to_user_mode_loop() be changed to something like the
following (with maybe some provision to only do it once)?

if ((ti_work & _TIF_NEED_RESCHED) && !rseq_delay_resched())
    schedule();

I suppose there would also need to be some additional changes to make
sure full preemption also doesn't preempt, maybe in
preempt_schedule*().
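
To make the "only do it once" part concrete, here is a rough userspace
model of what I have in mind. It is not kernel code: the model_* names,
the struct and the flag value are all made up for illustration, and
model_delay_resched_once() is only a guess at how a one-shot variant of
the rseq_delay_resched() check might behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these are the kernel's definitions. */
#define MODEL_TIF_NEED_RESCHED (1UL << 0)

struct model_task {
    bool delay_requested; /* task asked for a short extension */
    bool delay_granted;   /* an extension was already handed out */
    int  schedule_calls;  /* how often the model "scheduled" */
};

/* Hypothetical one-shot check: grant at most one extension in a row. */
static bool model_delay_resched_once(struct model_task *t)
{
    if (!t->delay_requested || t->delay_granted)
        return false;
    t->delay_granted = true;
    return true;
}

static void model_schedule(struct model_task *t)
{
    t->schedule_calls++;
    /* After a real reschedule, a new extension may be granted again. */
    t->delay_granted = false;
}

/* Models one pass over the pending work flags on return to user space. */
static void model_exit_to_user_mode(struct model_task *t, unsigned long ti_work)
{
    if ((ti_work & MODEL_TIF_NEED_RESCHED) && !model_delay_resched_once(t))
        model_schedule(t);
}

/* Small sanity check of the one-shot behaviour. */
static void model_self_check(void)
{
    struct model_task t = { .delay_requested = true };

    model_exit_to_user_mode(&t, MODEL_TIF_NEED_RESCHED); /* extension granted */
    assert(t.schedule_calls == 0);
    model_exit_to_user_mode(&t, MODEL_TIF_NEED_RESCHED); /* no second extension */
    assert(t.schedule_calls == 1);
}
```

In this model the first pass with the resched flag set skips the
schedule, and the next pass schedules unconditionally, so a task cannot
keep deferring forever.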

-- Suleiman