Message-ID: <20231030095203.33325aee@gandalf.local.home>
Date: Mon, 30 Oct 2023 09:52:03 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ankur Arora <ankur.a.arora@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com,
Joel Fernandes <joel@...lfernandes.org>,
Youssef Esmat <youssefesmat@...omium.org>,
Vineeth Pillai <vineethrp@...gle.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Ingo Molnar <mingo@...nel.org>,
Daniel Bristot de Oliveira <bristot@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice
On Mon, 30 Oct 2023 14:29:49 +0100
Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Oct 26, 2023 at 09:16:58AM -0400, Steven Rostedt wrote:
>
> > I said:
> >
> > If we are worried about abuse, we could even punish tasks that don't call
> > sched_yield() by the time its extended time slice is taken.
>
> This is a user interface, of course I'm worried about abuse. That's the
> first thing you *should* think about.
>
> Userspace is out to get you -- must assume hostile.
100% agree!
>
> Notably, we were talking usec latencies in the Chrome thread, you're
> adding 1000 usec latencies here (in the best case, delaying scheduling
> until the next tick, 10000 usec for the HZ=100 folks). This is quite
> 'unfortunate'.
>
> On my very aged IVB-EP I can get 50us scheduling latencies on a good
> day, on my brand spanking new SPR I can get 20us (more faster more
> better etc..).
>
> Ideally we don't allow userspace to extend much (if any) beyond the
> granularity already imposed by the kernel's preempt/IRQ-disable regions.
> Sadly we don't have a self-measure of that around.
>
> So I had a poke at all this and ended up with the below. I still utterly
> detest all this, but it appears to actually work -- although I don't
> much see the improvement, the numbers are somewhat unstable. (I say it
> works because I see the 'yield -- made it' trace_printk when I do it
> right and the 'timeout -- force resched' when I do it 'wrong'.)
>
> This thing works across the board and gives userspace 50 usec, equal to
> what the kernel already imposes (on the IVB).
>
> I simply took a bit from the existing flags field, and userspace can use
> BTR to test if the kernel cleared it -- in which case it needs to yield
> (and not any other syscall).
>
> Additionally, doing a syscall with the bit set will SIGSEGV (when
> DEBUG_RSEQ is enabled).
>
Thanks for looking into this even though you detest it ;-)
Unfortunately, now that the merge window has opened (and someone reported a
bug in my code from linux-next :-( ), I need to take a step back from this
and may not be able to work on it again until Plumbers. By then, I hope to
have time to dig deeper into what you have done here.
Thanks again Peter!
-- Steve