linux-kernel - Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice

Message-ID: <20231025135545.GG31201@noisy.programming.kicks-ass.net>
Date:   Wed, 25 Oct 2023 15:55:45 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ankur Arora <ankur.a.arora@...cle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
        luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com,
        Joel Fernandes <joel@...lfernandes.org>,
        Youssef Esmat <youssefesmat@...omium.org>,
        Vineeth Pillai <vineethrp@...gle.com>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Bristot de Oliveira <bristot@...nel.org>
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice

On Wed, Oct 25, 2023 at 08:54:34AM -0400, Steven Rostedt wrote:

> I didn't want to overload that for something completely different. This is
> not a "restartable sequence".

Your hack is arguably worse. At least rseq already exists and most
threads will already have it set up if you have a recent enough glibc.

> > So what if it doesn't ? Can we kill it for not playing nice ?
> 
> No, it's no different than a system call running for a long time. You could

Then why ask for it? What's the point? Also, did you define
sched_yield() semantics for OTHER to something useful? Because if you
didn't, you just invoked UB :-) We could be setting your pets on fire.

> set this bit and leave it there for as long as you want, and it should not
> affect anything.

It would affect the worst case interference terms of the system at the
very least.

> If you look at what Thomas's PREEMPT_AUTO.patch

I know what it does, it also means your thing doesn't work the moment
you set things up to have the old full-preempt semantics back. It
doesn't work in the presence of RT/DL tasks, etc.

More importantly, it doesn't work for RT/DL tasks, so having the bit set
and not having OTHER policy is an error.

Do you want an interface that randomly doesn't work?

> We could possibly make it adjustable. 

Tunables are not a good thing.

> The reason I've been told over the last few decades of why people implement
> 100% user space spin locks is because the overhead of going into the kernel
> is way too high.

Over the last few decades that has been a blatant falsehood. At some
point (right before the whole meltdown trainwreck) amluto had syscall
overhead down to less than 150 cycles.

Then of course meltdown happened and it all went to shit.

But even today (on good hardware or with mitigations=off):

gettid-1m:	179,650,423      cycles
xadd-1m:	 23,036,564      cycles

A syscall costs roughly 8 atomic ops (about 180 vs. 23 cycles per
operation in the numbers above). More expensive, sure. But not insanely
so. I've seen atomic ops go up to >1000 cycles if you contend them hard
enough.
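
For anyone wanting to try a comparison of this kind, a minimal sketch
follows. It is not the benchmark that produced the numbers above (the
message does not say how they were measured). The names now_ns(),
bench_gettid() and bench_xadd() are hypothetical, and the sketch reports
wall-clock nanoseconds from clock_gettime() rather than cycles, so only
the ratio between the two results is comparable.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time `iters` gettid system calls. Going through syscall() forces a
 * real kernel entry on every iteration instead of relying on whatever
 * the libc wrapper might do.
 */
static uint64_t bench_gettid(long iters)
{
	uint64_t start = now_ns();

	for (long i = 0; i < iters; i++)
		syscall(SYS_gettid);

	return now_ns() - start;
}

/*
 * Time `iters` uncontended atomic increments on *counter. On x86 the
 * compiler is expected to emit a locked add or xadd here; that is an
 * assumption about code generation, not something the sketch checks.
 */
static uint64_t bench_xadd(atomic_long *counter, long iters)
{
	uint64_t start = now_ns();

	for (long i = 0; i < iters; i++)
		atomic_fetch_add(counter, 1);

	return now_ns() - start;
}
```

Calling bench_gettid(1000000) and bench_xadd(&counter, 1000000) from a
main() and dividing the first result by the second gives a rough
syscall-to-atomic cost ratio for the machine at hand. The atomic case
here is single-threaded and therefore uncontended, which is the cheap
end of the range described above.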