Message-ID: <ZQAVho72j1zG/HhK@gmail.com>
Date: Tue, 12 Sep 2023 09:38:46 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, tglx@...utronix.de,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
* Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, Sep 11, 2023 at 02:16:18PM -0700, Linus Torvalds wrote:
> > On Mon, 11 Sept 2023 at 13:50, Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > Except we've actually been *adding* to this whole mess, rather than
> > > removing it. So we have actively *expanded* on that preemption choice
> > > with PREEMPT_DYNAMIC.
> >
> > Actually, that config option makes no sense.
> >
> > It makes the cond_resched() behavior conditional with a static call.
> >
> > But all the *real* overhead is still there and unconditional (ie all
> > the preempt count updates and the "did it go down to zero and we need
> > to check" code).
> >
> > That just seems stupid. It seems to have all the overhead of a
> > preemptible kernel, just not doing the preemption.
> >
> > So I must be mis-reading this, or just missing something important.
> >
> > The real cost seems to be
> >
> > PREEMPT_BUILD -> PREEMPTION -> PREEMPT_COUNT
> >
> > and PREEMPT vs PREEMPT_DYNAMIC makes no difference to that, since both
> > will end up with that, and thus both cases will have all the spinlock
> > preempt count stuff.
> >
> > There must be some non-preempt_count cost that people worry about.
> >
> > Or maybe I'm just mis-reading the Kconfig stuff entirely. That's
> > possible, because this seems *so* pointless to me.
> >
> > Somebody please hit me with a clue-bat to the noggin.
>
> Well, I was about to reply to your previous email explaining this, but
> this one time I did read more email..
>
> Yes, PREEMPT_DYNAMIC has all the preempt count twiddling and only nops
> out the schedule()/cond_resched() calls where appropriate.
>
> This work was done by a distro (SuSE) and if they're willing to ship this
> I'm thinking the overheads are acceptable to them.
>
> For a significant number of workloads the real overhead is the extra
> preemptions themselves more than the counting -- but yes, the counting is
> measurable, though probably in the noise compared to some of the other
> horrible things we have done in the past years.
>
> Anyway, if distros are fine shipping with PREEMPT_DYNAMIC, then yes,
> deleting the other options is definitely an option.
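
( To make the mechanism Peter describes a bit more concrete: the
  cond_resched() call sites are static-call sites that get re-pointed at
  boot, depending on the selected preemption mode. Roughly the pattern
  below - a paraphrased sketch, not verbatim kernel source, and the exact
  names and config guards vary between kernel versions: )

  /* Paraphrased sketch of the PREEMPT_DYNAMIC cond_resched() plumbing: */
  DECLARE_STATIC_CALL(cond_resched, __cond_resched);

  static __always_inline int _cond_resched(void)
  {
          /*
           * Under preempt=full the static call is re-pointed to a
           * return-0 stub, i.e. the call site is effectively NOP-ed
           * out - while all the preempt_count accounting stays.
           */
          return static_call_mod(cond_resched)();
  }
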
Yes, so my understanding is that distros generally worry more about
macro-overhead - for example, material changes to a random subset of key
benchmarks that specific enterprise customers care about - and are not
nearly as sensitive to the micro-overhead that preempt_count()
maintenance causes.
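
( The micro-overhead in question is per-critical-section bookkeeping:
  every spin_lock()/spin_unlock() pair does an increment plus a
  decrement-and-test of preempt_count. A minimal stand-alone sketch of
  the idea - simplified stand-in code, not the kernel's actual
  implementation: )

  #include <stdbool.h>

  static __thread unsigned int preempt_count;  /* per-CPU in the real kernel */
  static __thread bool need_resched;           /* stand-in for TIF_NEED_RESCHED */

  static void schedule(void) { /* the kernel would switch tasks here */ }

  static inline void preempt_disable(void)
  {
          preempt_count++;                     /* entering an atomic section */
  }

  static inline void preempt_enable(void)
  {
          /* the "did it go down to zero and we need to check" part: */
          if (--preempt_count == 0 && need_resched)
                  schedule();
  }

  /* every spin_lock()/unlock() implies this in all PREEMPT_COUNT builds: */
  static inline void spin_lock_sketch(void)   { preempt_disable(); /* + acquire */ }
  static inline void spin_unlock_sketch(void) { /* release + */ preempt_enable(); }
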
PREEMPT_DYNAMIC is basically a reflection of that: the desire to ship a
single kernel image, with a boot-time toggle (the preempt=none/voluntary/full
parameter) to differentiate between desktop and server loads - i.e. to get
CONFIG_PREEMPT behavior on desktops but PREEMPT_VOLUNTARY behavior on
servers.
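
( The mapping between the boot parameter and the resulting behavior, as I
  understand it - a simplified, hypothetical sketch rather than the
  kernel's actual code, which also glosses over the none/voluntary
  difference in might_resched(): )

  #include <stdbool.h>

  enum preempt_mode { MODE_NONE, MODE_VOLUNTARY, MODE_FULL };

  struct preempt_behavior {
          bool cond_resched_active;  /* voluntary preemption points live */
          bool irq_preempt_active;   /* preempt on return from interrupt */
  };

  static struct preempt_behavior pick_behavior(enum preempt_mode mode)
  {
          switch (mode) {
          case MODE_FULL:        /* "desktop": CONFIG_PREEMPT-like behavior */
                  return (struct preempt_behavior){ .cond_resched_active = false,
                                                    .irq_preempt_active  = true };
          case MODE_VOLUNTARY:   /* "server": PREEMPT_VOLUNTARY-like behavior */
          case MODE_NONE:
          default:
                  return (struct preempt_behavior){ .cond_resched_active = true,
                                                    .irq_preempt_active  = false };
          }
  }
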
There's also the view that PREEMPT kernels are a bit more QA-friendly,
because atomic code sequences are much better defined & enforced via kernel
warnings. Without preempt_count we only have irqs-off warnings, which cover
only a small fraction of all critical sections in the kernel.
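
( Concretely this is the might_sleep() / "sleeping function called from
  invalid context" class of checks: with a preempt_count every
  spinlock-held region is visible to the debug code, without it only the
  irqs-off subset is. A simplified stand-in, not the kernel's actual
  implementation: )

  #include <stdbool.h>
  #include <stdio.h>

  static __thread unsigned int preempt_count;  /* maintained by spin_lock() etc. */
  static __thread bool irqs_off;               /* maintained by local_irq_disable() */

  static void assert_may_sleep(const char *func)
  {
          /* with a preempt_count, *any* atomic section is detectable ... */
          if (preempt_count != 0)
                  fprintf(stderr, "BUG: %s called from atomic context\n", func);
          /* ... without one, only the irqs-off subset can be caught: */
          else if (irqs_off)
                  fprintf(stderr, "BUG: %s called with IRQs disabled\n", func);
  }
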
Ideally we'd be able to patch out most of the preempt_count maintenance
overhead too - OTOH these days it's little more than noise on most CPUs,
considering the kind of horrible security-workaround overhead we have on
almost all x86 CPU types ... :-/
Thanks,
Ingo