Message-ID: <ZQAVho72j1zG/HhK@gmail.com>
Date: Tue, 12 Sep 2023 09:38:46 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, tglx@...utronix.de,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
* Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, Sep 11, 2023 at 02:16:18PM -0700, Linus Torvalds wrote:
> > On Mon, 11 Sept 2023 at 13:50, Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > Except we've actually been *adding* to this whole mess, rather than
> > > removing it. So we have actively *expanded* on that preemption choice
> > > with PREEMPT_DYNAMIC.
> >
> > Actually, that config option makes no sense.
> >
> > It makes the cond_resched() behavior conditional with a static call.
> >
> > But all the *real* overhead is still there and unconditional (ie all
> > the preempt count updates and the "did it go down to zero and we need
> > to check" code).
> >
> > That just seems stupid. It seems to have all the overhead of a
> > preemptible kernel, just not doing the preemption.
> >
> > So I must be mis-reading this, or just missing something important.
> >
> > The real cost seems to be
> >
> > PREEMPT_BUILD -> PREEMPTION -> PREEMPT_COUNT
> >
> > and PREEMPT vs PREEMPT_DYNAMIC makes no difference to that, since both
> > will end up with that, and thus both cases will have all the spinlock
> > preempt count stuff.
> >
> > There must be some non-preempt_count cost that people worry about.
> >
> > Or maybe I'm just mis-reading the Kconfig stuff entirely. That's
> > possible, because this seems *so* pointless to me.
> >
> > Somebody please hit me with a clue-bat to the noggin.
>
> Well, I was about to reply to your previous email explaining this, but
> this one time I did read more email..
>
> Yes, PREEMPT_DYNAMIC has all the preempt count twiddling and only nops
> out the schedule()/cond_resched() calls where appropriate.
>
> This work was done by a distro (SuSE) and if they're willing to ship this
> I'm thinking the overheads are acceptable to them.
>
> For a significant number of workloads the real overhead is the extra
> preemptions themselves more than the counting -- but yes, the counting is
> measurable, though probably in the noise compared to some of the other
> horrible things we have done in the past years.
>
> Anyway, if distros are fine shipping with PREEMPT_DYNAMIC, then yes,
> deleting the other options is definitely an option.
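
( To make the mechanism Peter describes a bit more concrete: the
  cond_resched() call sites are static-call sites that get re-pointed at
  boot, depending on the selected preemption mode. Roughly the pattern
  below - a paraphrased sketch, not verbatim kernel source, and the exact
  names and config guards vary between kernel versions: )

  /* Paraphrased sketch of the PREEMPT_DYNAMIC cond_resched() plumbing: */
  DECLARE_STATIC_CALL(cond_resched, __cond_resched);

  static __always_inline int _cond_resched(void)
  {
          /*
           * Under preempt=full the static call is re-pointed to a
           * return-0 stub, i.e. the call site is effectively NOP-ed
           * out - while all the preempt_count accounting stays.
           */
          return static_call_mod(cond_resched)();
  }
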
Yes, so my understanding is that distros generally worry more about
macro-overhead - for example, material changes to a random subset of key
benchmarks that specific enterprise customers care about - and are not
nearly as sensitive to the micro-overhead that preempt_count()
maintenance causes.
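
( The micro-overhead in question is per-critical-section bookkeeping:
  every spin_lock()/spin_unlock() pair does an increment plus a
  decrement-and-test of preempt_count. A minimal stand-alone sketch of
  the idea - simplified stand-in code, not the kernel's actual
  implementation: )

  #include <stdbool.h>

  static __thread unsigned int preempt_count;  /* per-CPU in the real kernel */
  static __thread bool need_resched;           /* stand-in for TIF_NEED_RESCHED */

  static void schedule(void) { /* the kernel would switch tasks here */ }

  static inline void preempt_disable(void)
  {
          preempt_count++;                     /* entering an atomic section */
  }

  static inline void preempt_enable(void)
  {
          /* the "did it go down to zero and we need to check" part: */
          if (--preempt_count == 0 && need_resched)
                  schedule();
  }

  /* every spin_lock()/unlock() implies this in all PREEMPT_COUNT builds: */
  static inline void spin_lock_sketch(void)   { preempt_disable(); /* + acquire */ }
  static inline void spin_unlock_sketch(void) { /* release + */ preempt_enable(); }
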
PREEMPT_DYNAMIC is basically a reflection of that: the desire to ship a
single kernel image, with a boot-time toggle (the preempt=none/voluntary/full
parameter) to differentiate between desktop and server loads - i.e. to get
CONFIG_PREEMPT behavior on desktops but PREEMPT_VOLUNTARY behavior on
servers.
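
( The mapping between the boot parameter and the resulting behavior, as I
  understand it - a simplified, hypothetical sketch rather than the
  kernel's actual code, which also glosses over the none/voluntary
  difference in might_resched(): )

  #include <stdbool.h>

  enum preempt_mode { MODE_NONE, MODE_VOLUNTARY, MODE_FULL };

  struct preempt_behavior {
          bool cond_resched_active;  /* voluntary preemption points live */
          bool irq_preempt_active;   /* preempt on return from interrupt */
  };

  static struct preempt_behavior pick_behavior(enum preempt_mode mode)
  {
          switch (mode) {
          case MODE_FULL:        /* "desktop": CONFIG_PREEMPT-like behavior */
                  return (struct preempt_behavior){ .cond_resched_active = false,
                                                    .irq_preempt_active  = true };
          case MODE_VOLUNTARY:   /* "server": PREEMPT_VOLUNTARY-like behavior */
          case MODE_NONE:
          default:
                  return (struct preempt_behavior){ .cond_resched_active = true,
                                                    .irq_preempt_active  = false };
          }
  }
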
There's also the view that PREEMPT kernels are a bit more QA-friendly,
because atomic code sequences are much better defined & enforced via kernel
warnings. Without preempt_count we only have irqs-off warnings, which cover
only a small fraction of all critical sections in the kernel.
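
( Concretely this is the might_sleep() / "sleeping function called from
  invalid context" class of checks: with a preempt_count every
  spinlock-held region is visible to the debug code, without it only the
  irqs-off subset is. A simplified stand-in, not the kernel's actual
  implementation: )

  #include <stdbool.h>
  #include <stdio.h>

  static __thread unsigned int preempt_count;  /* maintained by spin_lock() etc. */
  static __thread bool irqs_off;               /* maintained by local_irq_disable() */

  static void assert_may_sleep(const char *func)
  {
          /* with a preempt_count, *any* atomic section is detectable ... */
          if (preempt_count != 0)
                  fprintf(stderr, "BUG: %s called from atomic context\n", func);
          /* ... without one, only the irqs-off subset can be caught: */
          else if (irqs_off)
                  fprintf(stderr, "BUG: %s called with IRQs disabled\n", func);
  }
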
Ideally we'd be able to patch out most of the preempt_count maintenance
overhead too - OTOH these days it's little more than noise on most CPUs,
considering the kind of horrible security-workaround overhead we have on
almost all x86 CPU types ... :-/
Thanks,
Ingo