Message-ID: <ZQlfHAXFFPZBPFgD@gmail.com>
Date: Tue, 19 Sep 2023 10:43:08 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
* Ingo Molnar <mingo@...nel.org> wrote:
> > Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most
> > distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY
> > approach isn't actually used, and is only causing pain.
>
> The macro-behavior of NONE/VOLUNTARY is still used & relied upon in
> server distros - and that's the behavior that enterprise distros truly
> cared about.
>
> Micro-overhead of NONE/VOLUNTARY vs. FULL is nonzero but is in the
> 'noise' category for all major distros I'd say.
>
> And that's what Thomas's proposal achieves: keep the nicely
> execution-batched NONE/VOLUNTARY scheduling behavior for SCHED_OTHER
> tasks, while having the latency advantages of fully-preemptible kernel
> code for RT and critical tasks.
>
> So I'm fully on board with this. It would reduce the number of preemption
> variants to just two: regular kernel and PREEMPT_RT. Yummie!
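Here is a minimal user-space sketch of the split described in the quoted text. All names (model_task, model_wakeup_resched(), the MODEL_* constants) are hypothetical and are not kernel APIs. The sketch assumes that a wakeup between two SCHED_OTHER tasks only raises a lazy reschedule request, which keeps the batched NONE/VOLUNTARY-like behavior, and that an RT wakeup asks for immediate preemption. It does not model whether a fair wakeup should reschedule at all; it assumes one is warranted.

```c
#include <assert.h>

/* Hypothetical scheduling classes; a user-space model, not kernel code. */
enum model_class { MODEL_CLASS_FAIR, MODEL_CLASS_RT };

/* Kind of reschedule request a wakeup raises against the running task. */
enum model_resched {
	MODEL_RESCHED_NONE,	/* keep running the current task */
	MODEL_RESCHED_LAZY,	/* reschedule at the next tick or return to user */
	MODEL_RESCHED_NOW,	/* reschedule at the next preemption point */
};

struct model_task {
	enum model_class cls;
	int rt_prio;		/* 1..99 for RT tasks (higher wins), 0 otherwise */
};

static enum model_resched model_wakeup_resched(struct model_task curr,
					       struct model_task woken)
{
	/* Precondition: RT tasks always carry a nonzero priority. */
	assert(woken.cls != MODEL_CLASS_RT || woken.rt_prio > 0);

	if (woken.cls == MODEL_CLASS_RT) {
		/* RT wakeups get full-preemption latency against fair tasks
		 * and against lower-priority RT tasks. */
		if (curr.cls != MODEL_CLASS_RT || woken.rt_prio > curr.rt_prio)
			return MODEL_RESCHED_NOW;
		return MODEL_RESCHED_NONE;
	}
	/* A fair wakeup never preempts an RT task. */
	if (curr.cls == MODEL_CLASS_RT)
		return MODEL_RESCHED_NONE;
	/* Fair vs. fair: only a lazy request, preserving execution batching. */
	return MODEL_RESCHED_LAZY;
}
```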

As an additional side note: with various changes such as EEVDF, the
scheduler is a lot less preemption-happy these days, without wrecking
latencies & timeslice distribution.
So in principle we might not even need the NEED_RESCHED_LAZY extra bit,
which -rt uses as a kind of additional layer to make sure they don't change
scheduling policy.
I.e., a modern scheduler might have made much of this change moot:
4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")
... because now we'll only reschedule on timeslice exhaustion, or if a task
comes in with a big deadline deficit.
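Below is a simplified sketch of the two remaining triggers. It assumes an EEVDF-like model in which an entity is eligible when its virtual runtime does not exceed the queue's average. It also reduces the "big deadline deficit" to a plain earlier-deadline comparison. struct model_entity, model_tick_preempt() and model_wakeup_preempt() are hypothetical. The kernel's real checks also weigh tasks and handle more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified EEVDF-style entity; hypothetical, not struct sched_entity. */
struct model_entity {
	int64_t vruntime;	/* virtual runtime received so far */
	int64_t deadline;	/* virtual deadline of the current request */
	int64_t slice_used;	/* runtime consumed in the current slice */
	int64_t slice;		/* requested slice length */
};

/* Mirrors the WAKEUP_PREEMPTION scheduler feature (true = enabled). */
static bool model_wakeup_preemption = true;

/* Tick path: reschedule once the current slice is exhausted. */
static bool model_tick_preempt(const struct model_entity *curr)
{
	assert(curr->slice > 0);
	return curr->slice_used >= curr->slice;
}

/*
 * Wakeup path: preempt only if wakeup preemption is enabled, the woken
 * entity is eligible (it has not run ahead of the queue's average
 * virtual runtime) and its virtual deadline is earlier than curr's.
 */
static bool model_wakeup_preempt(const struct model_entity *curr,
				 const struct model_entity *woken,
				 int64_t avg_vruntime)
{
	if (!model_wakeup_preemption)
		return false;
	if (woken->vruntime > avg_vruntime)
		return false;
	return woken->deadline < curr->deadline;
}
```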

And even the deadline-deficit wakeup preemption can be turned off further
with:

  $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features
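The features file lists every scheduler feature on one line, separated by spaces, with a "NO_" prefix on the disabled ones. Writing "NO_NAME" clears a feature, and writing "NAME" sets it again. The /debug path above appears to be a local debugfs mount point. With debugfs at the usual /sys/kernel/debug, recent kernels expose the file as /sys/kernel/debug/sched/features, and writing it requires root. The sketch below shows how a tool might read a feature's state back from the file's contents. The helper model_feature_state() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up 'name' in the text of the scheduler features file, where
 * features are separated by spaces and disabled ones carry a "NO_"
 * prefix. Returns 1 if enabled, 0 if disabled, -1 if not listed.
 */
static int model_feature_state(const char *features, const char *name)
{
	size_t len;
	const char *p = features;

	assert(features != NULL && name != NULL);
	len = strlen(name);
	assert(len > 0);

	while (*p) {
		const char *tok;
		size_t toklen;

		/* Skip separators (spaces and the trailing newline). */
		while (*p == ' ' || *p == '\n')
			p++;
		tok = p;
		while (*p && *p != ' ' && *p != '\n')
			p++;
		toklen = (size_t)(p - tok);

		if (toklen == len && strncmp(tok, name, len) == 0)
			return 1;
		if (toklen == len + 3 && strncmp(tok, "NO_", 3) == 0 &&
		    strncmp(tok + 3, name, len) == 0)
			return 0;
	}
	return -1;
}
```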

And we are considering making that the default behavior for same-prio
tasks (basically turning same-prio SCHED_OTHER tasks into SCHED_BATCH),
which should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.
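For comparison, user space can already opt a thread into SCHED_BATCH explicitly with sched_setscheduler(2). The fair class then does not let a waking SCHED_BATCH task preempt the running task, and leaves its preemption to the tick. Here is a minimal sketch. The helper name make_current_thread_batch() is hypothetical. SCHED_BATCH requires a static priority of 0. The fallback define assumes the Linux UAPI value of 3 in case <sched.h> does not expose the constant.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes SCHED_BATCH under _GNU_SOURCE */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3		/* Linux UAPI value, used only as a fallback */
#endif

/*
 * Switch the calling thread to SCHED_BATCH.
 * Returns 0 on success or a negative errno value on failure.
 */
static int make_current_thread_batch(void)
{
	/* SCHED_BATCH is a non-RT policy: the static priority must be 0. */
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(0, SCHED_BATCH, &param) == -1)
		return -errno;

	/* Sanity check: the requested policy is now in effect. */
	assert(sched_getscheduler(0) == SCHED_BATCH);
	return 0;
}
```

Calling this early in a CPU-bound worker thread roughly approximates, per thread, what the email proposes to make the default for same-priority SCHED_OTHER tasks.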
Thanks,
Ingo