linux-kernel - Re: [PATCH v2 7/9] sched: define TIF_ALLOW

Message-ID: <ZQlfHAXFFPZBPFgD@gmail.com>
Date:   Tue, 19 Sep 2023 10:43:08 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ankur Arora <ankur.a.arora@...cle.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED


* Ingo Molnar <mingo@...nel.org> wrote:

> > Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most 
> > distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY 
> > approach isn't actually used, and is only causing pain.
> 
> The macro-behavior of NONE/VOLUNTARY is still used & relied upon in 
> server distros - and that's the behavior that enterprise distros truly 
> cared about.
> 
> Micro-overhead of NONE/VOLUNTARY vs. FULL is nonzero but is in the 
> 'noise' category for all major distros I'd say.
> 
> And that's what Thomas's proposal achieves: keep the nicely 
> execution-batched NONE/VOLUNTARY scheduling behavior for SCHED_OTHER 
> tasks, while having the latency advantages of fully-preemptible kernel 
> code for RT and critical tasks.
> 
> So I'm fully on board with this. It would reduce the number of preemption 
> variants to just two: regular kernel and PREEMPT_RT. Yummie!

As an additional side note: with various changes such as EEVDF the 
scheduler is a lot less preemption-happy these days, without wrecking 
latencies & timeslice distribution.

So in principle we might not even need the NEED_RESCHED_LAZY extra bit, 
which -rt uses as a kind of additional layer to make sure they don't change 
scheduling policy.

I.e. a modern scheduler might have mooted much of this change:

   4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")

... because now we'll only reschedule on timeslice exhaustion, or if a task 
comes in with a big deadline deficit.

And even the deadline-deficit wakeup preemption can be turned off further 
with:

    $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features

And we are considering making that the default behavior for same-prio tasks 
- basically turn same-prio SCHED_OTHER tasks into SCHED_BATCH - which 
should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.

Thanks,

	Ingo
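
Editorial sketch, not part of the original message: the features file written to above can also be read back, and it lists every scheduler feature by name, with disabled ones carrying a NO_ prefix. The helper below is a minimal illustration of checking that state before or after toggling WAKEUP_PREEMPTION. The function name feature_state and the FEATURES_FILE variable are hypothetical, and the default path assumes debugfs is mounted at /sys/kernel/debug rather than the /debug used in the message.

```shell
#!/bin/sh
# Hypothetical helper: report whether a scheduler feature is enabled.
# Assumption: debugfs is mounted at /sys/kernel/debug; override
# FEATURES_FILE if it is mounted elsewhere (e.g. /debug).
FEATURES_FILE="${FEATURES_FILE:-/sys/kernel/debug/sched/features}"

# feature_state NAME [FEATURE_LIST]
# Prints "enabled", "disabled" or "unknown" for NAME.
# FEATURE_LIST is a space-separated list in the features file format;
# when omitted, the list is read from FEATURES_FILE (usually needs root).
feature_state() {
    name=$1
    list=${2-$(cat "$FEATURES_FILE" 2>/dev/null)}
    for f in $list; do
        case $f in
            "$name")    echo enabled;  return 0 ;;   # listed as-is: on
            "NO_$name") echo disabled; return 0 ;;   # NO_ prefix: off
        esac
    done
    echo unknown
    return 1
}
```

Run as root, `feature_state WAKEUP_PREEMPTION` would be expected to print "disabled" after the echo command shown in the message has been applied, on kernels that expose this feature under that name.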