linux-kernel - Re: [PATCH v2 7/9] sched: define TIF_ALLOW

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wi0bXpgULVVLc2AdJcta-fvQP7yyFQ_JtaoHUiPrqf--A@mail.gmail.com>
Date:   Sat, 9 Sep 2023 21:35:54 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Ankur Arora <ankur.a.arora@...cle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
        tglx@...utronix.de, jon.grimm@....com, bharata@....com,
        raghavendra.kt@....com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

On Sat, 9 Sept 2023 at 20:49, Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
> I think we can keep these checks, but with this fixed definition of
> resched_allowed(). This might be better:
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2260,7 +2260,8 @@ static inline void disallow_resched(void)
>
>  static __always_inline bool resched_allowed(void)
>  {
> -       return unlikely(test_tsk_thread_flag(current, TIF_RESCHED_ALLOW));
> +       return unlikely(!preempt_count() &&
> +                        test_tsk_thread_flag(current, TIF_RESCHED_ALLOW));
>  }

I'm not convinced (at all) that the preempt count is the right thing.

It works for interrupts, yes, because interrupts will increment the
preempt count even on non-preempt kernels (since the preempt count is
also the interrupt context level).

But what about any synchronous trap handling?

In other words, just something like a page fault? A page fault doesn't
increment the preemption count (and in fact many page faults _will_
obviously re-schedule as part of waiting for IO).

A page fault can *itself* say "feel free to preempt me", and that's one thing.

But a page fault can also *interupt* something that said "feel free to
preempt me", and that's a completely *different* thing.

So I feel like the "tsk_thread_flag" was sadly completely the wrong
place to add this bit to, and the wrong place to test it in. What we
really want is "current kernel entry context".

So the right thing to do would basically be to put it in the stack
frame at kernel entry - whether that kernel entry was a system call
(which is doing some big copy that should be preemptible without us
having to add "cond_resched()" in places), or is a page fault (which
will also do things like big page clearings for hugepages)

And we don't have that, do we?

We have "on_thread_stack()", which checks for "are we on the system
call stack". But that doesn't work for page faults.

PeterZ - I feel like I might be either very confused, or missing
something You probably go "Duh, Linus, you're off on one of your crazy
tangents, and none of this is relevant because..."

                 Linus