Message-ID: <877cmsgsrg.ffs@tglx>
Date:   Wed, 08 Nov 2023 16:38:11 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ankur Arora <ankur.a.arora@...cle.com>
Cc:     linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
        paulmck@...nel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
        bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
        bristot@...nel.org, mathieu.desnoyers@...icios.com,
        geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
        anton.ivanov@...bridgegreys.com, mattst88@...il.com,
        krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 00/86] Make the kernel preemptible

On Wed, Nov 08 2023 at 11:13, Peter Zijlstra wrote:
> On Wed, Nov 08, 2023 at 02:04:02AM -0800, Ankur Arora wrote:
> I'm not understanding, those should stay obviously.
>
> The current preempt_dynamic stuff has 5 toggles:
>
> /*
>  * SC:cond_resched
>  * SC:might_resched
>  * SC:preempt_schedule
>  * SC:preempt_schedule_notrace
>  * SC:irqentry_exit_cond_resched
>  *
>  *
>  * NONE:
>  *   cond_resched               <- __cond_resched
>  *   might_resched              <- RET0
>  *   preempt_schedule           <- NOP
>  *   preempt_schedule_notrace   <- NOP
>  *   irqentry_exit_cond_resched <- NOP
>  *
>  * VOLUNTARY:
>  *   cond_resched               <- __cond_resched
>  *   might_resched              <- __cond_resched
>  *   preempt_schedule           <- NOP
>  *   preempt_schedule_notrace   <- NOP
>  *   irqentry_exit_cond_resched <- NOP
>  *
>  * FULL:
>  *   cond_resched               <- RET0
>  *   might_resched              <- RET0
>  *   preempt_schedule           <- preempt_schedule
>  *   preempt_schedule_notrace   <- preempt_schedule_notrace
>  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
>  */
>
> If you kill voluntary as we know it today, you can remove cond_resched
> and might_resched, but the remaining 3 are still needed to switch
> between NONE and FULL.

No. The whole point of LAZY is to keep preempt_schedule(),
preempt_schedule_notrace() and irqentry_exit_cond_resched() always
enabled.

Look at my PoC: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/

The idea is to always enable preempt count and keep _all_ preemption
points enabled.

For NONE/VOLUNTARY mode let the scheduler set TIF_NEED_RESCHED_LAZY
instead of TIF_NEED_RESCHED. In full mode set TIF_NEED_RESCHED.
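
As a rough illustration of that rule, here is a minimal userspace sketch. It
is not the PoC code: the enum names and the helper sk_resched_flag_for() are
hypothetical and exist only to show that the preemption mode decides which
flag the scheduler sets, nothing else.

```c
#include <assert.h>

/* Hypothetical names for illustration; these are not kernel definitions. */
enum sk_preempt_mode { SK_MODE_NONE, SK_MODE_VOLUNTARY, SK_MODE_FULL };
enum sk_resched_flag { SK_FLAG_NEED_RESCHED, SK_FLAG_NEED_RESCHED_LAZY };

/*
 * Pick the flag the scheduler would set first for a given mode:
 * FULL requests immediate preemption, NONE/VOLUNTARY start with LAZY.
 */
static enum sk_resched_flag sk_resched_flag_for(enum sk_preempt_mode mode)
{
	assert(mode == SK_MODE_NONE || mode == SK_MODE_VOLUNTARY ||
	       mode == SK_MODE_FULL);

	return mode == SK_MODE_FULL ? SK_FLAG_NEED_RESCHED
				    : SK_FLAG_NEED_RESCHED_LAZY;
}
```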

Here is where the regular and the lazy flags are evaluated:

                Ret2user        Ret2kernel      PreemptCnt=0  need_resched()

NEED_RESCHED       Y                Y               Y         Y
LAZY_RESCHED       Y                N               N         Y

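The trick is that LAZY is not folded into preempt_count, so a 1->0
counter transition won't cause preempt_schedule() to be invoked because
the topmost bit (NEED_RESCHED) is still set. On x86 that bit is inverted:
set means no reschedule is pending, so the count does not reach zero.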
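
To make the counter argument concrete, here is a small userspace model. It
assumes the x86-style layout where the top bit of the count is an inverted
NEED_RESCHED (set means nothing is pending). struct sk_cpu and all sk_*
helpers are hypothetical and only mimic the arithmetic; the real code is
per-CPU and lives in the architecture headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inverted flag in the top bit: set means "no reschedule pending". */
#define SK_PREEMPT_NEED_RESCHED	0x80000000u

struct sk_cpu {
	uint32_t preempt_count;	/* nesting depth plus the inverted top bit */
	bool lazy_pending;	/* models TIF_NEED_RESCHED_LAZY, not folded */
};

static void sk_cpu_init(struct sk_cpu *c)
{
	c->preempt_count = SK_PREEMPT_NEED_RESCHED;
	c->lazy_pending = false;
}

/* Regular request: fold it into the count by clearing the top bit. */
static void sk_set_need_resched(struct sk_cpu *c)
{
	c->preempt_count &= ~SK_PREEMPT_NEED_RESCHED;
}

/* Lazy request: deliberately leaves preempt_count untouched. */
static void sk_set_need_resched_lazy(struct sk_cpu *c)
{
	c->lazy_pending = true;
}

static void sk_preempt_disable(struct sk_cpu *c)
{
	c->preempt_count++;
}

/* Returns true when the decrement would invoke preempt_schedule(). */
static bool sk_preempt_enable(struct sk_cpu *c)
{
	assert((c->preempt_count & ~SK_PREEMPT_NEED_RESCHED) > 0);
	return --c->preempt_count == 0;
}
```

With only the lazy flag pending, the 1->0 transition leaves the top bit set
and the whole word is non-zero, so nothing happens. Once the regular flag is
folded in, the same transition hits zero.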

The scheduler can still decide to set TIF_NEED_RESCHED which will cause
an immediate preemption at the next preemption point.

This allows forcing out a task which loops, e.g. in a massive copy or
clear operation, when it has not reached a point where
TIF_NEED_RESCHED_LAZY is evaluated within a time which is defined by the
scheduler itself.

For my PoC I did:

    1) Set TIF_NEED_RESCHED_LAZY

    2) Set TIF_NEED_RESCHED when the task did not react to
       TIF_NEED_RESCHED_LAZY within a tick

I know that's crude but it just works and obviously requires quite some
refinement.
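
The two steps can be sketched as a tiny state machine. This is a simplified
userspace model, not the PoC itself: struct sk_task and the sk_* functions
are hypothetical, and "within a tick" is approximated as "the lazy flag is
still pending when the next tick arrives".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state standing in for the two TIF flags. */
struct sk_task {
	bool need_resched;
	bool need_resched_lazy;
};

/* Step 1: the scheduler asks the task to yield, lazily. */
static void sk_request_resched_lazy(struct sk_task *t)
{
	assert(t);
	t->need_resched_lazy = true;
}

/*
 * Step 2: called from the (modelled) scheduler tick. If the lazy request
 * is still pending, the task did not react in time, so escalate.
 */
static void sk_tick(struct sk_task *t)
{
	assert(t);
	if (t->need_resched_lazy)
		t->need_resched = true;
}

/* The task reached a point that honours the flags and got rescheduled. */
static void sk_task_rescheduled(struct sk_task *t)
{
	assert(t);
	t->need_resched = false;
	t->need_resched_lazy = false;
}
```

A task that reaches a LAZY evaluation point before the tick never sees the
regular flag. A task that keeps looping gets TIF_NEED_RESCHED at the tick
and is preempted at the next preemption point.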

So the way you switch between preemption modes is to select when the
scheduler sets TIF_NEED_RESCHED/TIF_NEED_RESCHED_LAZY. No static call
switching at all.

In full preemption mode it always sets TIF_NEED_RESCHED and otherwise it
uses the LAZY bit first, grants some time and then gets out the hammer
and sets TIF_NEED_RESCHED when the task did not reach a LAZY preemption
point.

Which means that once the whole thing is in place, PREEMPT_DYNAMIC
along with NONE, VOLUNTARY and FULL can go away along with the
cond_resched() hackery.

So I think this series is backwards.

It should add the LAZY muck with a Kconfig switch like I did in my PoC
_first_. Once that is working and agreed on, the existing muck can be
removed.

Thanks,

        tglx
