Message-ID: <877cmsgsrg.ffs@tglx>
Date: Wed, 08 Nov 2023 16:38:11 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>,
Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
paulmck@...nel.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
bristot@...nel.org, mathieu.desnoyers@...icios.com,
geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, rostedt@...dmis.org,
David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 00/86] Make the kernel preemptible
On Wed, Nov 08 2023 at 11:13, Peter Zijlstra wrote:
> On Wed, Nov 08, 2023 at 02:04:02AM -0800, Ankur Arora wrote:
> I'm not understanding, those should stay obviously.
>
> The current preempt_dynamic stuff has 5 toggles:
>
> /*
> * SC:cond_resched
> * SC:might_resched
> * SC:preempt_schedule
> * SC:preempt_schedule_notrace
> * SC:irqentry_exit_cond_resched
> *
> *
> * NONE:
> * cond_resched <- __cond_resched
> * might_resched <- RET0
> * preempt_schedule <- NOP
> * preempt_schedule_notrace <- NOP
> * irqentry_exit_cond_resched <- NOP
> *
> * VOLUNTARY:
> * cond_resched <- __cond_resched
> * might_resched <- __cond_resched
> * preempt_schedule <- NOP
> * preempt_schedule_notrace <- NOP
> * irqentry_exit_cond_resched <- NOP
> *
> * FULL:
> * cond_resched <- RET0
> * might_resched <- RET0
> * preempt_schedule <- preempt_schedule
> * preempt_schedule_notrace <- preempt_schedule_notrace
> * irqentry_exit_cond_resched <- irqentry_exit_cond_resched
> */
>
> If you kill voluntary as we know it today, you can remove cond_resched
> and might_resched, but the remaining 3 are still needed to switch
> between NONE and FULL.
No. The whole point of LAZY is to keep preempt_schedule(),
preempt_schedule_notrace() and irqentry_exit_cond_resched() always enabled.
Look at my PoC: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
The idea is to always enable the preempt count and keep _all_ preemption
points enabled.
For NONE/VOLUNTARY mode let the scheduler set TIF_NEED_RESCHED_LAZY
instead of TIF_NEED_RESCHED. In full mode set TIF_NEED_RESCHED.
Here is where the regular and the lazy flags are evaluated:

                Ret2user   Ret2kernel   PreemptCnt=0   need_resched()
NEED_RESCHED       Y           Y             Y               Y
LAZY_RESCHED       Y           N             N               Y
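
The table above can be restated as a small userspace sketch. This is only a model of the evaluation rules, not kernel code: the flag values, the enum and the function name are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define NEED_RESCHED       (1u << 0)
#define NEED_RESCHED_LAZY  (1u << 1)

/* The four columns of the table. */
enum eval_point {
    RET_TO_USER,
    RET_TO_KERNEL,
    PREEMPT_COUNT_ZERO,
    NEED_RESCHED_CALL,   /* an explicit need_resched() check */
};

/* Returns true if the pending flags are honoured at the given point. */
static bool flag_observed_at(enum eval_point point, unsigned int flags)
{
    assert((flags & ~(NEED_RESCHED | NEED_RESCHED_LAZY)) == 0);

    if (flags & NEED_RESCHED)
        return true;   /* regular flag: honoured at every point */
    if (flags & NEED_RESCHED_LAZY)
        return point == RET_TO_USER || point == NEED_RESCHED_CALL;
    return false;
}
```

With only the lazy flag pending, a return to kernel mode or a preempt count reaching zero does nothing, while a return to user space still reschedules.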
The trick is that LAZY is not folded into preempt_count, so a 1->0
counter transition won't cause preempt_schedule() to be invoked, because
the topmost bit (the inverted NEED_RESCHED bit) is still set and the
count is therefore not zero.
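
A minimal userspace model of that folding trick, assuming the x86 convention where the top bit of preempt_count is an inverted NEED_RESCHED flag (set means nothing is pending). The struct and the `*_model` helper names are hypothetical and only mimic the idea; they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inverted flag: bit set means "no reschedule pending". */
#define PREEMPT_NEED_RESCHED 0x80000000u

struct task_model {
    uint32_t preempt_count;        /* nesting depth + inverted flag on top */
    bool tif_need_resched_lazy;    /* lives only in the thread flags */
};

static void task_init(struct task_model *t)
{
    t->preempt_count = PREEMPT_NEED_RESCHED;
    t->tif_need_resched_lazy = false;
}

/* Regular request: folded into preempt_count by clearing the top bit. */
static void set_need_resched_model(struct task_model *t)
{
    t->preempt_count &= ~PREEMPT_NEED_RESCHED;
}

/* Lazy request: preempt_count is deliberately left untouched. */
static void set_need_resched_lazy_model(struct task_model *t)
{
    t->tif_need_resched_lazy = true;
}

static void preempt_disable_model(struct task_model *t)
{
    t->preempt_count++;
}

/* Returns true when the decrement hits zero, i.e. when the real kernel
 * would call preempt_schedule(). */
static bool preempt_enable_model(struct task_model *t)
{
    assert((t->preempt_count & ~PREEMPT_NEED_RESCHED) > 0);
    return --t->preempt_count == 0;
}
```

With only the lazy flag set, the decrement lands on 0x80000000 rather than 0, so the single "is it zero" test on the enable path stays false without any extra check.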
The scheduler can still decide to set TIF_NEED_RESCHED which will cause
an immediate preemption at the next preemption point.
This makes it possible to force out a task which loops, e.g. in a massive
copy or clear operation, because it did not reach a point where
TIF_NEED_RESCHED_LAZY is evaluated within a time defined by the scheduler
itself.
For my PoC I did:

    1) Set TIF_NEED_RESCHED_LAZY

    2) Set TIF_NEED_RESCHED when the task did not react to
       TIF_NEED_RESCHED_LAZY within a tick
I know that's crude but it just works and obviously requires quite some
refinement.
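
The two PoC steps can be sketched as a tiny state machine. This is an illustrative model only: the struct and function names are hypothetical, and "within a tick" is simplified to "still pending at the next tick".

```c
#include <assert.h>
#include <stdbool.h>

struct resched_state {
    bool lazy;           /* models TIF_NEED_RESCHED_LAZY */
    bool need_resched;   /* models TIF_NEED_RESCHED */
};

/* Step 1: the scheduler only sets the lazy flag. */
static void request_resched_lazy(struct resched_state *s)
{
    s->lazy = true;
}

/* Step 2: called from the tick. If the lazy request is still pending,
 * the task did not react in time, so escalate to the regular flag. */
static void tick_escalate(struct resched_state *s)
{
    if (s->lazy && !s->need_resched)
        s->need_resched = true;
}

/* The task went through schedule(): both requests are consumed. */
static void task_rescheduled(struct resched_state *s)
{
    assert(s->lazy || s->need_resched);
    s->lazy = false;
    s->need_resched = false;
}
```

A task that reaches a lazy evaluation point before the tick never sees the regular flag; one that keeps looping gets it at the tick and is preempted at the next preemption point.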
So the way you switch between preemption modes is to select when the
scheduler sets TIF_NEED_RESCHED and when it sets TIF_NEED_RESCHED_LAZY.
No static call switching at all.
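
As a rough sketch of what "selecting the flag" means, the mode becomes an input to a single decision in the scheduler instead of a set of static call targets. The enum and function names below are hypothetical and do not correspond to the PoC's code.

```c
#include <assert.h>

enum preempt_mode {
    PREEMPT_MODE_NONE,
    PREEMPT_MODE_VOLUNTARY,
    PREEMPT_MODE_FULL,
};

enum resched_flag {
    FLAG_NEED_RESCHED_LAZY,
    FLAG_NEED_RESCHED,
};

/* Which flag the scheduler sets first when it wants the current task
 * to give up the CPU. All preemption points stay enabled in every mode. */
static enum resched_flag initial_resched_flag(enum preempt_mode mode)
{
    switch (mode) {
    case PREEMPT_MODE_FULL:
        return FLAG_NEED_RESCHED;        /* preempt at the next point */
    case PREEMPT_MODE_NONE:
    case PREEMPT_MODE_VOLUNTARY:
        return FLAG_NEED_RESCHED_LAZY;   /* grant some time first */
    default:
        assert(!"unknown preemption mode");
        return FLAG_NEED_RESCHED_LAZY;
    }
}
```

Switching modes at runtime then amounts to changing the value passed in; no call sites need to be patched.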
In full preemption mode it always sets TIF_NEED_RESCHED. Otherwise it
uses the LAZY bit first, grants some time and then gets out the hammer
and sets TIF_NEED_RESCHED when the task did not reach a LAZY preemption
point.
Which means that once the whole thing is in place, all of
PREEMPT_DYNAMIC, along with NONE, VOLUNTARY and FULL, can go away
together with the cond_resched() hackery.
So I think this series is backwards.
It should add the LAZY muck with a Kconfig switch like I did in my PoC
_first_. Once that is working and agreed on, the existing muck can be
removed.
Thanks,
tglx