Message-Id: <20231107215742.363031-42-ankur.a.arora@oracle.com>
Date: Tue, 7 Nov 2023 13:57:27 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, peterz@...radead.org,
torvalds@...ux-foundation.org, paulmck@...nel.org,
linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
bristot@...nel.org, mathieu.desnoyers@...icios.com,
geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, rostedt@...dmis.org,
David.Laight@...LAB.COM, richard@....at, mjguzik@...il.com,
Ankur Arora <ankur.a.arora@...cle.com>
Subject: [RFC PATCH 41/86] sched: handle resched policy in resched_curr()
One of the last ports of call before rescheduling is triggered
is resched_curr().

Its task is to set TIF_NEED_RESCHED and, if running locally,
either fold it into the preempt_count, or send a resched-IPI so
the target CPU folds it in.
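
As a rough illustration -- a self-contained user-space sketch, not
kernel code; the helpers below are hypothetical stand-ins for
set_tsk_need_resched(), set_preempt_need_resched(),
set_nr_and_not_polling() and smp_send_reschedule() -- the eager
mechanism amounts to:

#include <stdbool.h>
#include <stdio.h>

static int this_cpu = 0;

/* Stand-ins for the real scheduler helpers named above. */
static void set_need_resched_flag(int cpu)   { printf("cpu%d: TIF_NEED_RESCHED set\n", cpu); }
static void fold_into_preempt_count(int cpu) { printf("cpu%d: folded into preempt_count\n", cpu); }
static bool target_is_polling(int cpu)       { return cpu == 2; /* pretend cpu2 polls in idle */ }
static void send_resched_ipi(int cpu)        { printf("cpu%d: resched IPI sent\n", cpu); }

/* Mirrors the shape of resched_curr() before this patch: always eager. */
static void resched_curr_eager(int target_cpu)
{
	set_need_resched_flag(target_cpu);

	if (target_cpu == this_cpu) {
		/* Running locally: no IPI needed, just fold the flag. */
		fold_into_preempt_count(target_cpu);
		return;
	}

	/*
	 * Remote CPU: a polling idle CPU notices the flag on its own;
	 * otherwise kick it with an IPI.
	 */
	if (!target_is_polling(target_cpu))
		send_resched_ipi(target_cpu);
}

int main(void)
{
	resched_curr_eager(0);	/* local */
	resched_curr_eager(1);	/* remote, not polling -> IPI */
	resched_curr_eager(2);	/* remote, polling -> no IPI */
	return 0;
}
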
To handle TIF_NEED_RESCHED_LAZY -- since the reschedule is not
imminent -- it only needs to set the appropriate bit.

Move all of the underlying mechanism into __resched_curr(), and
define resched_curr() to handle the policy on when we want to set
which need-resched variant.

For now, the approach is to run to completion (TIF_NEED_RESCHED_LAZY),
with the following exceptions where we always want to reschedule at
the next preemptible point (TIF_NEED_RESCHED); a rough sketch of the
resulting policy follows the list:

- idle: if we are polling in idle, then set_nr_if_polling() will do
the right thing. When not polling, we force TIF_NEED_RESCHED
and send a resched-IPI if needed.
- the target CPU is in userspace: run-to-completion semantics are
  only for kernel tasks
- running under the full preemption model
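
As a self-contained user-space sketch (not the actual scheduler code;
the enum values and predicates are hypothetical stand-ins for
CONFIG_PREEMPT, the idle sched class check and ct_state_cpu()), the
policy amounts to:

#include <stdbool.h>
#include <stdio.h>

enum resched_kind { RESCHED_KIND_LAZY, RESCHED_KIND_EAGER };
enum cpu_context  { CTX_KERNEL, CTX_USER, CTX_GUEST };

struct target {
	bool full_preemption;		/* CONFIG_PREEMPT */
	bool curr_is_idle_task;		/* curr->sched_class == &idle_sched_class */
	enum cpu_context context;	/* ct_state_cpu(cpu_of(rq)) */
};

static enum resched_kind pick_resched(const struct target *t)
{
	/* Full preemption and the idle task: never run to completion. */
	if (t->full_preemption || t->curr_is_idle_task)
		return RESCHED_KIND_EAGER;

	/* Run-to-completion semantics apply only to kernel tasks. */
	if (t->context == CTX_USER || t->context == CTX_GUEST)
		return RESCHED_KIND_EAGER;

	/* Otherwise, regardless of priority: run to completion. */
	return RESCHED_KIND_LAZY;
}

int main(void)
{
	struct target kernel_task = { false, false, CTX_KERNEL };
	struct target user_task   = { false, false, CTX_USER  };

	printf("kernel task -> %s\n",
	       pick_resched(&kernel_task) == RESCHED_KIND_LAZY ? "lazy" : "eager");
	printf("user task   -> %s\n",
	       pick_resched(&user_task)   == RESCHED_KIND_LAZY ? "lazy" : "eager");
	return 0;
}
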
Originally-by: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
---
kernel/sched/core.c | 80 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 70 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 01df5ac2982c..f65bf3ce0e9d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1027,13 +1027,13 @@ void wake_up_q(struct wake_q_head *head)
}
/*
- * resched_curr - mark rq's current task 'to be rescheduled now'.
+ * __resched_curr - mark rq's current task 'to be rescheduled'.
*
- * On UP this means the setting of the need_resched flag, on SMP it
- * might also involve a cross-CPU call to trigger the scheduler on
- * the target CPU.
+ * On UP this means the setting of the need_resched flag, on SMP, for
+ * eager resched it might also involve a cross-CPU call to trigger
+ * the scheduler on the target CPU.
*/
-void resched_curr(struct rq *rq)
+void __resched_curr(struct rq *rq, resched_t rs)
{
struct task_struct *curr = rq->curr;
int cpu;
@@ -1046,17 +1046,77 @@ void resched_curr(struct rq *rq)
cpu = cpu_of(rq);
if (cpu == smp_processor_id()) {
- set_tsk_need_resched(curr, RESCHED_eager);
- set_preempt_need_resched();
+ set_tsk_need_resched(curr, rs);
+ if (rs == RESCHED_eager)
+ set_preempt_need_resched();
return;
}
- if (set_nr_and_not_polling(curr, RESCHED_eager))
- smp_send_reschedule(cpu);
- else
+ if (set_nr_and_not_polling(curr, rs)) {
+ if (rs == RESCHED_eager)
+ smp_send_reschedule(cpu);
+ } else if (rs == RESCHED_eager)
trace_sched_wake_idle_without_ipi(cpu);
}
+/*
+ * resched_curr - mark rq's current task 'to be rescheduled' eagerly
+ * or lazily according to the current policy.
+ *
+ * Always schedule eagerly, if:
+ *
+ * - running under full preemption
+ *
+ * - idle: when not polling (or if we don't have TIF_POLLING_NRFLAG)
+ * force TIF_NEED_RESCHED to be set and send a resched IPI.
+ * (the polling case has already set TIF_NEED_RESCHED via
+ * set_nr_if_polling()).
+ *
+ * - in userspace: run to completion semantics are only for kernel tasks
+ *
+ * Otherwise (regardless of priority), run to completion.
+ */
+void resched_curr(struct rq *rq)
+{
+ resched_t rs = RESCHED_lazy;
+ int context;
+
+ if (IS_ENABLED(CONFIG_PREEMPT) ||
+ (rq->curr->sched_class == &idle_sched_class)) {
+ rs = RESCHED_eager;
+ goto resched;
+ }
+
+ /*
+ * We might race with the target CPU while checking its ct_state:
+ *
+ * 1. The task might have just entered the kernel, but has not yet
+ * called user_exit(). We will see stale state (CONTEXT_USER) and
+ * send an unnecessary resched-IPI.
+ *
+ * 2. The user task is through with exit_to_user_mode_loop() but has
+ * not yet called user_enter().
+ *
+ * We'll see the thread's state as CONTEXT_KERNEL and will try to
+ * schedule it lazily. There's obviously nothing that will handle
+ * this need-resched bit until the thread enters the kernel next.
+ *
+ * The scheduler will still do tick accounting, but a potentially
+ * higher priority task waited to be scheduled for a user tick,
+ * instead of execution time in the kernel.
+ */
+ context = ct_state_cpu(cpu_of(rq));
+ if ((context == CONTEXT_USER) ||
+ (context == CONTEXT_GUEST)) {
+
+ rs = RESCHED_eager;
+ goto resched;
+ }
+
+resched:
+ __resched_curr(rq, rs);
+}
+
void resched_cpu(int cpu)
{
struct rq *rq = cpu_rq(cpu);
--
2.31.1