Message-ID: <a76adfd2-a17d-4342-af7e-5d17cf10dab7@arm.com>
Date: Thu, 21 Dec 2023 10:43:54 +0000
From: Metin Kaya <metin.kaya@....com>
To: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Joel Fernandes
<joelaf@...gle.com>, Qais Yousef <qyousef@...gle.com>,
Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Zimuzo Ezeozue <zezeozue@...gle.com>, Youssef Esmat
<youssefesmat@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>,
Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak
<kprateek.nayak@....com>, Thomas Gleixner <tglx@...utronix.de>,
kernel-team@...roid.com, Connor O'Brien <connoro@...gle.com>
Subject: Re: [PATCH v7 08/23] sched: Split scheduler and execution contexts
On 20/12/2023 12:18 am, John Stultz wrote:
> From: Peter Zijlstra <peterz@...radead.org>
>
> Let's define the scheduling context as all the scheduler state
> in task_struct for the task selected to run, and the execution
> context as all state required to actually run the task.
>
> Currently both are intertwined in task_struct. We want to
> logically split these such that we can use the scheduling
> context of the task selected to be scheduled, but use the
> execution context of a different task to actually be run.
Should we update Documentation/kernel-hacking/hacking.rst (line #348:
:c:macro:`current`) or another appropriate doc to announce the separation
of scheduling and execution contexts?
>
> To this purpose, introduce the rq_selected() macro to point to
> the task_struct selected from the runqueue by the scheduler,
> which will be used for scheduler state, and preserve rq->curr to
> indicate the execution context of the task that will actually be
> run.
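A minimal userspace sketch of the split described above, not kernel code. It assumes toy versions of task_struct and rq with only the fields needed here; toy_task_tick(), the sum_exec_runtime field as used here, and the "donor"/"owner" task names are hypothetical and only illustrate that scheduler accounting follows rq_selected() while rq->curr is what runs.

```c
#include <assert.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct task_struct {
	const char *comm;
	unsigned long long sum_exec_runtime;	/* toy accounting field */
};

struct rq {
	struct task_struct *curr;		/* execution context */
	struct task_struct *curr_selected;	/* scheduling context */
};

#define rq_selected(rq)	((rq)->curr_selected)

/* Hypothetical tick handler: runtime is charged to the selected task. */
static void toy_task_tick(struct rq *rq, unsigned long long delta_ns)
{
	rq_selected(rq)->sum_exec_runtime += delta_ns;
}

/* Self-check: "donor" was selected, "owner" is what actually runs. */
static void toy_demo_accounting(void)
{
	struct task_struct donor = { "donor", 0 };
	struct task_struct owner = { "owner", 0 };
	struct rq rq = { &owner, &donor };

	toy_task_tick(&rq, 1000);

	assert(rq.curr == &owner);
	assert(donor.sum_exec_runtime == 1000);
	assert(owner.sum_exec_runtime == 0);
}
```

When both pointers name the same task, the sketch degenerates to today's behaviour, which is presumably the common case.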
>
> NOTE: Peter previously mentioned he didn't like the name
> "rq_selected()", but I've not come up with a better alternative.
> I'm very open to other name proposals.
>
> Question for Peter: Dietmar suggested you'd prefer I drop the
> conditionalization of the scheduler context pointer on the rq
> (so rq_selected() would be open coded as rq->curr_selected or
> whatever we agree on for a name), but I'd think in the
> !CONFIG_SCHED_PROXY_EXEC case we'd want to avoid the wasted pointer
> and its use (since curr_selected would always be == curr)?
> If I'm wrong I'm fine switching this, but would appreciate
> clarification.
>
> Cc: Joel Fernandes <joelaf@...gle.com>
> Cc: Qais Yousef <qyousef@...gle.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Cc: Valentin Schneider <vschneid@...hat.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Zimuzo Ezeozue <zezeozue@...gle.com>
> Cc: Youssef Esmat <youssefesmat@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Waiman Long <longman@...hat.com>
> Cc: Boqun Feng <boqun.feng@...il.com>
> Cc: "Paul E. McKenney" <paulmck@...nel.org>
> Cc: Xuewen Yan <xuewen.yan94@...il.com>
> Cc: K Prateek Nayak <kprateek.nayak@....com>
> Cc: Metin Kaya <Metin.Kaya@....com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: kernel-team@...roid.com
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Signed-off-by: Juri Lelli <juri.lelli@...hat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
> [add additional comments and update more sched_class code to use
> rq::proxy]
> Signed-off-by: Connor O'Brien <connoro@...gle.com>
> [jstultz: Rebased and resolved minor collisions, reworked to use
> accessors, tweaked update_curr_common to use rq_proxy fixing rt
> scheduling issues]
> Signed-off-by: John Stultz <jstultz@...gle.com>
> ---
> v2:
> * Reworked to use accessors
> * Fixed update_curr_common to use proxy instead of curr
> v3:
> * Tweaked wrapper names
> * Swapped proxy for selected for clarity
> v4:
> * Minor variable name tweaks for readability
> * Use a macro instead of an inline function and drop
> other helper functions as suggested by Peter.
> * Remove verbose comments/questions to avoid review
> distractions, as suggested by Dietmar
> v5:
> * Add CONFIG_PROXY_EXEC option to this patch so the
> new logic can be tested with this change
> * Minor fix to grab rq_selected when holding the rq lock
> v7:
> * Minor spelling fix and unused argument fixes suggested by
> Metin Kaya
> * Switch to curr_selected for consistency, and minor rewording
> of commit message for clarity
> * Rename variables selected instead of curr when we're using
> rq_selected()
> * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections,
> as suggested by Metin Kaya
> ---
> kernel/sched/core.c | 46 ++++++++++++++++++++++++++---------------
> kernel/sched/deadline.c | 35 ++++++++++++++++---------------
> kernel/sched/fair.c | 18 ++++++++--------
> kernel/sched/rt.c | 40 +++++++++++++++++------------------
> kernel/sched/sched.h | 35 +++++++++++++++++++++++++++++--
> 5 files changed, 109 insertions(+), 65 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e06558fb08aa..0ce34f5c0e0c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -822,7 +822,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
>
> rq_lock(rq, &rf);
> update_rq_clock(rq);
> - rq->curr->sched_class->task_tick(rq, rq->curr, 1);
> + rq_selected(rq)->sched_class->task_tick(rq, rq_selected(rq), 1);
> rq_unlock(rq, &rf);
>
> return HRTIMER_NORESTART;
> @@ -2242,16 +2242,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
>
> void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
> {
> - if (p->sched_class == rq->curr->sched_class)
> - rq->curr->sched_class->wakeup_preempt(rq, p, flags);
> - else if (sched_class_above(p->sched_class, rq->curr->sched_class))
> + struct task_struct *selected = rq_selected(rq);
> +
> + if (p->sched_class == selected->sched_class)
> + selected->sched_class->wakeup_preempt(rq, p, flags);
> + else if (sched_class_above(p->sched_class, selected->sched_class))
> resched_curr(rq);
>
> /*
> * A queue event has occurred, and we're going to schedule. In
> * this case, we can save a useless back to back clock update.
> */
> - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
> + if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr))
> rq_clock_skip_update(rq);
> }
>
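The hunk above mixes both contexts: the class/priority comparison is made against the selected task, while the need_resched flag is still tested on rq->curr. A small userspace sketch of that pattern follows, assuming toy structures; toy_resched_curr() and toy_wakeup_preempt() are hypothetical simplifications (a single prio comparison stands in for the sched_class logic), not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; lower prio value means higher priority. */
struct task_struct {
	int prio;
	bool need_resched;
};

struct rq {
	struct task_struct *curr;		/* execution context */
	struct task_struct *curr_selected;	/* scheduling context */
};

#define rq_selected(rq)	((rq)->curr_selected)

/* Toy resched request: the flag lands on the task that is executing. */
static void toy_resched_curr(struct rq *rq)
{
	rq->curr->need_resched = true;
}

/* Toy wakeup check: compare against selected, flag the executing task. */
static void toy_wakeup_preempt(struct rq *rq, struct task_struct *p)
{
	struct task_struct *selected = rq_selected(rq);

	if (p->prio < selected->prio)
		toy_resched_curr(rq);
}

/* Self-check with a selected task (prio 10) proxied by an owner (prio 50). */
static void toy_demo_wakeup(void)
{
	struct task_struct donor = { 10, false };
	struct task_struct owner = { 50, false };
	struct task_struct mid = { 30, false };
	struct task_struct high = { 5, false };
	struct rq rq = { &owner, &donor };

	toy_wakeup_preempt(&rq, &mid);	/* 30 does not beat selected's 10 */
	assert(!owner.need_resched);

	toy_wakeup_preempt(&rq, &high);	/* 5 beats 10: flag the running task */
	assert(owner.need_resched);
	assert(!donor.need_resched);
}
```

Comparing against rq->curr instead would let the prio-30 waker preempt here, which seems to be what the switch to rq_selected() avoids.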
> @@ -2780,7 +2782,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
> lockdep_assert_held(&p->pi_lock);
>
> queued = task_on_rq_queued(p);
> - running = task_current(rq, p);
> + running = task_current_selected(rq, p);
>
> if (queued) {
> /*
> @@ -5600,7 +5602,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
> * project cycles that may never be accounted to this
> * thread, breaking clock_gettime().
> */
> - if (task_current(rq, p) && task_on_rq_queued(p)) {
> + if (task_current_selected(rq, p) && task_on_rq_queued(p)) {
> prefetch_curr_exec_start(p);
> update_rq_clock(rq);
> p->sched_class->update_curr(rq);
> @@ -5668,7 +5670,8 @@ void scheduler_tick(void)
> {
> int cpu = smp_processor_id();
> struct rq *rq = cpu_rq(cpu);
> - struct task_struct *curr = rq->curr;
> + /* accounting goes to the selected task */
> + struct task_struct *selected;
> struct rq_flags rf;
> unsigned long thermal_pressure;
> u64 resched_latency;
> @@ -5679,16 +5682,17 @@ void scheduler_tick(void)
> sched_clock_tick();
>
> rq_lock(rq, &rf);
> + selected = rq_selected(rq);
>
> update_rq_clock(rq);
> thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
> update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
> - curr->sched_class->task_tick(rq, curr, 0);
> + selected->sched_class->task_tick(rq, selected, 0);
> if (sched_feat(LATENCY_WARN))
> resched_latency = cpu_resched_latency(rq);
> calc_global_load_tick(rq);
> sched_core_tick(rq);
> - task_tick_mm_cid(rq, curr);
> + task_tick_mm_cid(rq, selected);
>
> rq_unlock(rq, &rf);
>
> @@ -5697,8 +5701,8 @@ void scheduler_tick(void)
>
> perf_event_task_tick();
>
> - if (curr->flags & PF_WQ_WORKER)
> - wq_worker_tick(curr);
> + if (selected->flags & PF_WQ_WORKER)
> + wq_worker_tick(selected);
>
> #ifdef CONFIG_SMP
> rq->idle_balance = idle_cpu(cpu);
> @@ -5763,6 +5767,12 @@ static void sched_tick_remote(struct work_struct *work)
> struct task_struct *curr = rq->curr;
>
> if (cpu_online(cpu)) {
> + /*
> + * Since this is a remote tick for full dynticks mode,
> + * we are always sure that there is no proxy (only a
> + * single task is running).
> + */
> + SCHED_WARN_ON(rq->curr != rq_selected(rq));
> update_rq_clock(rq);
>
> if (!is_idle_task(curr)) {
> @@ -6685,6 +6695,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> }
>
> next = pick_next_task(rq, prev, &rf);
> + rq_set_selected(rq, next);
> clear_tsk_need_resched(prev);
> clear_preempt_need_resched();
> #ifdef CONFIG_SCHED_DEBUG
> @@ -7185,7 +7196,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
>
> prev_class = p->sched_class;
> queued = task_on_rq_queued(p);
> - running = task_current(rq, p);
> + running = task_current_selected(rq, p);
> if (queued)
> dequeue_task(rq, p, queue_flag);
> if (running)
> @@ -7275,7 +7286,7 @@ void set_user_nice(struct task_struct *p, long nice)
> }
>
> queued = task_on_rq_queued(p);
> - running = task_current(rq, p);
> + running = task_current_selected(rq, p);
> if (queued)
> dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
> if (running)
> @@ -7868,7 +7879,7 @@ static int __sched_setscheduler(struct task_struct *p,
> }
>
> queued = task_on_rq_queued(p);
> - running = task_current(rq, p);
> + running = task_current_selected(rq, p);
> if (queued)
> dequeue_task(rq, p, queue_flags);
> if (running)
> @@ -9295,6 +9306,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> rcu_read_unlock();
>
> rq->idle = idle;
> + rq_set_selected(rq, idle);
> rcu_assign_pointer(rq->curr, idle);
> idle->on_rq = TASK_ON_RQ_QUEUED;
> #ifdef CONFIG_SMP
> @@ -9384,7 +9396,7 @@ void sched_setnuma(struct task_struct *p, int nid)
>
> rq = task_rq_lock(p, &rf);
> queued = task_on_rq_queued(p);
> - running = task_current(rq, p);
> + running = task_current_selected(rq, p);
>
> if (queued)
> dequeue_task(rq, p, DEQUEUE_SAVE);
> @@ -10489,7 +10501,7 @@ void sched_move_task(struct task_struct *tsk)
>
> update_rq_clock(rq);
>
> - running = task_current(rq, tsk);
> + running = task_current_selected(rq, tsk);
> queued = task_on_rq_queued(tsk);
>
> if (queued)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 6140f1f51da1..9cf20f4ac5f9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1150,7 +1150,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> #endif
>
> enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
> - if (dl_task(rq->curr))
> + if (dl_task(rq_selected(rq)))
> wakeup_preempt_dl(rq, p, 0);
> else
> resched_curr(rq);
> @@ -1273,7 +1273,7 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
> */
> static void update_curr_dl(struct rq *rq)
> {
> - struct task_struct *curr = rq->curr;
> + struct task_struct *curr = rq_selected(rq);
> struct sched_dl_entity *dl_se = &curr->dl;
> s64 delta_exec, scaled_delta_exec;
> int cpu = cpu_of(rq);
> @@ -1784,7 +1784,7 @@ static int find_later_rq(struct task_struct *task);
> static int
> select_task_rq_dl(struct task_struct *p, int cpu, int flags)
> {
> - struct task_struct *curr;
> + struct task_struct *curr, *selected;
> bool select_rq;
> struct rq *rq;
>
> @@ -1795,6 +1795,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>
> rcu_read_lock();
> curr = READ_ONCE(rq->curr); /* unlocked access */
> + selected = READ_ONCE(rq_selected(rq));
>
> /*
> * If we are dealing with a -deadline task, we must
> @@ -1805,9 +1806,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
> * other hand, if it has a shorter deadline, we
> * try to make it stay here, it might be important.
> */
> - select_rq = unlikely(dl_task(curr)) &&
> + select_rq = unlikely(dl_task(selected)) &&
> (curr->nr_cpus_allowed < 2 ||
> - !dl_entity_preempt(&p->dl, &curr->dl)) &&
> + !dl_entity_preempt(&p->dl, &selected->dl)) &&
> p->nr_cpus_allowed > 1;
>
> /*
> @@ -1870,7 +1871,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
> * let's hope p can move out.
> */
> if (rq->curr->nr_cpus_allowed == 1 ||
> - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
> + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL))
> return;
>
> /*
> @@ -1909,7 +1910,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
> static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
> int flags)
> {
> - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) {
> + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) {
> resched_curr(rq);
> return;
> }
> @@ -1919,7 +1920,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
> * In the unlikely case current and p have the same deadline
> * let us try to decide what's the best thing to do...
> */
> - if ((p->dl.deadline == rq->curr->dl.deadline) &&
> + if ((p->dl.deadline == rq_selected(rq)->dl.deadline) &&
> !test_tsk_need_resched(rq->curr))
> check_preempt_equal_dl(rq, p);
> #endif /* CONFIG_SMP */
> @@ -1954,7 +1955,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
> if (hrtick_enabled_dl(rq))
> start_hrtick_dl(rq, p);
>
> - if (rq->curr->sched_class != &dl_sched_class)
> + if (rq_selected(rq)->sched_class != &dl_sched_class)
> update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
>
> deadline_queue_push_tasks(rq);
> @@ -2268,8 +2269,8 @@ static int push_dl_task(struct rq *rq)
> * can move away, it makes sense to just reschedule
> * without going further in pushing next_task.
> */
> - if (dl_task(rq->curr) &&
> - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
> + if (dl_task(rq_selected(rq)) &&
> + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) &&
> rq->curr->nr_cpus_allowed > 1) {
> resched_curr(rq);
> return 0;
> @@ -2394,7 +2395,7 @@ static void pull_dl_task(struct rq *this_rq)
> * deadline than the current task of its runqueue.
> */
> if (dl_time_before(p->dl.deadline,
> - src_rq->curr->dl.deadline))
> + rq_selected(src_rq)->dl.deadline))
> goto skip;
>
> if (is_migration_disabled(p)) {
> @@ -2435,9 +2436,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
> if (!task_on_cpu(rq, p) &&
> !test_tsk_need_resched(rq->curr) &&
> p->nr_cpus_allowed > 1 &&
> - dl_task(rq->curr) &&
> + dl_task(rq_selected(rq)) &&
> (rq->curr->nr_cpus_allowed < 2 ||
> - !dl_entity_preempt(&p->dl, &rq->curr->dl))) {
> + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) {
> push_dl_tasks(rq);
> }
> }
> @@ -2612,12 +2613,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
> return;
> }
>
> - if (rq->curr != p) {
> + if (rq_selected(rq) != p) {
> #ifdef CONFIG_SMP
> if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
> deadline_queue_push_tasks(rq);
> #endif
> - if (dl_task(rq->curr))
> + if (dl_task(rq_selected(rq)))
> wakeup_preempt_dl(rq, p, 0);
> else
> resched_curr(rq);
> @@ -2646,7 +2647,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
> if (!rq->dl.overloaded)
> deadline_queue_pull_task(rq);
>
> - if (task_current(rq, p)) {
> + if (task_current_selected(rq, p)) {
> /*
> * If we now have a earlier deadline task than p,
> * then reschedule, provided p is still on this
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1251fd01a555..07216ea3ed53 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1157,7 +1157,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
> */
> s64 update_curr_common(struct rq *rq)
> {
> - struct task_struct *curr = rq->curr;
> + struct task_struct *curr = rq_selected(rq);
> s64 delta_exec;
>
> delta_exec = update_curr_se(rq, &curr->se);
> @@ -1203,7 +1203,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
>
> static void update_curr_fair(struct rq *rq)
> {
> - update_curr(cfs_rq_of(&rq->curr->se));
> + update_curr(cfs_rq_of(&rq_selected(rq)->se));
> }
>
> static inline void
> @@ -6611,7 +6611,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> s64 delta = slice - ran;
>
> if (delta < 0) {
> - if (task_current(rq, p))
> + if (task_current_selected(rq, p))
> resched_curr(rq);
> return;
> }
> @@ -6626,7 +6626,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> */
> static void hrtick_update(struct rq *rq)
> {
> - struct task_struct *curr = rq->curr;
> + struct task_struct *curr = rq_selected(rq);
>
> if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
> return;
> @@ -8235,7 +8235,7 @@ static void set_next_buddy(struct sched_entity *se)
> */
> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> {
> - struct task_struct *curr = rq->curr;
> + struct task_struct *curr = rq_selected(rq);
> struct sched_entity *se = &curr->se, *pse = &p->se;
> struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> int next_buddy_marked = 0;
> @@ -8268,7 +8268,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> * prevents us from potentially nominating it as a false LAST_BUDDY
> * below.
> */
> - if (test_tsk_need_resched(curr))
> + if (test_tsk_need_resched(rq->curr))
> return;
>
> /* Idle tasks are by definition preempted by non-idle tasks. */
> @@ -9252,7 +9252,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
> * update_load_avg() can call cpufreq_update_util(). Make sure that RT,
> * DL and IRQ signals have been updated before updating CFS.
> */
> - curr_class = rq->curr->sched_class;
> + curr_class = rq_selected(rq)->sched_class;
>
> thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
>
> @@ -12640,7 +12640,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
> * our priority decreased, or if we are not currently running on
> * this runqueue and our priority is higher than the current's
> */
> - if (task_current(rq, p)) {
> + if (task_current_selected(rq, p)) {
> if (p->prio > oldprio)
> resched_curr(rq);
> } else
> @@ -12743,7 +12743,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
> * kick off the schedule if running, otherwise just see
> * if we can still preempt the current task.
> */
> - if (task_current(rq, p))
> + if (task_current_selected(rq, p))
> resched_curr(rq);
> else
> wakeup_preempt(rq, p, 0);
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 9cdea3ea47da..2682cec45aaa 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -530,7 +530,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
>
> static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
> {
> - struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
> + struct task_struct *curr = rq_selected(rq_of_rt_rq(rt_rq));
> struct rq *rq = rq_of_rt_rq(rt_rq);
> struct sched_rt_entity *rt_se;
>
> @@ -1000,7 +1000,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> */
> static void update_curr_rt(struct rq *rq)
> {
> - struct task_struct *curr = rq->curr;
> + struct task_struct *curr = rq_selected(rq);
> struct sched_rt_entity *rt_se = &curr->rt;
> s64 delta_exec;
>
> @@ -1545,7 +1545,7 @@ static int find_lowest_rq(struct task_struct *task);
> static int
> select_task_rq_rt(struct task_struct *p, int cpu, int flags)
> {
> - struct task_struct *curr;
> + struct task_struct *curr, *selected;
> struct rq *rq;
> bool test;
>
> @@ -1557,6 +1557,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>
> rcu_read_lock();
> curr = READ_ONCE(rq->curr); /* unlocked access */
> + selected = READ_ONCE(rq_selected(rq));
>
> /*
> * If the current task on @p's runqueue is an RT task, then
> @@ -1585,8 +1586,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
> * systems like big.LITTLE.
> */
> test = curr &&
> - unlikely(rt_task(curr)) &&
> - (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> + unlikely(rt_task(selected)) &&
> + (curr->nr_cpus_allowed < 2 || selected->prio <= p->prio);
>
> if (test || !rt_task_fits_capacity(p, cpu)) {
> int target = find_lowest_rq(p);
> @@ -1616,12 +1617,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>
> static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
> {
> - /*
> - * Current can't be migrated, useless to reschedule,
> - * let's hope p can move out.
> - */
> if (rq->curr->nr_cpus_allowed == 1 ||
> - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
> + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL))
> return;
>
> /*
> @@ -1664,7 +1661,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
> */
> static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
> {
> - if (p->prio < rq->curr->prio) {
> + struct task_struct *curr = rq_selected(rq);
> +
> + if (p->prio < curr->prio) {
> resched_curr(rq);
> return;
> }
> @@ -1682,7 +1681,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
> * to move current somewhere else, making room for our non-migratable
> * task.
> */
> - if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
> + if (p->prio == curr->prio && !test_tsk_need_resched(rq->curr))
> check_preempt_equal_prio(rq, p);
> #endif
> }
> @@ -1707,7 +1706,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
> * utilization. We only care of the case where we start to schedule a
> * rt task
> */
> - if (rq->curr->sched_class != &rt_sched_class)
> + if (rq_selected(rq)->sched_class != &rt_sched_class)
> update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
>
> rt_queue_push_tasks(rq);
> @@ -1988,6 +1987,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
>
> BUG_ON(rq->cpu != task_cpu(p));
> BUG_ON(task_current(rq, p));
> + BUG_ON(task_current_selected(rq, p));
> BUG_ON(p->nr_cpus_allowed <= 1);
>
> BUG_ON(!task_on_rq_queued(p));
> @@ -2020,7 +2020,7 @@ static int push_rt_task(struct rq *rq, bool pull)
> * higher priority than current. If that's the case
> * just reschedule current.
> */
> - if (unlikely(next_task->prio < rq->curr->prio)) {
> + if (unlikely(next_task->prio < rq_selected(rq)->prio)) {
> resched_curr(rq);
> return 0;
> }
> @@ -2375,7 +2375,7 @@ static void pull_rt_task(struct rq *this_rq)
> * p if it is lower in priority than the
> * current task on the run queue
> */
> - if (p->prio < src_rq->curr->prio)
> + if (p->prio < rq_selected(src_rq)->prio)
> goto skip;
>
> if (is_migration_disabled(p)) {
> @@ -2419,9 +2419,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
> bool need_to_push = !task_on_cpu(rq, p) &&
> !test_tsk_need_resched(rq->curr) &&
> p->nr_cpus_allowed > 1 &&
> - (dl_task(rq->curr) || rt_task(rq->curr)) &&
> + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) &&
> (rq->curr->nr_cpus_allowed < 2 ||
> - rq->curr->prio <= p->prio);
> + rq_selected(rq)->prio <= p->prio);
>
> if (need_to_push)
> push_rt_tasks(rq);
> @@ -2505,7 +2505,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
> if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
> rt_queue_push_tasks(rq);
> #endif /* CONFIG_SMP */
> - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
> + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq)))
> resched_curr(rq);
> }
> }
> @@ -2520,7 +2520,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
> if (!task_on_rq_queued(p))
> return;
>
> - if (task_current(rq, p)) {
> + if (task_current_selected(rq, p)) {
> #ifdef CONFIG_SMP
> /*
> * If our priority decreases while running, we
> @@ -2546,7 +2546,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
> * greater than the current running task
> * then reschedule.
> */
> - if (p->prio < rq->curr->prio)
> + if (p->prio < rq_selected(rq)->prio)
> resched_curr(rq);
> }
> }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e0e4fc8734b..6ea1dfbe502a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -994,7 +994,10 @@ struct rq {
> */
> unsigned int nr_uninterruptible;
>
> - struct task_struct __rcu *curr;
> + struct task_struct __rcu *curr; /* Execution context */
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> + struct task_struct __rcu *curr_selected; /* Scheduling context (policy) */
> +#endif
> struct task_struct *idle;
> struct task_struct *stop;
> unsigned long next_balance;
> @@ -1189,6 +1192,20 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> #define cpu_curr(cpu) (cpu_rq(cpu)->curr)
> #define raw_rq() raw_cpu_ptr(&runqueues)
>
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> +#define rq_selected(rq) ((rq)->curr_selected)
> +static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
> +{
> + rcu_assign_pointer(rq->curr_selected, t);
> +}
> +#else
> +#define rq_selected(rq) ((rq)->curr)
> +static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
> +{
> + /* Do nothing */
> +}
> +#endif
> +
> struct sched_group;
> #ifdef CONFIG_SCHED_CORE
> static inline struct cpumask *sched_group_span(struct sched_group *sg);
> @@ -2112,11 +2129,25 @@ static inline u64 global_rt_runtime(void)
> return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
> }
>
> +/*
> + * Is p the current execution context?
> + */
> static inline int task_current(struct rq *rq, struct task_struct *p)
> {
> return rq->curr == p;
> }
>
> +/*
> + * Is p the current scheduling context?
> + *
> + * Note that it might be the current execution context at the same time if
> + * rq->curr == rq_selected() == p.
> + */
> +static inline int task_current_selected(struct rq *rq, struct task_struct *p)
> +{
> + return rq_selected(rq) == p;
> +}
> +
> static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
> {
> #ifdef CONFIG_SMP
> @@ -2280,7 +2311,7 @@ struct sched_class {
>
> static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
> {
> - WARN_ON_ONCE(rq->curr != prev);
> + WARN_ON_ONCE(rq_selected(rq) != prev);
> prev->sched_class->put_prev_task(rq, prev);
> }
>
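For reference, a compact userspace sketch of how the accessors in the sched.h hunks behave in the two configurations. TOY_PROXY_EXEC is a hypothetical stand-in for CONFIG_SCHED_PROXY_EXEC, plain assignment replaces rcu_assign_pointer(), and the structures are reduced to the bare minimum, so this only illustrates the aliasing, nothing more.

```c
#include <assert.h>

struct task_struct {
	int pid;
};

struct rq {
	struct task_struct *curr;		/* execution context */
#ifdef TOY_PROXY_EXEC
	struct task_struct *curr_selected;	/* scheduling context */
#endif
};

#ifdef TOY_PROXY_EXEC
#define rq_selected(rq)	((rq)->curr_selected)
static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
{
	rq->curr_selected = t;	/* the patch uses rcu_assign_pointer() here */
}
#else
#define rq_selected(rq)	((rq)->curr)
static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
{
	/* No separate pointer: the selected task is always rq->curr. */
	(void)rq;
	(void)t;
}
#endif

/* Is p the current execution context? */
static inline int task_current(struct rq *rq, struct task_struct *p)
{
	return rq->curr == p;
}

/* Is p the current scheduling context? */
static inline int task_current_selected(struct rq *rq, struct task_struct *p)
{
	return rq_selected(rq) == p;
}

/* Self-check covering whichever configuration was compiled in. */
static void toy_demo_accessors(void)
{
	struct task_struct a = { 1 }, b = { 2 };
	struct rq rq;

	rq.curr = &b;
	rq_set_selected(&rq, &a);

	assert(task_current(&rq, &b));
#ifdef TOY_PROXY_EXEC
	assert(task_current_selected(&rq, &a));
	assert(!task_current_selected(&rq, &b));
#else
	assert(task_current_selected(&rq, &b));
	assert(!task_current_selected(&rq, &a));
#endif
}
```

Built without -DTOY_PROXY_EXEC, task_current() and task_current_selected() always agree, so callers converted by this patch should see no behaviour change in that configuration.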