Message-ID: <20251117185550.365156-1-kprateek.nayak@amd.com>
Date: Mon, 17 Nov 2025 18:55:45 +0000
From: K Prateek Nayak <kprateek.nayak@....com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, John Stultz <jstultz@...gle.com>, "Johannes
Weiner" <hannes@...xchg.org>, Suren Baghdasaryan <surenb@...gle.com>,
<linux-kernel@...r.kernel.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>
Subject: [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution
When booting into a kernel with CONFIG_SCHED_PROXY_EXEC and CONFIG_PSI,
an inconsistent task state warning was noticed soon after boot,
similar to:
psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4
On analysis, the following sequence of events was found to be the cause
of the splat:
o Blocked task is retained on the runqueue.
o psi_sched_switch() sees task_on_rq_queued() and retains the runnable
signals for the task.
o Task blocks later via proxy_deactivate() but psi_dequeue() doesn't
adjust the PSI flags since DEQUEUE_SLEEP is set, expecting
psi_sched_switch() to fix the signals.
o The blocked task is woken up with the PSI state still reflecting that
the task is runnable (TSK_RUNNING) leading to the splat.
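The sequence above can be replayed in a small user-space model. This is
only a sketch: the bit values are assumed to match
include/linux/psi_types.h, "struct model_task" and the model_*() helpers
are hypothetical names that do not exist in the kernel, and the check is
written to mirror the condition that produces the "inconsistent task
state" message rather than copied from the tree.

```c
#include <stdbool.h>

/* Assumed to match include/linux/psi_types.h; verify against your tree. */
#define TSK_IOWAIT           (1u << 0)
#define TSK_MEMSTALL         (1u << 1)
#define TSK_RUNNING          (1u << 2)
#define TSK_MEMSTALL_RUNNING (1u << 3)
#define TSK_ONCPU            (1u << 4)

/* Hypothetical stand-in for the PSI state carried by a task. */
struct model_task {
	unsigned int psi_flags;
};

/*
 * A change is inconsistent if it sets a bit that is already set or
 * clears a bit that is not currently set.
 */
static bool model_flags_inconsistent(const struct model_task *t,
				     unsigned int clear, unsigned int set)
{
	return (t->psi_flags & set) || (t->psi_flags & clear) != clear;
}

/* Apply the change and report whether it was inconsistent. */
static bool model_flags_change(struct model_task *t,
			       unsigned int clear, unsigned int set)
{
	bool bad = model_flags_inconsistent(t, clear, set);

	t->psi_flags &= ~clear;
	t->psi_flags |= set;
	return bad;
}

/* Replay the four steps; returns true if the wakeup trips the check. */
static bool model_first_splat(struct model_task *t)
{
	/* The task is running on a CPU before it blocks. */
	t->psi_flags = TSK_RUNNING | TSK_ONCPU;

	/*
	 * Steps 1-2: the blocked task stays on the runqueue, so the
	 * switch only drops TSK_ONCPU and retains TSK_RUNNING.
	 */
	model_flags_change(t, TSK_ONCPU, 0);

	/*
	 * Step 3: proxy_deactivate() dequeues with DEQUEUE_SLEEP, so
	 * psi_dequeue() leaves the flags alone. Nothing changes here.
	 */

	/* Step 4: the wakeup tries to set TSK_RUNNING a second time. */
	return model_flags_change(t, 0, TSK_RUNNING);
}
```

In this model the final step is attempted with psi_flags=4, clear=0 and
set=4, the same values as in the warning quoted earlier.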
Simply tracking proxy_deactivate() is not enough since the task's
blocked_on relationship can be cleared remotely without acquiring the
runqueue lock, which can force a blocked task to run before a wakeup:
pick_next_task() picks the blocked donor and, since the blocked_on
relationship was cleared remotely, task_is_blocked() returns false,
leading to the task being run on the CPU.
If the task blocks again before it is woken up, psi_sched_switch() will
try to clear the runnable signals (TSK_RUNNING) unconditionally leading
to a different splat similar to:
psi: inconsistent task state! task=... cpu=... psi_flags=10 clear=14 set=0
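The fields of this second warning can be decoded with a short helper.
This is a sketch under two assumptions: the fields are hexadecimal
without a prefix (which is what the "0x" change in patch 2 would make
explicit), and the bit values match include/linux/psi_types.h. The
helper names are hypothetical.

```c
#include <stdlib.h>

/* Assumed to match include/linux/psi_types.h; verify against your tree. */
#define TSK_RUNNING (1u << 2)
#define TSK_ONCPU   (1u << 4)

/* Parse one flag field as printed in the splat (hex, no "0x" prefix). */
static unsigned int parse_psi_field(const char *s)
{
	return (unsigned int)strtoul(s, NULL, 16);
}

/* Bits that the change asks to clear although they are not set. */
static unsigned int missing_clear_bits(unsigned int psi_flags,
				       unsigned int clear)
{
	return clear & ~psi_flags;
}
```

Read this way, psi_flags=10 is TSK_ONCPU alone and clear=14 is
TSK_ONCPU | TSK_RUNNING, so the only bit being cleared without being set
is TSK_RUNNING, in line with the description above.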
To get around this, track the complete lifecycle of a blocked donor
right from delaying the deactivation to the wakeup. When in
blocked/donor state, PSI will consider these tasks similar to delayed
tasks - blocked but migratable.
When ttwu_runnable() finally wakes up the task, or if the donor is
deactivated via proxy_deactivate(), the proxy indicator is cleared to
show that the task is either fully blocked or fully runnable now.
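The intended lifecycle can be pictured with a toy state model. This is
purely illustrative: "struct model_donor", the "proxy_retained" field
and the model_*() helpers are hypothetical names chosen for this sketch
and are not the fields or functions added by the series.

```c
#include <stdbool.h>

/* Hypothetical model of a donor task; not the kernel's task_struct. */
struct model_donor {
	bool on_rq;          /* still queued on a runqueue */
	bool proxy_retained; /* stand-in for the "proxy indicator" */
};

/* Deactivation is delayed: the blocked task is kept on the runqueue. */
static void model_retain_for_proxy(struct model_donor *d)
{
	d->on_rq = true;
	d->proxy_retained = true;
}

/* Donor is deactivated for real: it is now fully blocked. */
static void model_proxy_deactivate(struct model_donor *d)
{
	d->on_rq = false;
	d->proxy_retained = false;
}

/* Wakeup of a task that never left the runqueue: fully runnable. */
static void model_ttwu_runnable(struct model_donor *d)
{
	d->on_rq = true;
	d->proxy_retained = false;
}

/*
 * While retained for proxy, the task is treated like a delayed task:
 * queued and migratable, but not counted as runnable by PSI.
 */
static bool model_psi_counts_as_runnable(const struct model_donor *d)
{
	return d->on_rq && !d->proxy_retained;
}
```

The point of the model is that the indicator has exactly one entry
(retaining the blocked task) and two exits (wakeup or deactivation), so
PSI never sees a retained donor as plain runnable.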
Patches 1 and 2 are cleanups to make life slightly easier when auditing
the implementation and inspecting the debug logs. Patches 3 to 5 implement
the tracking of donor states and a couple of fixes on top.
Series was tested on top of tip:sched/core for a while running
sched-messaging without observing any inconsistent task state warning
and should apply cleanly on top of:
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
at commit 33cf66d88306 ("sched/fair: Proportional newidle balance").
---
K Prateek Nayak (5):
sched/psi: Make psi stubs consistent for !CONFIG_PSI
sched/psi: Prepend "0x" to format specifiers when printing PSI flags
sched/core: Track blocked tasks retained on rq for proxy
sched/core: Block proxy task on pick when blocked_on is cleared before
wakeup
sched/psi: Fix PSI signals of blocked tasks retained for proxy
include/linux/sched.h | 4 +++
kernel/sched/core.c | 59 +++++++++++++++++++++++++++++++++++++++++--
kernel/sched/psi.c | 4 +--
kernel/sched/sched.h | 2 ++
kernel/sched/stats.h | 6 ++---
5 files changed, 68 insertions(+), 7 deletions(-)
base-commit: 33cf66d88306663d16e4759e9d24766b0aaa2e17
--
2.34.1