Message-ID: <20251104210456.652800-1-sieberf@amazon.com>
Date: Tue, 4 Nov 2025 23:04:55 +0200
From: Fernand Sieber <sieberf@...zon.com>
To: kernel test robot <oliver.sang@...el.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>
Subject: Re: [tip:sched/core] [sched/fair] 79104becf4: BUG:kernel_NULL_pointer_dereference,address
Hi Peter,
I spent some time today investigating this report. The crash happens when
a proxy task yields.
Since it probably doesn't make sense for a task that is blocking the best
pick to yield, a simple workaround is to ignore the yield in this case:
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8993,6 +8993,11 @@ static void yield_task_fair(struct rq *rq)
 	if (unlikely(rq->nr_running == 1))
 		return;
+	/* Don't yield if we're running a proxy task */
+	if (rq->donor && rq->donor != curr) {
+		return;
+	}
+
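The added condition can be exercised outside the kernel with a small user-space model. This is only a sketch: toy_task, toy_rq and toy_yield_ignored() are hypothetical stand-ins rather than kernel types, and it assumes that curr is the task actually executing (read from the runqueue here, whereas the patch uses a local variable) while donor is the task whose scheduling context was picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct toy_task {
	int pid;
};

struct toy_rq {
	unsigned int nr_running;
	struct toy_task *curr;	/* assumed: execution context (task on the CPU) */
	struct toy_task *donor;	/* assumed: scheduling context (task that was picked) */
};

/* Mirrors the two early returns: true means the yield is treated as a no-op. */
static bool toy_yield_ignored(const struct toy_rq *rq)
{
	/* Nothing else is runnable, so there is nobody to yield to. */
	if (rq->nr_running == 1)
		return true;

	/* The running task differs from the picked one: the proxy case. */
	if (rq->donor && rq->donor != rq->curr)
		return true;

	return false;
}

/* Small usage example for the helper above. */
static void toy_yield_selfcheck(void)
{
	struct toy_task owner = { .pid = 1 }, waiter = { .pid = 2 };
	struct toy_rq rq = { .nr_running = 2, .curr = &owner, .donor = &owner };

	assert(!toy_yield_ignored(&rq));	/* ordinary task: yield proceeds */

	rq.donor = &waiter;			/* owner now runs on waiter's behalf */
	assert(toy_yield_ignored(&rq));		/* proxy: yield is ignored */
}
```

In this model the yield is dropped only when the two contexts differ; a runqueue with no donor set behaves as before.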
However, more generally, I am not sure that the logic in update_min_vruntime()
is sound when we are running a proxy task, which I suspect is the ultimate
root cause of the problem. It seems to assume that cfs_rq->curr is the
running task, which is not the case.
While troubleshooting, I have seen inconsistent calculations:
cfs_rq->avg_vruntime underflowing, and avg_vruntime(cfs_rq) coming out
lower than min_vruntime. I'll see if I can invest more time diving into
this. In the meantime, do you have any thoughts?
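For reference, the relationship between the two quantities can be sketched in user space. The model below assumes the scheme in kernel/sched/fair.c where the running sum holds weighted offsets from min_vruntime and the average adds min_vruntime back. All toy_ names are hypothetical, and the contribution of cfs_rq->curr is deliberately left out. It shows only that a negative sum places the average below min_vruntime, not how the sum becomes negative in the reported crash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the per-runqueue averaging state. */
struct toy_cfs_rq {
	uint64_t min_vruntime;
	int64_t avg_vruntime;	/* assumed: sum of (vruntime - min_vruntime) * weight */
	int64_t avg_load;	/* assumed: sum of weights */
};

/* Account one entity, storing its vruntime relative to min_vruntime. */
static void toy_enqueue(struct toy_cfs_rq *q, uint64_t vruntime, int64_t weight)
{
	int64_t key = (int64_t)(vruntime - q->min_vruntime);

	q->avg_vruntime += key * weight;
	q->avg_load += weight;
}

/* Weighted average vruntime, rebuilt as an absolute value. */
static uint64_t toy_avg_vruntime(const struct toy_cfs_rq *q)
{
	int64_t avg = q->avg_vruntime;

	if (q->avg_load) {
		/* C division truncates toward zero; bias negative sums to get a floor. */
		if (avg < 0)
			avg -= q->avg_load - 1;
		avg /= q->avg_load;
	}

	return q->min_vruntime + (uint64_t)avg;
}

/* True when the average lies below min_vruntime (signed comparison). */
static int toy_avg_below_min(const struct toy_cfs_rq *q)
{
	return (int64_t)(toy_avg_vruntime(q) - q->min_vruntime) < 0;
}

/* Small usage example for the helpers above. */
static void toy_avg_selfcheck(void)
{
	struct toy_cfs_rq q = { .min_vruntime = 1000 };

	toy_enqueue(&q, 1100, 2);
	toy_enqueue(&q, 1300, 2);
	assert(toy_avg_vruntime(&q) == 1200);
	assert(!toy_avg_below_min(&q));
}
```

In this model, any entity accounted with a vruntime behind min_vruntime contributes a negative key, and once the sum itself is negative the average falls below min_vruntime.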
Thanks,
--Fernand
Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07