Message-ID: <20230904222351.GC2568@noisy.programming.kicks-ass.net>
Date: Tue, 5 Sep 2023 00:23:51 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Hao Jia <jiahao.os@...edance.com>
Cc: Benjamin Segall <bsegall@...gle.com>,
Bagas Sanjaya <bagasdotme@...il.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Igor Raits <igor.raits@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Regressions <regressions@...ts.linux.dev>,
Linux Stable <stable@...r.kernel.org>
Subject: Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at
kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
> If I understand correctly, rq->clock_update_flags may be set to
> RQCF_ACT_SKIP after __schedule() holds the rq lock, and sometimes the rq
> lock may be released briefly in __schedule(), such as in newidle_balance().
> If another CPU takes this rq lock during that window and calls
> rq_clock_start_loop_update(), it may trigger this warning.
>
> This warning check might be wrong. We need to add assert_clock_updated() to
> check that the rq clock has been updated before calling
> rq_clock_start_loop_update().
>
> Maybe something like this?
Urgh, aside from it being white space mangled, I think this is entirely
going in the wrong direction.
Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
clearly though, I'll have to try again tomorrow.
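
For anyone following along, the window described in the quoted text can be modelled in userspace roughly as below. This is only a sketch under assumptions: the RQCF_* values and the "warn if ACT_SKIP is already set" check are meant to mirror kernel/sched/sched.h, while struct model_rq, the model_*() helpers and run_scenario() are hypothetical names made up for illustration. There is no locking or real concurrency here; the two CPUs are simply replayed in order.

```c
/* Flag values intended to match kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq; only the field under discussion. */
struct model_rq {
	unsigned int clock_update_flags;
	int warnings;	/* counts what would be SCHED_WARN_ON() hits */
};

/* Models __schedule() promoting a pending REQ_SKIP to ACT_SKIP. */
static void model_schedule_promote(struct model_rq *rq)
{
	rq->clock_update_flags <<= 1;
}

/* Models the check in rq_clock_start_loop_update(). */
static void model_start_loop_update(struct model_rq *rq)
{
	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		rq->warnings++;
	rq->clock_update_flags |= RQCF_ACT_SKIP;
}

/* Models rq_clock_stop_loop_update(). */
static void model_stop_loop_update(struct model_rq *rq)
{
	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
}

/* Replays the sequence from the quoted text; returns the warning count. */
int run_scenario(void)
{
	struct model_rq rq = { .clock_update_flags = RQCF_REQ_SKIP, .warnings = 0 };

	/* CPU A: __schedule() holds the rq lock, REQ_SKIP becomes ACT_SKIP. */
	model_schedule_promote(&rq);

	/* CPU A drops the lock briefly (e.g. newidle_balance()); CPU B takes it. */
	model_start_loop_update(&rq);	/* CPU B sees ACT_SKIP already set */
	model_stop_loop_update(&rq);	/* CPU B clears the flag on its way out */

	return rq.warnings;
}
```

In this model run_scenario() returns 1: CPU B's start-of-loop check sees the ACT_SKIP bit that CPU A set before dropping the lock. Whether the check or the visibility of the flag across the unlock is the thing to fix is exactly what is being argued above.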