Message-ID: <YNBwjnRbQrWiG57/@google.com>
Date: Mon, 21 Jun 2021 10:57:18 +0000
From: Quentin Perret <qperret@...gle.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
vincent.guittot@...aro.org, qais.yousef@....com,
rickyiu@...gle.com, wvw@...gle.com, patrick.bellasi@...bug.net,
xuewen.yan94@...il.com, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v2 1/3] sched: Fix UCLAMP_FLAG_IDLE setting

Hi Dietmar,

On Thursday 17 Jun 2021 at 17:27:56 (+0200), Dietmar Eggemann wrote:
> On 11/06/2021 09:25, Quentin Perret wrote:
> > On Thursday 10 Jun 2021 at 21:05:12 (+0200), Peter Zijlstra wrote:
> >> On Thu, Jun 10, 2021 at 03:13:04PM +0000, Quentin Perret wrote:
> >>> The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last
> >>> active task to maintain the last uclamp.max and prevent blocked util
> >>> from suddenly becoming visible.
> >>>
> >>> However, there is an asymmetry in how the flag is set and cleared which
> >>> can lead to having the flag set whilst there are active tasks on the rq.
> >>> Specifically, the flag is cleared in the uclamp_rq_inc() path, which is
> >>> called at enqueue time, but set in uclamp_rq_dec_id() which is called
> >>> both when dequeueing a task _and_ in the uclamp_update_active() path. As
> >>> a result, when both uclamp_rq_{dec,inc}_id() are called from
> >>> uclamp_update_active(), the flag ends up being set but not cleared,
> >>> hence leaving the runqueue in a broken state.
> >>>
> >>> Fix this by setting the flag in the uclamp_rq_inc_id() path to ensure
> >>> things remain symmetrical.
> >>
> >> The code you moved is neither in uclamp_rq_inc_id(), although
> >> uclamp_idle_reset() is called from there
> >
> > Yep, that is what I was trying to say.
> >
> >> nor does it _set_ the flag.
> >
> > Ahem. That I don't have a good excuse for ...
>
> (A) dequeue -> set
>
>     (1) dequeue_task() -> uclamp_rq_dec() ->
>
>     (2) cpu_util_update_eff() -> ... -> uclamp_update_active() ->
>
>         uclamp_rq_dec_id()
>
>           uclamp_rq_max_value()
>
>             /* No tasks -- default clamp values */
>             uclamp_idle_value() {
>
>               if (clamp_id == UCLAMP_MAX)
>                 rq->uclamp_flags |= UCLAMP_FLAG_IDLE;  <-- set
>             }
>
> ---
>
> (B) enqueue -> clear
>
>     (1) enqueue_task() ->
>
>         uclamp_rq_inc() {
>
>     (2)   cpu_util_update_eff() -> ... -> uclamp_update_active() ->
>
>           uclamp_rq_inc_id() {
>
>             uclamp_idle_reset() {
>                                                     <-- new clear
>             }                                             ^
>           }                                               |
>                                                           |
>           if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)        |
>             rq->uclamp_flags &= ~UCLAMP_FLAG_IDLE;  <-- old clear
>         }
>
> ---
>
> uclamp_update_active()
>
>     if (p->uclamp[clamp_id].active) {
>       uclamp_rq_dec_id()  <-- (A2)
>       uclamp_rq_inc_id()  <-- (B2)
>     }
>
> Is this existing asymmetry in setting the flag but not clearing it in
> uclamp_update_active() the only issue this patch fixes?

I think this is the root of the problem, but it can have odd symptoms.
In a bad case, it can lead to hitting the WARN in uclamp_rq_dec_id()
(which is how we found the bug in the first place).
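
For illustration only, the asymmetry can be reproduced in a small
user-space toy model. This is a sketch, not kernel code: every name
below (struct toy_rq, toy_dec_id(), toy_inc_id(), toy_enqueue(),
toy_update_active(), TOY_FLAG_IDLE) is made up, and the model assumes
the only relevant state is a count of active tasks plus one idle flag.
The 'fixed' argument selects whether the clear happens in the inc_id
step (the symmetric layout) or only in the enqueue wrapper (the old
layout).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_FLAG_IDLE 0x01u

struct toy_rq {
	unsigned int nr_active;	/* tasks currently accounted on the rq */
	unsigned int flags;	/* may hold TOY_FLAG_IDLE */
};

/* Dec step: dropping the last active task sets the idle flag. */
static void toy_dec_id(struct toy_rq *rq)
{
	assert(rq->nr_active > 0);	/* sanity check of the toy model */
	if (--rq->nr_active == 0)
		rq->flags |= TOY_FLAG_IDLE;
}

/* Inc step: clears the flag only when clear_here is true. */
static void toy_inc_id(struct toy_rq *rq, bool clear_here)
{
	rq->nr_active++;
	if (clear_here)
		rq->flags &= ~TOY_FLAG_IDLE;
}

/* Enqueue wrapper: in the old layout, the clear lives here. */
static void toy_enqueue(struct toy_rq *rq, bool fixed)
{
	toy_inc_id(rq, fixed);
	if (!fixed)
		rq->flags &= ~TOY_FLAG_IDLE;
}

/* Update path: dec + inc for an active task, bypassing toy_enqueue(). */
static void toy_update_active(struct toy_rq *rq, bool fixed)
{
	toy_dec_id(rq);
	toy_inc_id(rq, fixed);
}

/* Invariant: the idle flag must not be set while tasks are active. */
static bool toy_rq_consistent(const struct toy_rq *rq)
{
	return !(rq->nr_active > 0 && (rq->flags & TOY_FLAG_IDLE));
}

/* Enqueue one task, update it, and report whether the rq is consistent. */
bool toy_scenario(bool fixed)
{
	struct toy_rq rq = { 0, 0 };

	toy_enqueue(&rq, fixed);
	toy_update_active(&rq, fixed);
	return toy_rq_consistent(&rq);
}
```

In this model, toy_scenario(false) leaves the flag set with one active
task, while toy_scenario(true) ends with the flag cleared.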

I'll try and repost this with a correct commit message soon -- still
fighting with my inbox right now.

Thanks,
Quentin