Message-Id: <20251128115445.GA1526246@bytedance.com>
Date: Fri, 28 Nov 2025 19:54:45 +0800
From: "Aaron Lu" <ziqianlu@...edance.com>
To: "xupengbo" <xupengbo1029@....com>, "Ingo Molnar" <mingo@...hat.com>,
"Peter Zijlstra" <peterz@...radead.org>
Cc: "Juri Lelli" <juri.lelli@...hat.com>,
"Vincent Guittot" <vincent.guittot@...aro.org>,
"Dietmar Eggemann" <dietmar.eggemann@....com>,
"Steven Rostedt" <rostedt@...dmis.org>,
"Ben Segall" <bsegall@...gle.com>, "Mel Gorman" <mgorman@...e.de>,
"Valentin Schneider" <vschneid@...hat.com>,
"David Vernet" <void@...ifault.com>, <linux-kernel@...r.kernel.org>,
<cgroups@...r.kernel.org>
Subject: Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
Hello,
On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> When a task is migrated out, there is a probability that the tg->load_avg
> value will become abnormal. The reason is as follows.
>
> 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> is a possibility that the reduced load_avg is not updated to tg->load_avg
> when a task migrates out.
> 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> function cfs_rq_is_decayed() does not check whether
> cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> updated to tg->load_avg.
>
> Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> which fixes the case (2.) mentioned above.
>
> Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> Tested-by: Aaron Lu <ziqianlu@...edance.com>
> Reviewed-by: Aaron Lu <ziqianlu@...edance.com>
> Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
> Signed-off-by: xupengbo <xupengbo@...o.com>
Are there any remaining concerns about this patch? If not, I hope this
fix can be merged. It's a rare case, but it does happen in some
specific setups.
Sorry if this is bad timing, but I just hit an on-call incident where this
exact problem occurred, so I suppose it's worth a ping :)
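For anyone skimming the thread, here is a rough userspace model of the two points in the changelog above. It is only a sketch and not the kernel code: the struct and function names with a _model suffix are made up for illustration, time is counted in whole milliseconds, and the real update_tg_load_avg() has further conditions that are left out here.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct tg_model {
	long load_avg;                     /* models tg->load_avg */
};

struct cfs_rq_model {
	struct tg_model *tg;
	long load_avg;                     /* models cfs_rq->avg.load_avg */
	long tg_load_avg_contrib;          /* what this cfs_rq last published to tg */
	unsigned long long last_update_ms; /* time of the last publish */
};

/* Point 1: publishing to the tg is rate limited to once per 1ms. */
static void update_tg_load_avg_model(struct cfs_rq_model *rq,
				     unsigned long long now_ms)
{
	long delta;

	if (now_ms - rq->last_update_ms < 1)
		return; /* too soon, the tg keeps the stale contribution */

	delta = rq->load_avg - rq->tg_load_avg_contrib;
	rq->tg->load_avg += delta;
	rq->tg_load_avg_contrib = rq->load_avg;
	rq->last_update_ms = now_ms;
}

/* Point 2, before the patch: only the cfs_rq's own load is checked. */
static bool cfs_rq_is_decayed_old_model(const struct cfs_rq_model *rq)
{
	return rq->load_avg == 0;
}

/* Point 2, with the patch: a non-zero published contribution keeps it listed. */
static bool cfs_rq_is_decayed_new_model(const struct cfs_rq_model *rq)
{
	return rq->load_avg == 0 && rq->tg_load_avg_contrib == 0;
}
```

In this model, if the load drops to 0 within the same millisecond as the last publish, the old check already reports the cfs_rq as decayed while tg->load_avg still contains its stale contribution. The new check keeps it on the list until a later update has published the drop.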
Best regards,
Aaron