Message-Id: <20251128141557.GA1598584@bytedance.com>
Date: Fri, 28 Nov 2025 22:15:57 +0800
From: "Aaron Lu" <ziqianlu@...edance.com>
To: "Peter Zijlstra" <peterz@...radead.org>
Cc: "xupengbo" <xupengbo1029@....com>, "Ingo Molnar" <mingo@...hat.com>,
"Juri Lelli" <juri.lelli@...hat.com>,
"Vincent Guittot" <vincent.guittot@...aro.org>,
"Dietmar Eggemann" <dietmar.eggemann@....com>,
"Steven Rostedt" <rostedt@...dmis.org>,
"Ben Segall" <bsegall@...gle.com>, "Mel Gorman" <mgorman@...e.de>,
"Valentin Schneider" <vschneid@...hat.com>,
"David Vernet" <void@...ifault.com>, <linux-kernel@...r.kernel.org>,
<cgroups@...r.kernel.org>
Subject: Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.

On Fri, Nov 28, 2025 at 02:40:17PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> > Hello,
> >
> > On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > > > When a task is migrated out, tg->load_avg can be left with a stale
> > > > value. The reason is as follows.
> > > >
> > > > 1. Due to the 1ms update period limitation in update_tg_load_avg(), the
> > > > reduced load_avg may not be propagated to tg->load_avg when a task
> > > > migrates out.
> > > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > > function cfs_rq_is_decayed() does not check whether
> > > > cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
> > > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > > updated to tg->load_avg.
> > >
> > > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > > which fixes the case (2.) mentioned above.
> > >
> > > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > > Tested-by: Aaron Lu <ziqianlu@...edance.com>
> > > Reviewed-by: Aaron Lu <ziqianlu@...edance.com>
> > > Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
> > > Signed-off-by: xupengbo <xupengbo@...o.com>
> >
> > Are there any remaining concerns about this patch? If not, I hope
> > this fix can be merged. It's a rare case, but it does happen in some
> > specific setups.
> >
> > Sorry if this is bad timing, but I just hit an on-call incident where this
> > exact problem occurred, so I suppose it's worth a ping :)
>
> Totally missed it. Seems okay, let me go queue the thing.

Thanks Peter!
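
For readers of the archive, the failure mode in the quoted commit message can be modelled in user space. The following is only a minimal sketch under simplifying assumptions, not the kernel code or the actual patch: every `toy_*` name is hypothetical, the 1ms rate limit is reduced to a boolean flag, and PELT decay is reduced to the load dropping straight to zero.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's task_group and cfs_rq. */
struct toy_tg {
	long load_avg;			/* models tg->load_avg */
};

struct toy_cfs_rq {
	long load_avg;			/* models cfs_rq->avg.load_avg */
	long tg_load_avg_contrib;	/* last value folded into tg->load_avg */
	bool on_leaf_list;		/* still on the leaf_cfs_rq_list? */
};

/* Models update_tg_load_avg(): skipped entirely while rate-limited. */
static void toy_update_tg_load_avg(struct toy_tg *tg, struct toy_cfs_rq *rq,
				   bool ratelimited)
{
	if (ratelimited)
		return;
	tg->load_avg += rq->load_avg - rq->tg_load_avg_contrib;
	rq->tg_load_avg_contrib = rq->load_avg;
}

/* Models cfs_rq_is_decayed(); check_contrib toggles the extra check. */
static bool toy_is_decayed(const struct toy_cfs_rq *rq, bool check_contrib)
{
	if (rq->load_avg)
		return false;
	if (check_contrib && rq->tg_load_avg_contrib)
		return false;
	return true;
}

/* Models one __update_blocked_fair() visit to this cfs_rq. */
static void toy_blocked_pass(struct toy_tg *tg, struct toy_cfs_rq *rq,
			     bool ratelimited, bool check_contrib)
{
	if (!rq->on_leaf_list)
		return;
	toy_update_tg_load_avg(tg, rq, ratelimited);
	if (toy_is_decayed(rq, check_contrib))
		rq->on_leaf_list = false;
}

/* Returns what is left in tg->load_avg after the last task leaves. */
static long toy_stale_group_load(bool check_contrib)
{
	struct toy_tg tg = { .load_avg = 1024 };
	struct toy_cfs_rq rq = {
		.load_avg = 1024,
		.tg_load_avg_contrib = 1024,
		.on_leaf_list = true,
	};

	rq.load_avg = 0;	/* last task migrated out, load gone */
	toy_blocked_pass(&tg, &rq, true, check_contrib);	/* rate-limited */
	toy_blocked_pass(&tg, &rq, false, check_contrib);	/* limit expired */
	return tg.load_avg;
}
```

In this model, toy_stale_group_load(false) returns 1024: the cfs_rq is dropped from the list during the rate-limited pass, so its old contribution is never subtracted. toy_stale_group_load(true) returns 0, because the cfs_rq stays on the list until a later pass folds the delta into the group sum.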