Message-ID: <20250207-tunneling-tested-koel-c59d33@leitao>
Date: Fri, 7 Feb 2025 04:25:02 -0800
From: Breno Leitao <leitao@...ian.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, vincent.guittot@...aro.org,
linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, corbet@....net,
qyousef@...alina.io, chris.hyser@...cle.com,
patrick.bellasi@...bug.net, pjt@...gle.com, pavel@....cz,
qperret@...gle.com, tim.c.chen@...ux.intel.com, joshdon@...gle.com,
timj@....org, kprateek.nayak@....com, yu.c.chen@...el.com,
youssefesmat@...omium.org, joel@...lfernandes.org, efault@....de,
tglx@...utronix.de, kernel-team@...a.com
Subject: Re: [PATCH 03/15] sched/fair: Add lag based placement
Hello Peter,
On Fri, Feb 07, 2025 at 12:11:41PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 07, 2025 at 02:07:18AM -0800, Breno Leitao wrote:
> > Hello Peter,
> >
> > On Wed, May 31, 2023 at 01:58:42PM +0200, Peter Zijlstra wrote:
> > >
> > > place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > {
> > <snip>
> > > - vruntime -= thresh;
> > > + lag *= load + se->load.weight;
> > > + if (WARN_ON_ONCE(!load))
> >
> > I have 6.13 running on some hosts, and in some cases, where the system
> > is getting some OOMs, I see the following stack:
> >
> > WARNING: CPU: 29 PID: 593474 at kernel/sched/fair.c:5250 place_entity+0x199/0x1b0
> >
> > Call Trace:
> > <TASK>
> > ? place_entity+0x199/0x1b0
> > reweight_entity+0x188/0x200
> > enqueue_task_fair.llvm.15448040313737105663+0x28c/0x560
> > enqueue_task+0x30/0x120
> > ttwu_do_activate+0x99/0x230
> > try_to_wake_up+0x25a/0x4a0
> > ? hrtimer_dummy_timeout+0x10/0x10
> > hrtimer_wakeup+0x25/0x30
> > __hrtimer_run_queues+0xf1/0x250
> > hrtimer_interrupt+0xfb/0x220
> > __sysvec_apic_timer_interrupt+0x47/0x140
> > sysvec_apic_timer_interrupt+0x35/0x80
> > asm_sysvec_apic_timer_interrupt+0x16/0x20
> >
> > I am sorry for not providing a decoded stack, but I am having a hard
> > time decoding it properly. The values I got were misleading, and I am
> > still working to understand what is happening.
> >
> > Anyway, I don't have a reproducer, and this problem doesn't happen
> > frequently enough. I have 1K hosts running 6.13 and I saw it 5 times
> > in the last week.
>
> Weird. Would you mind trying with the below patch on top?
Thanks for the quick answer. I will apply it on top of our tree and
start rolling it out to those 1k hosts I have been playing with.

I haven't seen this patch in stable. Is it queued up for the next
stable submission?
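
For reference, this is how I read the rescaling in the hunk quoted at
the top, as a simplified sketch (scale_lag() and its parameter names are
made up for illustration; this is not the actual kernel code):

#include <linux/bug.h>		/* WARN_ON_ONCE() */
#include <linux/math64.h>	/* div_s64() */

/*
 * Sketch of the lag rescaling: the stored lag is scaled by
 * (load + weight) / load so the entity keeps its relative lag once its
 * own weight is added to the queue's average load.
 */
static s64 scale_lag(s64 lag, unsigned long load, unsigned long weight)
{
	lag *= load + weight;

	/* The WARN my hosts are hitting: the divisor would be zero. */
	if (WARN_ON_ONCE(!load))
		load = 1;	/* assuming a fallback like upstream's to avoid a divide-by-zero */

	return div_s64(lag, load);
}

So, as far as I can tell, the warning means the cfs_rq's average load
was zero at placement time, and the lag is then divided by the fallback
value instead.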