linux-kernel - Re: [PATCH 03/15] sched/fair: Add lag based placement

Message-ID: <CAKfTPtAZM8Y6bcJ9fTEz5C__ohNwhQEiaNEZBMXK-0xDs0_kvw@mail.gmail.com>
Date: Fri, 7 Feb 2025 14:39:28 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Breno Leitao <leitao@...ian.org>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org, linux-kernel@...r.kernel.org, 
	juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com, corbet@....net, 
	qyousef@...alina.io, chris.hyser@...cle.com, patrick.bellasi@...bug.net, 
	pjt@...gle.com, pavel@....cz, qperret@...gle.com, tim.c.chen@...ux.intel.com, 
	joshdon@...gle.com, timj@....org, kprateek.nayak@....com, yu.c.chen@...el.com, 
	youssefesmat@...omium.org, joel@...lfernandes.org, efault@....de, 
	tglx@...utronix.de, kernel-team@...a.com
Subject: Re: [PATCH 03/15] sched/fair: Add lag based placement

On Fri, 7 Feb 2025 at 14:38, Breno Leitao <leitao@...ian.org> wrote:
>
> On Fri, Feb 07, 2025 at 12:11:41PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2025 at 02:07:18AM -0800, Breno Leitao wrote:
> > > Hello Peter,
> > >
> > > On Wed, May 31, 2023 at 01:58:42PM +0200, Peter Zijlstra wrote:
> > > >
> > > >  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > >  {
> > > <snip>
> > > > -         vruntime -= thresh;
> > > > +         lag *= load + se->load.weight;
> > > > +         if (WARN_ON_ONCE(!load))
> > >
> > > I have 6.13 running on some hosts, and in some cases, where the system
> > > is getting some OOMs, I see the following stack:
> > >
> > >           WARNING: CPU: 29 PID: 593474 at kernel/sched/fair.c:5250 place_entity+0x199/0x1b0
> > >
> > >            Call Trace:
> > >             <TASK>
> > >             ? place_entity+0x199/0x1b0
> > >             reweight_entity+0x188/0x200
> > >             enqueue_task_fair.llvm.15448040313737105663+0x28c/0x560
> > >             enqueue_task+0x30/0x120
> > >             ttwu_do_activate+0x99/0x230
> > >             try_to_wake_up+0x25a/0x4a0
> > >             ? hrtimer_dummy_timeout+0x10/0x10
> > >             hrtimer_wakeup+0x25/0x30
> > >             __hrtimer_run_queues+0xf1/0x250
> > >             hrtimer_interrupt+0xfb/0x220
> > >             __sysvec_apic_timer_interrupt+0x47/0x140
> > >             sysvec_apic_timer_interrupt+0x35/0x80
> > >             asm_sysvec_apic_timer_interrupt+0x16/0x20
> > >
> > > I am sorry for not decoding the stack, but I am having a hard time
> > > decoding it properly. The values I got were misleading, and I am
> > > working to understand what is happening.
> > >
> > > Anyway, I don't have a reproducer and this problem doesn't happen
> > > frequently enough. I have 1K hosts with 6.13 and I saw it 5 times in the
> > > last week.
> >
> > Weird. Would you mind trying with the below patch on top?
>
> I tried to apply this patch on top of the latest 6.13 stable (6.13.1), and it
> seems to be missing some dependencies in stable. The field cfs_rq->nr_queued
> doesn't exist in 6.13; it was introduced later by the rename in 736c55a02c477a
> ("sched/fair: Rename cfs_rq.nr_running into nr_queued").
>
> Is it safe to just s/nr_queued/nr_running in our patch?

Yes, I was about to mention that nr_queued appeared in v6.14-rc1 and
it's still nr_running in v6.13.
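
For reference, the substitution asked about above could be sketched as a
small shell helper. This is only an illustration and was not posted in the
thread: the function name backport_nr_queued is hypothetical, and the debug
patch itself is not reproduced in this message.

```shell
# Hypothetical helper: read a patch on stdin and write it to stdout with the
# v6.14 field name (nr_queued) replaced by the v6.13 name (nr_running).
# This is the plain s/nr_queued/nr_running/ from the question, so it rewrites
# every occurrence of the string, including inside longer identifiers.
backport_nr_queued() {
    sed 's/nr_queued/nr_running/g'
}
```

Assuming the patch were saved as debug.patch (a made-up file name), it could
be applied to a 6.13 tree with something like
"backport_nr_queued < debug.patch | patch -p1". Since the substitution is
purely textual, the rewritten patch is worth reviewing before building.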

>
> Thanks