Message-ID: <CANDhNCq7qvW-CujA+bYzoK1=BJ_TEk6WD2fQJtOpcTC1fjNcfA@mail.gmail.com>
Date: Wed, 30 Apr 2025 15:04:06 -0700
From: John Stultz <jstultz@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>, kernel-team@...roid.com, 
	peter-yc.chang@...iatek.com
Subject: Re: [PATCH v3] sched/core: Tweak wait_task_inactive() to force
 dequeue sched_delayed tasks

On Wed, Apr 30, 2025 at 5:43 AM Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, Apr 29, 2025 at 08:07:26AM -0700, John Stultz wrote:
> > It was reported that in 6.12, smpboot_create_threads() was
> > taking much longer than in 6.6.
> >
> > I narrowed down the call path to:
> >  smpboot_create_threads()
> >  -> kthread_create_on_cpu()
> >     -> kthread_bind()
> >        -> __kthread_bind_mask()
> >           -> wait_task_inactive()
> >
> > Where in wait_task_inactive() we were regularly hitting the
> > queued case, which sets a 1 tick timeout, which when called
> > multiple times in a row, accumulates quickly into a long
> > delay.
> >
> > I noticed disabling the DELAY_DEQUEUE sched feature recovered
> > the performance, and it seems the newly created tasks are usually
> > sched_delayed and left on the runqueue.
> >
> > So in wait_task_inactive() when we see the task
> > p->se.sched_delayed, manually dequeue the sched_delayed task
> > with DEQUEUE_DELAYED, so we don't have to constantly wait a
> > tick.
>
> ---
>
> (that is, I'll trim the Changelog at this point, seeing how the rest is
> 'discussion')
>

Ah, thanks. I've noted you tweaking my commit messages before merging,
so I'll try to do better about leaving ephemeral notes (and Cc lists,
apparently) after the "---" fold.
My apologies for the trouble!


> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index c81cf642dba05..b986cd2fb19b7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2283,6 +2283,12 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
> >                * just go back and repeat.
> >                */
> >               rq = task_rq_lock(p, &rf);
> > +             /*
> > +              * If task is sched_delayed, force dequeue it, to avoid always
> > +              * hitting the tick timeout in the queued case
> > +              */
> > +             if (p->se.sched_delayed)
> > +                     dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
> >               trace_sched_wait_task(p);
> >               running = task_on_cpu(rq, p);
> >               queued = task_on_rq_queued(p);
>
> Let's just do this. I'll stick it in queue/sched/core.

Ok, thanks so much!
-john
