Message-ID: <20221120233543.yerjypgym7laxe42@airbuntu>
Date:   Sun, 20 Nov 2022 23:35:43 +0000
From:   Qais Yousef <qyousef@...alina.io>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Xuewen Yan <xuewen.yan@...soc.com>, peterz@...radead.org,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, ke.wang@...soc.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent
 rto_push_irq_work's dead loop

On 11/17/22 17:00, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@...soc.com> wrote:
> 
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
> 
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
> 
> >  
> >  	/*
> >  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >  		/* When rto_cpu is -1 this acts like cpumask_first() */
> >  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >  
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
> 
> Kill the above.
> 
> >  		rd->rto_cpu = cpu;
> >  
> >  		if (cpu < nr_cpu_ids) {
> 
> Why not just add here:
> 
> 			if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 				continue;
> 			return cpu;
> 		}
> 
> ?

Or this?


	diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
	index ed2a47e4ddae..5073e06e34c3 100644
	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ -2236,7 +2236,7 @@ static int rto_next_cpu(struct root_domain *rd)
		for (;;) {

			/* When rto_cpu is -1 this acts like cpumask_first() */
	-               cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
	+               cpu = cpumask_next_and(rd->rto_cpu, rd->rto_mask, cpu_active_mask);

			rd->rto_cpu = cpu;


I think we might be circumventing the root cause though. It looks like there
are only 2 CPUs online in the system, and one of them (cpu1) goes offline while
the other (cpu0) is overloaded, ending with the tasks ping-ponging?

If that's the case, it seems to me the rto state needs to be cleared for cpu0
so it stops attempting to push tasks. I'd assume this usually happens, but
there's a race that prevents it from realizing this.

Maybe something like this would be better? Assuming I understood the problem
correctly, of course.


	diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
	index ed2a47e4ddae..d80072cc2596 100644
	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ -334,6 +334,10 @@ static inline void rt_set_overload(struct rq *rq)
		if (!rq->online)
			return;

	+       /* Overload is meaningless if we shrank to become a UP system */
	+       if (cpumask_weight(cpu_online_mask) == 1)
	+               return;
	+
		cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
		/*
		 * Make sure the mask is visible before we set

This could be working around the problem too, not sure. But IMHO the bug is
that if we shrink to a UP system, the whole push-pull mechanism should retire
temporarily. I assume this usually happens, but there's a race somewhere.


Thanks

--
Qais Yousef
