Message-ID: <20160708111214.25ef91e6@gandalf.local.home>
Date:	Fri, 8 Jul 2016 11:12:14 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Clark Williams <williams@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] sched/rt: Avoid sending an IPI to a CPU already doing a
 push

On Fri, 8 Jul 2016 16:51:53 +0200
Peter Zijlstra <peterz@...radead.org> wrote:

> On Fri, Jun 24, 2016 at 11:26:13AM -0400, Steven Rostedt wrote:
> > The IPI code now needs to call push_tasks() instead of just push_task() as
> > it will not be receiving an IPI for each CPU that is requesting a PULL.  
> 
> My brain just skidded on that, can you try again with a few more words?

Sure. One of those cases where I lose track of what I know and forget
to explain it.

The original logic expected each pull request to have its own IPI sent
out. Thus, a CPU receiving an IPI would only do a single push, knowing
that only one CPU opened up (lowered its priority). It does still send
another IPI to the next overloaded CPU, regardless of whether the push
succeeded, if a higher-priority task is still waiting. That could be
solved with a cpupri of next-highest waiters, but that's a bit too
complex for now.

Anyway, with the old code, multiple CPUs could send IPIs out: one for
each CPU that lowers its prio. The new code checks if a CPU is already
processing an IPI (via the ipi counter) and will skip sending to that
CPU. That means that if two CPUs lower their priority, and only one CPU
has multiple RT tasks waiting, that CPU receives a single IPI and needs
to do multiple push_rt_task() calls, which is what push_rt_tasks() does.
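
For context, push_rt_tasks() is essentially just a loop over
push_rt_task() until there is nothing left to push, roughly:

static void push_rt_tasks(struct rq *rq)
{
	/* push_rt_task() returns true if it pushed a task away */
	while (push_rt_task(rq))
		;
}

So a single IPI can now clear out all the pushable RT tasks, not just
one.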

Better? I can update the change log.

> 
> 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index d5690b722691..165bcfdbd94b 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -100,6 +100,7 @@ void init_rt_rq(struct rt_rq *rt_rq)
> >  	rt_rq->push_flags = 0;
> >  	rt_rq->push_cpu = nr_cpu_ids;
> >  	raw_spin_lock_init(&rt_rq->push_lock);
> > +	atomic_set(&rt_rq->ipi_count, 0);
> >  	init_irq_work(&rt_rq->push_work, push_irq_work_func);
> >  #endif
> >  #endif /* CONFIG_SMP */
> > @@ -1917,6 +1918,10 @@ static int find_next_push_cpu(struct rq *rq)
> >  			break;
> >  		next_rq = cpu_rq(cpu);
> >  
> > +		/* If pushing was already started, ignore */
> > +		if (atomic_read(&next_rq->rt.ipi_count))
> > +			continue;
> > +
> >  		/* Make sure the next rq can push to this rq */
> >  		if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
> >  			break;
> > @@ -1955,6 +1960,7 @@ static void tell_cpu_to_push(struct rq *rq)
> >  		return;
> >  
> >  	rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
> > +	atomic_inc(&cpu_rq(cpu)->rt.ipi_count);
> >  
> >  	irq_work_queue_on(&rq->rt.push_work, cpu);
> >  }
> > @@ -1974,11 +1980,12 @@ static void try_to_push_tasks(void *arg)
> >  
> >  	rq = cpu_rq(this_cpu);
> >  	src_rq = rq_of_rt_rq(rt_rq);
> > +	WARN_ON_ONCE(!atomic_read(&rq->rt.ipi_count));
> >  
> >  again:
> >  	if (has_pushable_tasks(rq)) {
> >  		raw_spin_lock(&rq->lock);
> > -		push_rt_task(rq);
> > +		push_rt_tasks(rq);  
> 
> Maybe as a comment around here?

Sure.
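
Something along these lines, perhaps (rough wording, to be refined):

	/*
	 * A single IPI may now stand in for several CPUs that lowered
	 * their priority (see the ipi_count check in
	 * find_next_push_cpu()), so push all eligible tasks, not just
	 * one.
	 */
	push_rt_tasks(rq);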

> 
> >  		raw_spin_unlock(&rq->lock);
> >  	}
> >  
> > @@ -2000,7 +2007,7 @@ again:
> >  	raw_spin_unlock(&rt_rq->push_lock);
> >  
> >  	if (cpu >= nr_cpu_ids)
> > -		return;
> > +		goto out;
> >  
> >  	/*
> >  	 * It is possible that a restart caused this CPU to be
> > @@ -2011,7 +2018,10 @@ again:
> >  		goto again;
> >  
> >  	/* Try the next RT overloaded CPU */
> > +	atomic_inc(&cpu_rq(cpu)->rt.ipi_count);
> >  	irq_work_queue_on(&rt_rq->push_work, cpu);
> > +out:
> > +	atomic_dec(&rq->rt.ipi_count);
> >  }  
> 
> I have a vague feeling we're duplicating state, but I can't seem to spot
> it, maybe I'm wrong.

The above can be deceiving because there are two atomics close
together, but they are subtly different.

The inc updates cpu_rq(cpu)'s counter; the dec updates just "rq"'s.
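
To make the pairing explicit, here's a condensed sketch of
try_to_push_tasks() showing just the ipi_count handling (not the full
code):

static void try_to_push_tasks(void *arg)
{
	struct rt_rq *rt_rq = arg;
	int this_cpu = rt_rq->push_cpu;
	struct rq *rq = cpu_rq(this_cpu);	/* CPU handling this IPI */
	int cpu;

	/* ... push the waiting RT tasks, then find the next overloaded CPU ... */
	cpu = find_next_push_cpu(rq_of_rt_rq(rt_rq));

	if (cpu < nr_cpu_ids) {
		/* inc: the *next* CPU in the chain now has an IPI pending */
		atomic_inc(&cpu_rq(cpu)->rt.ipi_count);
		irq_work_queue_on(&rt_rq->push_work, cpu);
	}

	/* dec: this CPU is done with its own IPI, whether or not it passed it on */
	atomic_dec(&rq->rt.ipi_count);
}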

> 
> Looks about right, but could use a comment, this stuff is getting rather
> subtle.

Agreed. I'll add that and update it. Again, with this patch added to
4.6-rt, rteval passes with a 200us max on that 80-CPU box.

Thanks for looking into this.

-- Steve
