Message-ID: <20110616171644.GK2582@linux.vnet.ibm.com>
Date: Thu, 16 Jun 2011 10:16:44 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
David Miller <davem@...emloft.net>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Russell King <rmk@....linux.org.uk>,
Paul Mundt <lethal@...ux-sh.org>,
Jeff Dike <jdike@...toit.com>,
Richard Weinberger <richard@....at>,
Tony Luck <tony.luck@...el.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Mel Gorman <mel@....ul.ie>, Nick Piggin <npiggin@...nel.dk>,
Namhyung Kim <namhyung@...il.com>, ak@...ux.intel.com,
shaohua.li@...el.com, alex.shi@...el.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: [GIT PULL] Re: REGRESSION: Performance regressions from
switching anon_vma->lock to mutex
On Thu, Jun 16, 2011 at 09:03:35AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> >
> >
> > Ingo Molnar <mingo@...e.hu> wrote:
> > >
> > > I have this fix queued up currently:
> > >
> > > 09223371deac: rcu: Use softirq to address performance regression
> >
> > I really don't think that is even close to enough.
>
> Yeah.
>
> > It still does all the callbacks in the threads, and according to
> > Peter, about half the rcu time in the threads remained..
>
> You are right - things that are a few percent on a 24 core machine
> will definitely get exponentially worse on larger boxen. We'll get
> rid of the kthreads entirely.
I did indeed at one time have access to larger test systems than I
do now, and I clearly need to fix that. :-/
> The funny thing about this workload is that context-switches are
> really a fastpath here and we are using anonymous IRQ-triggered
> softirqs embedded in random task contexts as a workaround for that.
The other thing that the IRQ-triggered softirqs do is to get the callbacks
invoked in cases where a CPU-bound user thread is never context switching.
Of course, one alternative might be to set_need_resched() to force entry
into the scheduler as needed.
> [ I think we'll have to revisit this issue and do it properly:
> quiescent state is mostly defined by context-switches here, so we
> could do the RCU callbacks from the task that turns a CPU
> quiescent, right in the scheduler context-switch path - perhaps
> with an option for SCHED_FIFO tasks to *not* do GC.
I considered this approach for TINY_RCU, but dropped it in favor of
reducing the interlocking between the scheduler and RCU callbacks.
Might be worth revisiting, though. If SCHED_FIFO tasks omit RCU callback
invocation, then there will need to be some override for CPUs with lots
of SCHED_FIFO load, probably similar to RCU's current blimit stuff.
> That could possibly be more cache-efficient than softirq execution,
> as we'll process a still-hot pool of callbacks instead of doing
> them only once per timer tick. It will also make the RCU GC
> behavior HZ independent. ]
Well, the callbacks will normally be cache-cold in any case due to the
grace-period delay, but on the other hand, both tick-independence and
the ability to shield a given CPU from RCU callback execution might be
quite useful. The tick currently does the following for RCU:
1. Informs RCU of user-mode execution (rcu_sched and rcu_bh
quiescent state).
2. Informs RCU of non-dyntick idle mode (again, rcu_sched and
rcu_bh quiescent state).
3. Kicks the current CPU's RCU core processing as needed in
response to actions from other CPUs.
Frederic's work avoiding ticks in long-running user-mode tasks
might take care of #1, and it should be possible to make use of
the current dyntick-idle APIs to deal with #2. Replacing #3
efficiently will take some thought.
> In any case the proxy kthread model clearly sucked, no argument about
> that.
Indeed, I lost track of the global nature of real-time scheduling. :-(
Whatever does the boosting will need to have process context and
can be subject to delays, so that pretty much needs to be a kthread.
But it will context-switch quite rarely, so should not be a problem.
Thanx, Paul