Message-ID: <20100914111954.GB2201@osiris.boeblingen.de.ibm.com>
Date: Tue, 14 Sep 2010 13:19:54 +0200
From: Heiko Carstens <heiko.carstens@...ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Suresh Siddha <suresh.b.siddha@...el.com>,
Venkatesh Pallipadi <venki@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH] generic-ipi: fix deadlock in __smp_call_function_single
On Tue, Sep 14, 2010 at 10:03:47AM +0200, Peter Zijlstra wrote:
> On Mon, 2010-09-13 at 11:02 -0700, Suresh Siddha wrote:
> > On Sat, 2010-09-11 at 09:42 -0700, Venkatesh Pallipadi wrote:
> > > Also, as we don't have rq lock around this point, it seems possible
> > > that the CPU that was busy and wants to kick idle load balance on
> > > remote CPU, could have become idle and nominated itself as idle load
> > > balancer.
> >
> > A busy cpu (currently running something -- one task on the rq at least)
> > can't become idle in the middle of trigger_load_balance().
> >
> > What might be happening is similar to what you said, but the opposite of it.
> >
> > cpu-x is idle which is also ilb_cpu
> > got a scheduler tick during idle
> > and the nohz_kick_needed() in trigger_load_balance() checks for
> > rq_x->nr_running which might not be zero (because of someone waking a
> > task on this rq etc) and this leads to the situation of the cpu-x
> > sending a kick to itself.
>
> So what patches are we going to merge?
>
> I share Heiko's opinion that it's somewhat surprising to have
> __smp_call_function_single() differ in this detail from
> smp_call_function_single() and think that merging his patch would be
> good in that respect. But Andrew seemed to have reservations.
>
> We can also merge either my or Suresh's patch (which I think makes
> sense, but is kinda subtle) to avoid the needless self kick.
I would prefer to see yours or Suresh's scheduler patch merged to
fix the bug.
My patch could be merged for 2.6.37 or be dropped in favour of a WARN_ON
in __smp_call_function_single() if remote cpu == current cpu.
However, I think it would be better if smp_call_function_single() and
__smp_call_function_single() didn't differ here.
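
To make the two options concrete, here is a rough userspace sketch of how
a call whose target is the current cpu could be handled: either run the
function locally (which is what smp_call_function_single() does), or warn
and refuse. This is not kernel code. model_cpu, model_call_single() and
the policy enum are hypothetical names made up for illustration, and the
"pending" counter only stands in for queueing the call and sending an IPI.

```c
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
typedef void (*call_fn)(void *info);

struct model_cpu {
	int pending;	/* queued remote calls; stands in for an IPI */
	call_fn fn;	/* last function queued for this cpu */
	void *info;	/* argument for fn */
};

enum self_call_policy {
	SELF_CALL_RUN_LOCAL,	/* run fn directly on the calling cpu */
	SELF_CALL_WARN,		/* treat a self call as a caller bug */
};

/*
 * Returns 0 if fn ran locally, 1 if it was queued for a remote cpu,
 * and -1 if a self call was rejected with a warning.
 */
static int model_call_single(struct model_cpu *cpus, int this_cpu,
			     int target, call_fn fn, void *info,
			     enum self_call_policy policy)
{
	if (target == this_cpu) {
		if (policy == SELF_CALL_WARN) {
			fprintf(stderr, "warning: cpu %d called itself\n",
				this_cpu);
			return -1;
		}
		/* No queueing and no IPI: just call the function. */
		fn(info);
		return 0;
	}

	/* Remote case: record the call; the target would run it later. */
	cpus[target].fn = fn;
	cpus[target].info = info;
	cpus[target].pending++;
	return 1;
}

/* Example callback: counts how often it was invoked. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

With SELF_CALL_RUN_LOCAL a self call simply invokes count_call() once and
queues nothing. With SELF_CALL_WARN the callback is not run at all, so the
caller has to be fixed instead, which is what the scheduler patches do by
avoiding the self kick in the first place.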