Message-ID: <51AE667F.6030702@candelatech.com>
Date:	Tue, 04 Jun 2013 15:13:19 -0700
From:	Ben Greear <greearb@...delatech.com>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC:	Rusty Russell <rusty@...tcorp.com.au>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: 3.9.x:  Possible race related to stop_machine leads to lockup.
On 06/04/2013 02:18 PM, Ben Greear wrote:
> I've been trying to figure out why I see the migration/* processes
> hang in a busy loop....
>
> While reading the stop_machine.c file, I think I might have an
> answer.
>
> The set_state() method sets the thread_ack to the current number
> of threads.  Each thread's state machine then decrements it down to
> zero where it bumps the state to the next level.  This lets each
> cpu stop in lock-step it seems.
>
> But, from what I can tell, the __stop_machine() method can
> (re)set the state to STOPMACHINE_PREPARE while the migration
> processes are in their loop.  That would explain why they sometimes
> loop forever.
>
> Does this make sense?
Err, no... that doesn't make sense: 'smdata' is on the stack.
More printk debugging makes it look like one thread just
never notices that smdata->state has been updated by another
thread.
There is this comment... maybe cpu_relax() only does the chill-out part,
and we need something else to make sure smdata->state is freshly
read from the other CPU's cache?
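
For reference, the lock-step handshake described in the quoted text can be
sketched in userspace C roughly as follows. This is only an illustration
under assumptions, not the code in stop_machine.c: the sm_* names, the
"steps" counter, pthreads, C11 atomics and sched_yield() (standing in for
cpu_relax()) are all made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SM_MAX_THREADS 16

/* Hypothetical phases, loosely modelled on the states named in the thread. */
enum sm_state { SM_NONE, SM_PREPARE, SM_DISABLE_IRQ, SM_RUN, SM_EXIT };

struct sm_data {
    int num_threads;
    atomic_int state;       /* current phase, polled by every worker */
    atomic_int thread_ack;  /* workers that still have to ack this phase */
    atomic_int steps;       /* phase changes observed, summed over workers */
};

/* Re-arm the ack counter first, then publish the new phase. */
static void sm_set_state(struct sm_data *d, int newstate)
{
    atomic_store(&d->thread_ack, d->num_threads);
    atomic_store(&d->state, newstate);
}

/* The last worker to ack a phase moves everyone on to the next one. */
static void sm_ack_state(struct sm_data *d)
{
    if (atomic_fetch_sub(&d->thread_ack, 1) == 1)
        sm_set_state(d, atomic_load(&d->state) + 1);
}

static void *sm_worker(void *arg)
{
    struct sm_data *d = arg;
    int curstate = SM_NONE;

    do {
        sched_yield();                      /* stand-in for cpu_relax() */
        int seen = atomic_load(&d->state);  /* fresh read on every pass */
        if (seen != curstate) {
            curstate = seen;
            atomic_fetch_add(&d->steps, 1);
            sm_ack_state(d);
        }
    } while (curstate != SM_EXIT);
    return NULL;
}

/* Run n workers through all phases; return the total phase changes seen. */
static int sm_run(int n)
{
    pthread_t tid[SM_MAX_THREADS];
    struct sm_data d;
    int i;

    if (n < 1 || n > SM_MAX_THREADS)
        return -1;
    d.num_threads = n;
    atomic_init(&d.state, SM_NONE);
    atomic_init(&d.thread_ack, 0);
    atomic_init(&d.steps, 0);

    for (i = 0; i < n; i++)
        if (pthread_create(&tid[i], NULL, sm_worker, &d) != 0)
            abort();

    sm_set_state(&d, SM_PREPARE);           /* kick off the first phase */

    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&d.steps);
}
```

In this sketch a phase only advances once every worker has acked it, so no
worker can skip a phase; each of the n workers sees four phase changes and
sm_run(n) returns 4 * n. A worker that never observes a new value of
"state" would leave the others spinning, which is the symptom described
above.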
		/* Chill out and ensure we re-read stopmachine_state. */
		cpu_relax();
		if (smdata->state != curstate) {
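
As far as I can tell, on x86 cpu_relax() is a "rep; nop" with a "memory"
clobber, so it also acts as a compiler barrier and the compiler has to
reload smdata->state on each pass; hardware cache coherence should then
make the other CPU's store visible without an explicit flush. A portable
userspace analogue of "re-read on every pass" is a relaxed atomic load.
The sketch below assumes pthreads and C11 atomics; poll_data, publisher(),
wait_for_change() and poll_demo() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct poll_data {
    atomic_int state;   /* written by one thread, polled by another */
};

/* Writer side: publish a new state value. */
static void *publisher(void *arg)
{
    struct poll_data *d = arg;

    atomic_store_explicit(&d->state, 1, memory_order_relaxed);
    return NULL;
}

/*
 * Reader side: spin until state differs from curstate. The atomic load
 * cannot be hoisted out of the loop, so every pass reads memory again.
 */
static int wait_for_change(struct poll_data *d, int curstate)
{
    int seen;

    do {
        seen = atomic_load_explicit(&d->state, memory_order_relaxed);
    } while (seen == curstate);
    return seen;
}

/* Start the writer, wait for its update, return the value observed. */
static int poll_demo(void)
{
    struct poll_data d;
    pthread_t tid;
    int seen;

    atomic_init(&d.state, 0);
    if (pthread_create(&tid, NULL, publisher, &d) != 0)
        return -1;
    seen = wait_for_change(&d, 0);
    pthread_join(tid, NULL);
    return seen;
}
```

If the load were a plain non-volatile read with no barrier in the loop, the
compiler would be free to read the value once and spin on a stale copy,
which is the failure mode the comment above cpu_relax() guards against.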
Gah... way out of my league :P
Ben
-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com