Message-ID: <51AE667F.6030702@candelatech.com>
Date: Tue, 04 Jun 2013 15:13:19 -0700
From: Ben Greear <greearb@...delatech.com>
To: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC: Rusty Russell <rusty@...tcorp.com.au>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: 3.9.x: Possible race related to stop_machine leads to lockup.
On 06/04/2013 02:18 PM, Ben Greear wrote:
> I've been trying to figure out why I see the migration/* processes
> hang in a busy loop....
>
> While reading the stop_machine.c file, I think I might have an
> answer.
>
> The set_state() method sets the thread_ack to the current number
> of threads. Each thread's state machine then decrements it down to
> zero where it bumps the state to the next level. This lets each
> cpu stop in lock-step it seems.
>
> But, from what I can tell, the __stop_machine() method can
> (re)set the state to STOPMACHINE_PREPARE while the migration
> processes are in their loop. That would explain why they sometimes
> loop forever.
>
> Does this make sense?
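
For reference, the ack-counter hand-off described in the quoted text can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions: the names (sm_model, model_set_state, model_ack_state, the
MODEL_* states) are hypothetical stand-ins, C11 atomics replace the kernel's
atomic_t helpers, and it is not the stop_machine.c code itself.

```c
#include <stdatomic.h>

/* Hypothetical model: names echo the discussion, not the kernel source. */
enum model_state {
    MODEL_NONE, MODEL_PREPARE, MODEL_DISABLE_IRQ, MODEL_RUN, MODEL_EXIT
};

struct sm_model {
    atomic_int state;       /* current step of the state machine */
    atomic_int thread_ack;  /* threads that still have to ack this step */
    int num_threads;
};

static void model_init(struct sm_model *sm, int num_threads)
{
    atomic_init(&sm->state, MODEL_NONE);
    atomic_init(&sm->thread_ack, 0);
    sm->num_threads = num_threads;
}

/* Reset the ack counter first, then publish the new state. */
static void model_set_state(struct sm_model *sm, int newstate)
{
    atomic_store(&sm->thread_ack, sm->num_threads);
    atomic_store(&sm->state, newstate);
}

/* Each thread acks once per state; the last one to ack advances the state. */
static void model_ack_state(struct sm_model *sm)
{
    /* atomic_fetch_sub returns the previous value: 1 means we were last. */
    if (atomic_fetch_sub(&sm->thread_ack, 1) == 1)
        model_set_state(sm, atomic_load(&sm->state) + 1);
}
```

With num_threads set to 3, the state stays at MODEL_PREPARE after two acks
and moves to MODEL_DISABLE_IRQ only on the third, which is the lock-step
behaviour the quoted text describes.
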
Err, no, that doesn't make sense. 'smdata' is on the stack.
More printk debugging makes it look like one thread just
never notices that smdata->state has been updated by another
thread.
There is this comment in the code below. Maybe cpu_relax() only does the
"chill out" part, and we need something else to make sure smdata->state
is freshly read from the other CPU's cache?
    /* Chill out and ensure we re-read stopmachine_state. */
    cpu_relax();
    if (smdata->state != curstate) {
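
As a hedged illustration of what the "re-read" half of that comment is
about: on x86, cpu_relax() expands to a "rep; nop" asm statement with a
"memory" clobber, so besides pausing it also acts as a compiler barrier that
stops the compiler from keeping smdata->state cached in a register across
loop iterations. The userspace sketch below shows only that compiler-barrier
effect. compiler_barrier() and poll_for_change() are hypothetical names, the
asm syntax assumes GCC or Clang, and portable multi-threaded userspace code
would use C11 atomics for the shared variable instead of a plain int.

```c
/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * "memory" clobber forces values to be reloaded from memory afterwards. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Poll *state up to max_spins times.
 * Returns 1 as soon as it differs from curstate, 0 if it never changed.
 */
static int poll_for_change(const int *state, int curstate,
                           unsigned int max_spins)
{
    unsigned int i;

    for (i = 0; i < max_spins; i++) {
        compiler_barrier();     /* make the next read hit memory again */
        if (*state != curstate)
            return 1;
    }
    return 0;
}
```

Note that this only addresses the compiler side. It does not say anything
about why a CPU in the report above would fail to see the update.
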
Gah, way out of my league :P
Ben
--
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc http://www.candelatech.com