Message-ID: <20151007123046.GA21460@redhat.com>
Date: Wed, 7 Oct 2015 14:30:46 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: heiko.carstens@...ibm.com, linux-kernel@...r.kernel.org,
Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Rik van Riel <riel@...hat.com>
Subject: Re: [RFC][PATCH] sched: Start stopper early
On 10/07, Peter Zijlstra wrote:
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> happens.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active'

Argh. I can't really comment on this change right now, but this reminds me
that the stop_two_cpus() path should not rely on cpu_active() at all. I mean
we should not use this check to avoid the deadlock; migrate_swap_stop()
can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
be replaced by BUG_ON().
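
To illustrate the first point, here is a minimal userspace sketch (not kernel
code) of a stopper callback that checks CPU activity itself, so that the
caller does not have to. Everything prefixed with model_ is a hypothetical
stand-in invented for this example, and the -EAGAIN return value is an
assumption about how such a callback might report "try again later".

```c
/* Userspace model only: none of this is kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the kernel's notion of "active" CPUs. */
static bool model_cpu_active[MODEL_NR_CPUS];

static bool model_cpu_is_active(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_cpu_active[cpu];
}

/* Hypothetical argument block for the modelled callback. */
struct model_swap_arg {
	int src_cpu;
	int dst_cpu;
	int swapped;	/* set to 1 only if the swap actually ran */
};

/*
 * Models the idea for migrate_swap_stop(): the callback itself
 * refuses to act when either CPU is inactive, so the caller
 * (stop_two_cpus() in the real code) need not test cpu_active().
 */
static int model_migrate_swap_stop(void *data)
{
	struct model_swap_arg *arg = data;

	if (!model_cpu_is_active(arg->src_cpu) ||
	    !model_cpu_is_active(arg->dst_cpu))
		return -EAGAIN;	/* assumed "retry later" result */

	arg->swapped = 1;
	return 0;
}
```

With CPU 0 active and CPU 1 inactive, the modelled callback returns -EAGAIN
and leaves the argument untouched; once both are active it performs the swap.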

Probably slightly off-topic, but what do you finally think about the old
"[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
we discussed in http://marc.info/?t=143750670300014 ?

I won't really insist if you still dislike it, but it seems we both
agree that "lg_lock stop_cpus_lock" must die in any case, and after that
we can do the cleanups mentioned above.

And, Peter, I see a lot of interesting emails from you, but currently
can't even read them. I hope very much I will read them later and perhaps
even reply ;)

Oleg.