Message-ID: <20150803145836.GA13173@redhat.com>
Date: Mon, 3 Aug 2015 16:58:36 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 6/6] stop_machine: kill stop_cpus_lock and
lg_double_lock/unlock()
On 07/30, Peter Zijlstra wrote:
>
> On Tue, Jul 21, 2015 at 09:22:47PM +0200, Oleg Nesterov wrote:
>
> > + err = -EDEADLK;
> > + if (stop_work_pending(stopper1) != stop_work_pending(stopper2))
> > + goto unlock;
>
> You could DoS/false positive this by running stop_one_cpu() in a loop,
> and thereby 'always' having work pending on one but not the other.
As we already discussed, this is not a problem.
> > + if (unlikely(err == -EDEADLK)) {
> > + cond_resched();
> > + goto retry;
>
> And this just gives me -rt nightmares.
Why?
> As it is, -rt does horrible things to stop_machine, and I would very
> much like to make it such that we don't need to do that.
>
> Now, obviously, stop_cpus() is _BAD_ for -rt, and we try real hard to
> make sure that doesn't happen,
Yes. stop_cpus() is already bad, so I am not sure I understand why this
change makes things really worse.
stop_two_cpus() needs to spin/retry if it races with the main loop in
queue_stop_cpus_work(),

	preempt_disable();
	for_each_cpu(cpu, cpumask) {
		work = &per_cpu(cpu_stopper.stop_work, cpu);
		work->fn = fn;
		work->arg = arg;
		work->done = done;
		cpu_stop_queue_work(cpu, work);
	}
	preempt_enable();

and iirc preempt_disable() means "disable preemption" even in -rt, but
I am not sure. So "goto retry" should be really, really unlikely.
Besides, whatever we do, stop_two_cpus(X, Y) will wait anyway if ->stop_work
was queued on X or Y. And with your patch in the next email it will
spin too (yes, yes, -rt differs).
Another case where stop_two_cpus(X, Y) needs to retry is when ->stop_work
was already dequeued on CPU X but not on CPU Y (and this is why it needs
cond_resched() for CONFIG_PREEMPT=n: it can run on CPU Y). This does not
look really bad either: the migration/Y thread is already activated and it
has the highest priority.
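
The two retry cases above can be modelled in user space. What follows is
only a minimal sketch under stated assumptions, not the patch itself: the
struct, the model_* functions and the "progress" callback are hypothetical
names invented for illustration, and the boolean flag merely stands in for
whatever stop_work_pending() checks in the patch. The real code would do
the comparison with both stopper locks held and call cond_resched() before
retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU stopper; not struct cpu_stopper. */
struct model_stopper {
	bool stop_work_pending;	/* ->stop_work queued, not yet dequeued */
	int nr_queued;		/* works queued by model_queue_two_works() */
};

/* Models "the rest of the system runs" between two attempts. */
typedef void (*model_progress_fn)(struct model_stopper *,
				  struct model_stopper *);

/*
 * Queue one work on each stopper, but only if both agree on whether
 * ->stop_work is pending. A mismatch means we raced with the queueing
 * loop or with a dequeue, so refuse with -EDEADLK.
 */
static int model_queue_two_works(struct model_stopper *s1,
				 struct model_stopper *s2)
{
	assert(s1 != s2);
	if (s1->stop_work_pending != s2->stop_work_pending)
		return -EDEADLK;
	s1->nr_queued++;
	s2->nr_queued++;
	return 0;
}

/* Case 1: the queue_stop_cpus_work() loop reaches the second CPU. */
static void model_finish_queue_loop(struct model_stopper *s1,
				    struct model_stopper *s2)
{
	s1->stop_work_pending = true;
	s2->stop_work_pending = true;
}

/* Case 2: the lagging migration thread dequeues its ->stop_work. */
static void model_run_stopper_threads(struct model_stopper *s1,
				      struct model_stopper *s2)
{
	s1->stop_work_pending = false;
	s2->stop_work_pending = false;
}

/* Retry until the pending states match; returns the number of retries. */
static int model_stop_two_cpus(struct model_stopper *s1,
			       struct model_stopper *s2,
			       model_progress_fn progress)
{
	int retries = 0;

	while (model_queue_two_works(s1, s2) == -EDEADLK) {
		progress(s1, s2);	/* kernel: cond_resched(); goto retry */
		retries++;
	}
	return retries;
}
```

In this model a mismatch is resolved after a single round of progress in
either case, which matches the expectation that the retry window is short;
it says nothing about -rt scheduling behaviour.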
So I still think that, at least correctness-wise, this patch is fine. Am I
missing something else?
> Paul's RCU branch already kills try_stop_cpus() dead, so that wart is
> also gone. But we're still stuck with stop_machine_from_inactive_cpu()
> which does a spin-wait for exclusive state. So I suppose we'll have to
> keep stop_cpus_mutex :/
Yes, we still need stop_cpus_mutex, even if we remove try_stop_cpus() and
stop_machine_from_inactive_cpu(). But this is another issue.
Oleg.