Message-ID: <20090216203055.GA5135@redhat.com>
Date: Mon, 16 Feb 2009 21:30:55 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Nick Piggin <npiggin@...e.de>,
Jens Axboe <jens.axboe@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Rusty Russell <rusty@...tcorp.com.au>,
Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] generic-smp: remove single ipi fallback for
smp_call_function_many()
On 02/16, Peter Zijlstra wrote:
>
> On Mon, 2009-02-16 at 20:10 +0100, Oleg Nesterov wrote:
> > Actually I don't understand the counter/free_list logic.
> >
> > generic_smp_call_function_interrupt:
> >
> > /*
> > * When the global queue is empty, it's guaranteed that no cpu
> > * is still observing any entry on the free_list, therefore
> > * we can go ahead and unlock them.
> > */
> > if (!--call_function.counter)
> > list_splice_init(&call_function.free_list, &free_list);
> >
> > I can't see why "its guaranteed that no cpu ...". Let's suppose CPU 0
> > "hangs" for some reason in generic_smp_call_function_interrupt() right
> > before "if (!cpumask_test_cpu(cpu, data->cpumask))" test. Then it is
> > possible that another CPU removes the single entry (which doesn't have
> > CPU 0 in data->cpumask) from call_function.queue.
>
> Then call_function.counter wouldn't be 0, and we would not release the
> list entries.
Why wouldn't it be 0? IOW, do you mean that a spurious CALL_FUNCTION_VECTOR
is not possible?
OK, let's suppose CPUs 1 and 2 both do smp_call_function_many(cpumask_of(0)).
CPU_1 does arch_send_call_function_ipi_mask() and returns.
CPU_2 inserts cfd_data[2] and hangs before arch_send_call_function_ipi_mask().
CPU_0 sees the interrupt and removes both entries.
CPU_2 proceeds, and sends IPI to CPU_0 again.
Now, when CPU_0 sees CALL_FUNCTION_VECTOR interrupt and calls
generic_smp_call_function_interrupt(), there is no work for this CPU,
so the above race is possible even with counter/free_list.
The new entry can be inserted (counter becomes 1) and quickly removed
(counter becomes 0) while CPU 0 still sees it on the list.
Unless I misread the patch of course...
> > Now, if that entry is re-added, we can have a problem if CPU 0 sees
> > the partly initialized entry.
>
> Right, so the scenario is that a cpu observes the list entry after we do
> list_del_rcu() -- most likely a cpu not in the original mask, traversing
> the list for a next entry -- we have to wait for some quiescent state to
> ensure nobody is still referencing it.
Yes I see. But if we change generic_smp_call_function_interrupt() to
re-check cpumask_test_cpu(cpu, data->cpumask) under data->lock then
we don't have to wait for quiescent state, afaics. And we have to
take this lock anyway.
smp_call_function_many() does mb(), but I can't see how that can help:
some CPU from ->cpumask can already be executing
generic_smp_call_function_interrupt() before we even do
arch_send_call_function_ipi_mask().
> > But why do we need counter/free_list at all?
> > Can't we do the following:
> >
> > - initialize call_function_data.lock at boot time
> >
> > - change smp_call_function_many() to initialize cfd_data
> > under ->lock
> >
> > - change generic_smp_call_function_interrupt() to do
> >
> > list_for_each_entry_rcu(data) {
> >
> > if (!cpumask_test_cpu(cpu, data->cpumask))
> > continue;
> >
> > spin_lock(&data->lock);
> > if (!cpumask_test_cpu(cpu, data->cpumask)) {
> > spin_unlock(&data->lock);
> > continue;
> > }
> >
> > cpumask_clear_cpu(cpu, data->cpumask);
> > refs = --data->refs;
> >
> > func = data->func;
> > info = data->info;
> > spin_unlock(&data->lock);
> >
> > func(info);
> >
> > if (refs)
> > continue;
> >
> > ...
> > }
> >
> > Afaics, it is OK if smp_call_function_many() sees the "unneeded" entry on
> > list, the only thing we must ensure is that we can't "misunderstand" this
> > entry.
> >
> > No?
>
> Hmm, could that not lead to loops, and/or missed entries in the
> list iteration?
How? Once we take data->lock and see this cpu in ->cpumask,
we can be sure it is safe to proceed, no?
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/