Message-ID: <4B13935C.3040407@redhat.com>
Date: Mon, 30 Nov 2009 11:41:48 +0200
From: Avi Kivity <avi@...hat.com>
To: Tejun Heo <tj@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Jiri Slaby <jirislaby@...il.com>, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, mm-commits@...r.kernel.org,
Marcelo Tosatti <mtosatti@...hat.com>, kvm@...r.kernel.org,
the arch/x86 maintainers <x86@...nel.org>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: WARNING: kernel/smp.c:292 smp_call_function_single [Was: mmotm
2009-11-24-16-47 uploaded]

On 11/30/2009 10:58 AM, Tejun Heo wrote:
> Hello,
>
> On 11/28/2009 09:12 PM, Avi Kivity wrote:
>
>>> Hmm, commit 498657a moved the fire_sched_in_preempt_notifiers() call
>>> into the irqs disabled section recently.
>>>
>>> sched, kvm: Fix race condition involving sched_in_preempt_notifers
>>>
>>> In finish_task_switch(), fire_sched_in_preempt_notifiers() is
>>> called after finish_lock_switch().
>>>
>>> However, depending on architecture, preemption can be enabled after
>>> finish_lock_switch() which breaks the semantics of preempt
>>> notifiers.
>>>
>>> So move it before finish_arch_switch(). This also makes the in-
>>> notifiers symmetric to out- notifiers in terms of locking - now
>>> both are called under rq lock.
>>>
>>> It's not a surprise that this breaks the existing code which does the
>>> smp function call.
>>>
>> Yes, kvm expects preempt notifiers to be run with irqs enabled. Copying
>> patch author.
>>
> Hmmm... then, it's broken both ways. The previous code may get
> preempted after scheduling but before the notifier is run (which
> breaks the semantics of the callback horribly), the current code
> doesn't satisfy kvm's requirement. Another thing is that in the
> previous implementation the context is different between the 'in' and
> 'out' callbacks, which is subtle and nasty. Can kvm be converted to
> not do smp calls directly?
>
No. kvm uses preempt notifiers to manage extended processor registers
(much like the fpu). If we're scheduled into cpu A but the state is
currently live on cpu B, we need to go ahead and pull it in.

Technically, we could delay the IPI until after the sched-in notifier:
set some illegal state on cpu A and handle the resulting exception by
sending the IPI and fixing up the state. But that would be slower, and
it would not solve the problem, since some accesses happen with
interrupts disabled.

Since this is essentially the same problem as the fpu, maybe we can
solve it the same way. How does the fpu migrate its state across
processors? One way would be to save the state when the task is
selected for migration.
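
To make the two options concrete, here is a minimal user-space sketch, not
kernel code. All names in it (vcpu_model, remote_save, sched_in_pull,
migrate_save, sched_in_local) are hypothetical and exist only for
illustration; remote_save() merely stands in for a cross-cpu call such as
smp_call_function_single(). It assumes the state is a single register per
cpu.

```c
#include <assert.h>

#define NR_CPUS 4
#define NO_CPU  (-1)

/* Hypothetical model of one vcpu's extended register state. */
struct vcpu_model {
    int  live_cpu;   /* cpu whose registers hold the state, or NO_CPU */
    long saved;      /* in-memory copy, valid when live_cpu == NO_CPU */
    int  ipis_sent;  /* number of simulated cross-cpu calls */
};

/* Simulated hardware registers, one per cpu. */
static long cpu_regs[NR_CPUS];

/* Stands in for a cross-cpu call: 'cpu' writes its registers to memory. */
static void remote_save(struct vcpu_model *v, int cpu)
{
    v->saved = cpu_regs[cpu];
    v->live_cpu = NO_CPU;
    v->ipis_sent++;
}

/* Option 1: pull at sched-in. Needs a cross-cpu call from the notifier,
 * which is the part that is unsafe with interrupts disabled. */
static void sched_in_pull(struct vcpu_model *v, int cpu)
{
    if (v->live_cpu != NO_CPU && v->live_cpu != cpu)
        remote_save(v, v->live_cpu);
    if (v->live_cpu == NO_CPU) {
        cpu_regs[cpu] = v->saved;
        v->live_cpu = cpu;
    }
}

/* Option 2: save on the source cpu when the task is picked for migration. */
static void migrate_save(struct vcpu_model *v)
{
    if (v->live_cpu != NO_CPU) {
        v->saved = cpu_regs[v->live_cpu];
        v->live_cpu = NO_CPU;
    }
}

/* Sched-in for option 2: state is either local or already in memory,
 * so no cross-cpu call is ever needed here. */
static void sched_in_local(struct vcpu_model *v, int cpu)
{
    assert(v->live_cpu == NO_CPU || v->live_cpu == cpu);
    if (v->live_cpu == NO_CPU) {
        cpu_regs[cpu] = v->saved;
        v->live_cpu = cpu;
    }
}
```

In this model, option 1 costs one simulated IPI per cross-cpu move, while
option 2 moves that cost to migration time and keeps the sched-in path
purely local. Whether the real migration path can host such a hook is
exactly the open question above.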
> For the time being, maybe it's best to back out the fix given that the
> only architecture which may be affected by the original bug is ia64
> which is the only one with both kvm and the unlocked context switch.
>
Agreed.
--
error compiling committee.c: too many arguments to function