[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <45E6BA3A.2040306@vmware.com>
Date: Thu, 01 Mar 2007 03:34:18 -0800
From: Zachary Amsden <zach@...are.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: Daniel Hecht <dhecht@...are.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Linus Torvalds <torvalds@...l.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Bug in on_each_cpu?
Andrew Morton wrote:
> On Thu, 01 Mar 2007 02:34:02 -0800 Zachary Amsden <zach@...are.com> wrote:
>
>
>> I think on_each_cpu has a serious bug now that hrtimers has been
>> integrated. Basically, there are many callsites of clock_was_set, none
>> of which actually know the current interrupt state - enabled or
>> disabled. So they save and restore irqs, as they should. But
>> on_each_cpu does not do so, creating bugs in any function which calls
>> these functions with interrupts disabled.
>>
>
> on_each_cpu() calls smp_call_function(). It is a bug to call
> smp_call_function() with local interrupts disabled, hence it is a bug to
> call on_each_cpu() with local interrupts disabled.
>
> There is a WARN_ON() in smp_call_function() to catch this.
>
Ok, I thought that might be the case. We probably infinite looped in
the interrupt handler because of bad state and thus missed the WARN_ON
output.
> Why is it a bug? Because there's a deadlock where this CPU is waiting for
> CPU A to take the IPI, but CPU A is waiting (with interrupts disabled) for
> this CPU to take an IPI.
>
Then the bug is not in on_each_cpu(). It is in the usage of
clock_was_set(). For example, look at do_settimeofday in kernel/timer.c:
write_sequnlock_irqrestore(&xtime_lock, flags);
/* signal hrtimers about time change */
clock_was_set();
return 0;
And timekeeping_resume has similar code (and called from a sysdev
callback, so I don't know what the interrupt state should be ). Either
the write_sequnlock_irqrestore is redundant, and should be merely an
write_sequnlock_irq, or the callsite is not prepared to handle enabling
interrupts temporarily as must be done for on_each_cpu(), which is a
pretty scary scenario.
What would be really, really nice would be to statically check all
callsites that issue irq disables actually keep irqs disabled.
Presumably, there was a reason they disabled irqs, and re-enabling them
underneath their noses, even if it is to avoid a race, breaks the logic
behind that reason.
Basically, it is just always a bug to re-enable interrupts from a caller
which has disabled them. Either there is no reason they needed to
disable them in the first place, and it is just a performance issue, or
they are updating some shared state used by the interrupt handler, which
can then operate incorrectly, or they were taking some lock to prevent
this, and the interrupt handler can then attempt to take a nested lock
and hang forever.
In our case, the bug is easily worked around - we don't actually need to
set the wallclock time here. But there is a deeper fundamental issue
which I think needs to be addressed.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists