Message-ID: <m1ljq5r7lw.fsf@fess.ebiederm.org>
Date: Sun, 12 Apr 2009 12:32:11 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Gary Hade <garyhade@...ibm.com>
Cc: mingo@...e.hu, mingo@...hat.com, tglx@...utronix.de, hpa@...or.com,
x86@...nel.org, linux-kernel@...r.kernel.org, lcm@...ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption
Gary Hade <garyhade@...ibm.com> writes:
> Impact: Eliminates a race that can leave the system in an
> unusable state
>
> During rapid offlining of multiple CPUs there is a chance
> that an IRQ affinity move destination CPU will be offlined
> before the IRQ affinity move initiated during the offlining
> of a previous CPU completes. This can happen when the device
> is not very active and thus fails to generate the IRQ that is
> needed to complete the IRQ affinity move before the move
> destination CPU is offlined. When this happens there is an
> -EBUSY return from __assign_irq_vector() during the offlining
> of the IRQ move destination CPU which prevents initiation of
> a new IRQ affinity move operation to an online CPU. This
> leaves the IRQ affinity set to an offlined CPU.
>
> I have been able to reproduce the problem on some of our
> systems using the following script. When the system is idle
> the problem often reproduces during the first CPU offlining
> sequence.

Ok. I have had a chance to think through what your patches are
doing. They assume the broken logic in cpu_down is correct and
patch over some but not all of the problems.

First, the problem is not simply migrating irqs when IRR is set. The general
problem is that the state machines in most ioapics are fragile and
can get confused if you reprogram them at any point when an irq can
come in. In the middle of an interrupt handler is the one time we
know interrupts can not come in.
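
To make the point concrete, here is a toy userspace model of deferring a move until the irq itself fires. It is only a sketch: it is not kernel code, and every name in it (toy_irq, toy_request_move, toy_handle_irq) is hypothetical.

```c
#include <stdbool.h>

/* Toy model only; none of these names are kernel APIs. */
struct toy_irq {
	int  cpu;          /* cpu the irq is currently routed to */
	int  pending_cpu;  /* requested destination, valid if move_pending */
	bool move_pending; /* a move was requested but not yet applied */
};

/* Process context: record the request, leave the routing alone. */
static void toy_request_move(struct toy_irq *irq, int dest_cpu)
{
	irq->pending_cpu = dest_cpu;
	irq->move_pending = true;
}

/* Interrupt context: the irq has just fired, so rerouting is safe here. */
static void toy_handle_irq(struct toy_irq *irq)
{
	if (irq->move_pending) {
		irq->cpu = irq->pending_cpu;
		irq->move_pending = false;
	}
	/* ... the device's handler would run here ... */
}
```

In this model a device that stays idle never reaches toy_handle_irq(), so the move stays pending. That is the window described in the quoted report.
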

To really fix this problem we need to do two things:

1) Track when irqs that can not be migrated from process context are
   on a cpu, and deny cpu hot-unplug of that cpu.
2) Modify every interrupt that can be safely migrated in interrupt context
   to migrate irqs in interrupt context, so no one encounters this problem
   in practice.

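
As a rough illustration of item 1, the check could look something like the toy model below. This is a sketch under assumptions, not a proposed patch: the table, the process_ctx_ok flag and toy_cpu_down() are all hypothetical names, and the real bookkeeping would have to live in the irq and hotplug code.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4
#define TOY_NR_IRQS 8

/* Toy model only; none of these names are kernel APIs. */
struct toy_irq_desc {
	bool in_use;
	int  cpu;            /* cpu the irq is routed to */
	bool process_ctx_ok; /* may be migrated from process context */
};

static struct toy_irq_desc toy_irqs[TOY_NR_IRQS];
static bool toy_online[TOY_NR_CPUS] = { true, true, true, true };

/* Count irqs on this cpu that can not be moved from process context. */
static int toy_pinned_irqs(int cpu)
{
	int i, n = 0;

	for (i = 0; i < TOY_NR_IRQS; i++)
		if (toy_irqs[i].in_use && toy_irqs[i].cpu == cpu &&
		    !toy_irqs[i].process_ctx_ok)
			n++;
	return n;
}

/* Refuse the unplug instead of attempting an unsafe migration. */
static int toy_cpu_down(int cpu)
{
	int i, dest = -1;

	if (cpu < 0 || cpu >= TOY_NR_CPUS || !toy_online[cpu])
		return -EINVAL;
	if (toy_pinned_irqs(cpu) > 0)
		return -EBUSY;

	/* Pick the first other online cpu as the destination. */
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (i != cpu && toy_online[i]) {
			dest = i;
			break;
		}
	if (dest < 0)
		return -EBUSY;

	/* Everything left on this cpu is safe to move from here. */
	for (i = 0; i < TOY_NR_IRQS; i++)
		if (toy_irqs[i].in_use && toy_irqs[i].cpu == cpu)
			toy_irqs[i].cpu = dest;

	toy_online[cpu] = false;
	return 0;
}
```

The choice of -EBUSY and of the first online cpu as destination are arbitrary in this sketch. The point is only that the unplug fails up front rather than leaving an irq pointed at an offlined cpu.
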
We can update MSIs and do a pci read to know when the update has made it
to a device. Multi MSI is a disaster but I won't go there.
In lowest priority delivery mode when the irq is not changing domain but
just changing the set of possible cpus the interrupt can be delivered to.
And then of course all of the fun iommus that remap irqs.
Eric