linux-kernel - Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090413210913.GC8393@us.ibm.com>
Date:	Mon, 13 Apr 2009 14:09:13 -0700
From:	Gary Hade <garyhade@...ibm.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Gary Hade <garyhade@...ibm.com>, mingo@...e.hu, mingo@...hat.com,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, lcm@...ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered
	inactive device IRQ interrruption

On Sun, Apr 12, 2009 at 12:32:11PM -0700, Eric W. Biederman wrote:
> Gary Hade <garyhade@...ibm.com> writes:
> 
> > Impact: Eliminates a race that can leave the system in an
> >         unusable state
> >
> > During rapid offlining of multiple CPUs there is a chance
> > that an IRQ affinity move destination CPU will be offlined
> > before the IRQ affinity move initiated during the offlining
> > of a previous CPU completes.  This can happen when the device
> > is not very active and thus fails to generate the IRQ that is
> > needed to complete the IRQ affinity move before the move
> > destination CPU is offlined.  When this happens there is an
> > -EBUSY return from __assign_irq_vector() during the offlining
> > of the IRQ move destination CPU which prevents initiation of
> > a new IRQ affinity move operation to an online CPU.  This
> > leaves the IRQ affinity set to an offlined CPU.
> >
> > I have been able to reproduce the problem on some of our
> > systems using the following script.  When the system is idle
> > the problem often reproduces during the first CPU offlining
> > sequence.
> 
> Ok.  I have had a chance to think through what you your patches
> are doing and it is assuming the broken logic in cpu_down is correct
> and patching over some but not all of the problems.
> 
> First the problem is not migrating irqs when IRR is set.

When the device is very active, a printk in __target_IO_APIC_irq()
immediately prior to
    io_apic_modify(apic, 0x10 + pin*2, reg);
intermittently displays 'reg' values indicating that the
Remote IRR bit is set.

With PATCH 3/3 the same printk displays no 'reg' values
indicating that the Remote IRR bit is set _and_ the IRQ
interruption problem disappears.

This is what led me to very strongly believe that the
problem was caused by writing the I/O redirection table
register while the Remote IRR bit was set.

> The general
> problem is that the state machines in most ioapics are fragile and
> can get confused if you reprogram them at any point when an irq can
> come in.

IRQs are masked [from fixup_irqs() when offlining a CPU, from
ack_apic_level() when not offlining a CPU] during the reprogramming.
Does this not help avoid the issue?  Sorry if this is a nieve
question.

>  In the middle of an interrupt handler is the one time we
> know interrupts can not come in.
> 
> To really fix this problem we need to do two things.
> 1) Tack when irqs that can not be migrated from process context are
>    on a cpu, and deny cpu hot-unplug.
> 2) Modify every interrupt that can be safely migrated in interrupt context
>    to migrate irqs in interrupt context so no one encounters this problem
>    in practice.
> 
> We can update MSIs and do a pci read to know when the update has made it
> to a device.  Multi MSI is a disaster but I won't go there.
> 
> In lowest priority delivery mode when the irq is not changing domain but
> just changing the set of possible cpus the interrupt can be delivered to.
> 
> And then of course all of the fun iommus that remap irqs.

Sounds non-trivial.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@...ibm.com
http://www.ibm.com/linux/ltc

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/