Message-ID: <86802c440904110044t7a932c41w95c32de1acaf7c5@mail.gmail.com>
Date:	Sat, 11 Apr 2009 00:44:01 -0700
From:	Yinghai Lu <yhlu.kernel@...il.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Gary Hade <garyhade@...ibm.com>, mingo@...e.hu, mingo@...hat.com,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, lcm@...ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered
	inactive device IRQ interruption

On Fri, Apr 10, 2009 at 3:02 PM, Eric W. Biederman
<ebiederm@...ssion.com> wrote:
> Gary Hade <garyhade@...ibm.com> writes:
>
>> On Thu, Apr 09, 2009 at 06:29:10PM -0700, Eric W. Biederman wrote:
>>> Gary Hade <garyhade@...ibm.com> writes:
>>>
>>> > Impact: Eliminates a race that can leave the system in an
>>> >         unusable state
>>> >
>>> > During rapid offlining of multiple CPUs there is a chance
>>> > that an IRQ affinity move destination CPU will be offlined
>>> > before the IRQ affinity move initiated during the offlining
>>> > of a previous CPU completes.  This can happen when the device
>>> > is not very active and thus fails to generate the IRQ that is
>>> > needed to complete the IRQ affinity move before the move
>>> > destination CPU is offlined.  When this happens there is an
>>> > -EBUSY return from __assign_irq_vector() during the offlining
>>> > of the IRQ move destination CPU which prevents initiation of
>>> > a new IRQ affinity move operation to an online CPU.  This
>>> > leaves the IRQ affinity set to an offlined CPU.
>>> >
>>> > I have been able to reproduce the problem on some of our
>>> > systems using the following script.  When the system is idle
>>> > the problem often reproduces during the first CPU offlining
>>> > sequence.
>>>
>>> You appear to be focusing on the IBM x460 and x3850.
>>
>> True.  I have also observed IRQ interruptions on an IBM x3950 M2
>> which I believe, but am not certain, were due to the other
>> "I/O redirection table register write with Remote IRR bit set"
>> caused problem.
>>
>> I intend to do more testing on the x3950 M2 and other
>> IBM System x servers but I unfortunately do not currently
>> have access to any Intel based non-IBM MP servers.  I was
>> hoping that my testing request might at least get some
>> others interested in running the simple test script on their
>> systems and reporting their results.  Have you perhaps tried
>> the test on any of the Intel based MP systems that you have
>> access to?
>>
>>> Can you describe
>>> what kind of interrupt setup you are running?
>>
>> Being somewhat of an ioapic neophyte I am not exactly sure
>> what you are asking for here.  This is ioapic information
>> logged during boot if that helps at all.
>> x3850:
>>     ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
>>     IOAPIC[0]: apic_id 15, version 0, address 0xfec00000, GSI 0-35
>>     ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[36])
>>     IOAPIC[1]: apic_id 14, version 0, address 0xfec01000, GSI 36-71
>> x460:
>>     ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[0])
>>     IOAPIC[0]: apic_id 15, version 17, address 0xfec00000, GSI 0-35
>>     ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[36])
>>     IOAPIC[1]: apic_id 14, version 17, address 0xfec01000, GSI 36-71
>>     ACPI: IOAPIC (id[0x0d] address[0xfec02000] gsi_base[72])
>>     IOAPIC[2]: apic_id 13, version 17, address 0xfec02000, GSI 72-107
>>     ACPI: IOAPIC (id[0x0c] address[0xfec03000] gsi_base[108])
>>     IOAPIC[3]: apic_id 12, version 17, address 0xfec03000, GSI 108-143
>
> Sorry.  My real question is which mode you are running the ioapics in.
>

looks like ack_level_irq.

YH