[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0d48a902-3168-ed1e-3c25-f7af19f19fbc@linux.intel.com>
Date: Fri, 6 Mar 2020 11:29:30 -0800
From: Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Cc: Marc Zyngier <maz@...nel.org>, x86@...nel.org,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
Keith Busch <kbusch@...nel.org>
Subject: Re: [patch 7/7] PCI/AER: Fix the broken interrupt injection
On 3/6/20 10:32 AM, Kuppuswamy Sathyanarayanan wrote:
>
> On 3/6/20 5:03 AM, Thomas Gleixner wrote:
>> The AER error injection mechanism just blindly abuses
>> generic_handle_irq()
>> which is really not meant for consumption by random drivers. The
>> include of
>> linux/irq.h should have been a red flag in the first place. Driver code,
>> unless implementing interrupt chips or low level hypervisor
>> functionality
>> has absolutely no business with that.
>>
>> Invoking generic_handle_irq() from non interrupt handling context can
>> have
>> nasty side effects at least on x86 due to the hardware trainwreck which
>> makes interrupt affinity changes a fragile beast. Sathyanarayanan
>> triggered
>> a NULL pointer dereference in the low level APIC code that way. While
>> the
>> particular pointer could be checked this would only paper over the issue
>> because there are other ways to trigger warnings or silently corrupt
>> state.
>>
>> Invoke the new irq_inject_interrupt() mechanism, which has the necessary
>> sanity checks in place and injects the interrupt via the irq_retrigger()
>> mechanism, which is at least halfways safe vs. the fragile x86 affinity
>> change mechanics.
>>
>> It's safe on x86 as it does not corrupt state, but it still can cause a
>> premature completion of an interrupt affinity change causing the
>> interrupt
>> line to become stale. Very unlikely, but possible.
>>
>> For regular operations this is a non issue as AER error injection is
>> meant
>> for debugging and testing and not for usage on production systems.
>> People
>> using this should better know what they are doing.
> It looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Tested-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@...ux.intel.com>
>>
>> Fixes: 390e2db82480 ("PCI/AER: Abstract AER interrupt handling")
This patch is merged in v4.20 kernel. So this fix could be a candidate
for stable fix.
>> Reported-by: sathyanarayanan.kuppuswamy@...ux.intel.com
>> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>> ---
>> drivers/pci/pcie/Kconfig | 1 +
>> drivers/pci/pcie/aer_inject.c | 6 ++----
>> 2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> --- a/drivers/pci/pcie/Kconfig
>> +++ b/drivers/pci/pcie/Kconfig
>> @@ -34,6 +34,7 @@ config PCIEAER
>> config PCIEAER_INJECT
>> tristate "PCI Express error injection support"
>> depends on PCIEAER
>> + select GENERIC_IRQ_INJECTION
>> help
>> This enables PCI Express Root Port Advanced Error Reporting
>> (AER) software error injector.
>> --- a/drivers/pci/pcie/aer_inject.c
>> +++ b/drivers/pci/pcie/aer_inject.c
>> @@ -16,7 +16,7 @@
>> #include <linux/module.h>
>> #include <linux/init.h>
>> -#include <linux/irq.h>
>> +#include <linux/interrupt.h>
>> #include <linux/miscdevice.h>
>> #include <linux/pci.h>
>> #include <linux/slab.h>
>> @@ -468,9 +468,7 @@ static int aer_inject(struct aer_error_i
>> }
>> pci_info(edev->port, "Injecting errors %08x/%08x into
>> device %s\n",
>> einj->cor_status, einj->uncor_status, pci_name(dev));
>> - local_irq_disable();
>> - generic_handle_irq(edev->irq);
>> - local_irq_enable();
>> + ret = irq_inject_interrupt(edev->irq);
>> } else {
>> pci_err(rpdev, "AER device not found\n");
>> ret = -ENODEV;
>>
--
Sathyanarayanan Kuppuswamy
Linux kernel developer
Powered by blists - more mailing lists