Message-ID: <20130506214913.GE22041@pd.tnic>
Date: Mon, 6 May 2013 23:49:13 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Ortiz, Lance E" <lance.oritz@...com>
Cc: Bruno Prémont <bonbons@...ux-vserver.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-ACPI <linux-acpi@...r.kernel.org>,
Len Brown <lenb@...nel.org>, "Rafael J. Wysocki" <rjw@...k.pl>,
Tony Luck <tony.luck@...el.com>,
Matthew Garrett <mjg59@...f.ucam.org>,
"Dall, Elizabeth J (MCLinux)" <betty.dall@...com>
Subject: Re: WARNING at drivers/pci/search.c:214 for 3.9
On Mon, May 06, 2013 at 09:20:04PM +0000, Ortiz, Lance E wrote:
> Right Boris, looks like we are hitting the WARN_ON(in_interrupt)
> in pci_get_dev_by_id(). We recently started seeing this on our
> test systems when injecting errors.
Ok, I think I have it. That comes from cper_print_pcie(), i.e. your
enhanced PCIe logging in 1d5210008bd3a26daf4b06aed9d6c330dd4c83e2 which
came in 3.9. And since 3.9 is just out now, people are starting to see
the issue.
If you look at the call stack, you land in cper_print_pcie() down
from ghes_proc(), which can be called from the polling routine
ghes_poll_func() but also from the interrupt handler ghes_irq_func().
The latter is the path on which in_interrupt() is true.
> The only reason we are calling pci_get_domain_bus_and_slot() is to get
> the pci_dev* to pass into cper_print_aer() so we can have the device's
> name to put into the trace event for AER. If we can find another way
> to get the device name for the trace event we could remove this call
> to pci_get_domain_bus_and_slot(). I will continue to look into an
> alternative. If you have any ideas on how to get the device data from
> this context let me know.
Hmm, not sure.
Off the top of my head, maybe move the whole code around:
#ifdef CONFIG_ACPI_APEI_PCIEAER
...
#endif
in cper_print_pcie() into a separate function which is called from a
workqueue right after the interrupt is done. Or something to that
effect.
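To make the deferral idea concrete, here is a minimal userspace sketch in C.
It is not kernel code and not a proposed patch: struct pcie_err_rec,
defer_from_irq() and drain_in_process_context() are hypothetical names made
up for illustration, and the fixed-size queue only stands in for what would
presumably be a work item set up with INIT_WORK() and queued with
schedule_work() in the kernel. The point is just the split: the interrupt
path copies what it needs and returns, and the device lookup happens later
in a context where it is allowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: only the fields needed to find the device later. */
struct pcie_err_rec {
	unsigned int domain, bus, slot, func;
};

#define QUEUE_LEN 8	/* power of two, so unsigned wraparound stays consistent */

static struct pcie_err_rec queue[QUEUE_LEN];
static unsigned int q_head;	/* next slot to write */
static unsigned int q_tail;	/* next slot to read */

/*
 * Stand-in for the interrupt path: copy the record and return.
 * No device lookup happens here.
 */
static int defer_from_irq(const struct pcie_err_rec *rec)
{
	assert(rec != NULL);
	if (q_head - q_tail == QUEUE_LEN)
		return -1;	/* queue full, record is dropped */
	queue[q_head % QUEUE_LEN] = *rec;
	q_head++;
	return 0;
}

/*
 * Stand-in for the deferred work: runs after the "interrupt" is done,
 * which is where a lookup like pci_get_domain_bus_and_slot() would be
 * legal. Here the lookup is only simulated with a printf().
 */
static int drain_in_process_context(void)
{
	int handled = 0;

	while (q_tail != q_head) {
		const struct pcie_err_rec *rec = &queue[q_tail % QUEUE_LEN];

		printf("deferred lookup for %04x:%02x:%02x.%x\n",
		       rec->domain, rec->bus, rec->slot, rec->func);
		q_tail++;
		handled++;
	}
	return handled;
}
```

A caller would invoke defer_from_irq() once per error record from the
handler and drain_in_process_context() afterwards. The sketch has no
locking, so it only models the ordering, not the concurrency a real
handler would have to deal with.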
> I'm not sure why the pci_get_domain_bus_and_slot() is failing to find
> the PCI device though. We are not hitting that issue. We are just
> seeing the in_interrupt warning.
Well, it could be corrupted error info or such because it used to say
[ 65.782664] {1}[Hardware Error]: device_id: 0000:00:02.3
but he doesn't have a 02.3 device in the lspci output.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.