lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 06 Jun 2013 14:29:16 +0800
From:	Yanmin Zhang <yanmin_zhang@...ux.intel.com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	"Liu, Joseph" <Joseph.Liu@...lex.com>,
	"Zhang, LongX" <longx.zhang@...el.com>,
	"linasvepstas@...il.com" <linasvepstas@...il.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Subject : [ PATCH ]
 pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

On Wed, 2013-06-05 at 07:30 -0600, Bjorn Helgaas wrote:
> On Tue, Jun 4, 2013 at 6:38 PM, Yanmin Zhang
> <yanmin_zhang@...ux.intel.com> wrote:
> > On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
> >> I'm not sure where we are with this patch.  I think Joseph initially
> >> reported a problem (though I haven't actually seen that), and this
> >> patch fixed it, so it seems like there's something we want to do here.
> > Yes, indeed. We checked Powerpc error handling and plan to use the
> > similar method to deal with the issue.
> >
> > Sorry for replying very late. I move to Android smartphone area and am
> > very busy on many top issues.
> 
> OK.  Can you point us at a bugzilla or email archive of Joseph's
> original problem report, or post it to this thread?  Then maybe
> somebody else can pick it up and make progress on this.

Joseph sent email to my outlook emailbox. I changed it a little to delete company
sensitive product name.

===========================Joseph's original email to me================
Hi Tom and Yanmin,
 
Hope this email can reach you well. I am working on a driver's PCI error recovery with AER. I have a question with one of my observations from platform AER driver's behavior. As your names and emails are listed to the PCIe AER driver coming with the kernel, I send this question to you:
 
During injecting AER Non-Correctable/Fatal error and PCIe error recovery process, I observed the following from our Low Level Driver (LLD):
 
1. The LLD's error_detected() callback got called and the PCI channel state passed in as "pci_channel_io_frozen", as expected;
 
2. The LLD's error_detected() callback function returned with PCI_ERS_RESULT_NEED_RESET, requesting a PCI slot reset;
 
3. The LLD's slot_reset() callback got called and it attempted to re-initialize the device hardware for the recovery. However, the PCI slot state was still in "pci_channel_io_frozen" and the pci_channel_offline() helper routine returned 1. This is not expected, and it actually prevented LLD in performing needed access to memory mapped I/O space for preparing the device for recovery;
 
4. Later, the LLD's resume() callback got called and the PCI slot state was set to "pci_channel_io_normal";
 
In the upstream Linux kernel 3.7.0's pci-error-recovery.txt, "STEP 4 Slot Reset", it stated after platform has performed PCI slot reset and then calls LLD's slot_reset() callback:
 
"This call gives drivers the chance to re-initialize the hardware (re-download firmware, etc.).  At this point, the driver may assume that the card is in a fresh state and is fully functional. The slot is unfrozen and the driver has full access to PCI config space, memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X) will also be available."
 
I looked into the kernel 3.7.0's AER driver's aerdrv_core.c and see that the PCI slot state was set to "pci_channel_io_frozen" on AER_FATAL serverity, and only set back to "pci_channel_io_normal" in report_resume() function. The PCI slot state was not set to "pci_channel_io_normal" when invoking report_slot_reset().
 
As a comparison, I also perform the EEH error recovery with the LLD driver on a PPC platform, which indeed set the PCI slot state to "pci_channel_io_normal" when calling LLD's slot_reset() callback function.
 
Can you please verify this platform AER driver's behavior is intended and will stay with the AER platform support, or it should be changed to be consistent with the PCI error recovery procedure described in the pci-error-recovery.txt documentation? I also noticed that before invoking the broadcast_error_message() function with "slot_reset", there is a comment in the 3.7.0 kernel source:
 
/*
 * TODO: Should call platform-specific
 * functions to reset slot before calling
 * drivers' slot_reset callbacks?
 */
And I do see that only the PCIe Link_Reset was performed, no PCI fundamental reset was performed even the LLD set the PCIe device's "pdev->needs_freset = 1", is this the area that later will be added for performing needed PCI hot reset or fundamental reset at this stage?
 
Your timely response is very appreciated. Thanks in advance and please let me know if you think I should redirect the question to someone else.
 
Best Regards,
Joseph Liu, Ph.D.
Senior Principal Engineer
Emulex Corporation


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists