Message-Id: <1742386474-13717-1-git-send-email-wenxiong@linux.ibm.com>
Date: Wed, 19 Mar 2025 07:14:34 -0500
From: wenxiong@...ux.ibm.com
To: linux-kernel@...r.kernel.org, gjoyce@...ux.ibm.com
Cc: Wen Xiong <wenxiong@...ux.ibm.com>
Subject: [PATCH 1/1] genirq/msi: Dynamic remove/add storage adapter hits EEH

From: Wen Xiong <wenxiong@...ux.ibm.com>

With the irqbalance daemon enabled, the dynamic remove/add test of storage
adapters (SCSI IPR and FC QLogic) hits EEH on PPC.

EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
EEH: [c00000000017e558] process_one_work+0x298/0x590
EEH: [c00000000017eef8] worker_thread+0xa8/0x620
EEH: [c00000000018be34] kthread+0x124/0x130
EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
EEH: This PCI device has failed 1 times in the last hour and will be.

We took a PCIe bus trace and found that one MSI-X vector is cleared
to 0 by the irqbalance daemon. With the irqbalance daemon disabled, the
issue does not occur on either adapter.

We enabled debug output in the ipr driver:
[   44.103071] ipr: Entering __ipr_remove
[   44.103083] ipr: Entering ipr_initiate_ioa_bringdown
[   44.103091] ipr: Entering ipr_reset_shutdown_ioa
[   44.103099] ipr: Leaving ipr_reset_shutdown_ioa
[   44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
[   44.149918] ipr: Entering ipr_reset_ucode_download
[   44.149935] ipr: Entering ipr_reset_alert
[   44.150032] ipr: Entering ipr_reset_start_timer
[   44.150038] ipr: Leaving ipr_reset_alert
[   44.244343] scsi 1:2:3:0: alua: Detached
[   44.254300] ipr: Entering ipr_reset_start_bist
[   44.254320] ipr: Entering ipr_reset_start_timer
[   44.254325] ipr: Leaving ipr_reset_start_bist
[   44.364329] scsi 1:2:4:0: alua: Detached
[   45.134341] scsi 1:2:5:0: alua: Detached
[   45.860949] ipr: Entering ipr_reset_shutdown_ioa
[   45.860962] ipr: Leaving ipr_reset_shutdown_ioa
[   45.860966] ipr: Entering ipr_reset_alert
[   45.861028] ipr: Entering ipr_reset_start_timer
[   45.861035] ipr: Leaving ipr_reset_alert
[   45.964302] ipr: Entering ipr_reset_start_bist
[   45.964309] ipr: Entering ipr_reset_start_timer
[   45.964313] ipr: Leaving ipr_reset_start_bist
[   46.264301] ipr: Entering ipr_reset_bist_done
[   46.264309] ipr: Leaving ipr_reset_bist_done

--->
There is a very small window: if the irqbalance daemon kicks in before the
ipr driver calls pci_restore_state(pdev), it reads back all zeros for that
MSI-X vector in __pci_read_msi_msg(). When the ipr driver then calls
pci_restore_state(pdev) in ipr_reset_restore_cfg_space(),
pci_write_msg_msix() writes those zeros back, so the MSI-X vector stays
cleared.

Below is the MSI-X table for the ipr adapter after the irqbalance daemon
kicked in.

Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
					-------> hit EEH
Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0
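
For readers who want to check a dump like the one above programmatically,
here is a minimal userspace sketch. It assumes the usual 16-byte MSI-X table
entry layout (lower address, upper address, data, vector control). The struct
and function names are hypothetical and not kernel APIs, and the sample
values are only patterned on the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* One 16-byte MSI-X table entry; names are illustrative, not kernel types. */
struct msix_entry_image {
    uint32_t address_lo;  /* offset 0x0 */
    uint32_t address_hi;  /* offset 0x4 */
    uint32_t msg_data;    /* offset 0x8 */
    uint32_t vector_ctrl; /* offset 0xc */
};

/* Return the index of the first entry whose address is all zero, or -1. */
static int first_zero_address_entry(const struct msix_entry_image *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].address_lo == 0 && tbl[i].address_hi == 0)
            return (int)i;
    return -1;
}

/*
 * Fill a table patterned on the dump: address_lo advances by 0x10 per index,
 * address_hi is constant, and entry 'zeroed' (if < n) has a cleared address.
 */
static void fill_sample_table(struct msix_entry_image *tbl, size_t n, size_t zeroed)
{
    for (size_t i = 0; i < n; i++) {
        tbl[i].address_lo = 0xc800u + 0x10u * (uint32_t)i;
        tbl[i].address_hi = 0x10000000u;
        tbl[i].msg_data = 0;
        tbl[i].vector_ctrl = 0;
        if (i == zeroed) {
            tbl[i].address_lo = 0;
            tbl[i].address_hi = 0;
        }
    }
}
```

Filling 16 entries with index 8 cleared and scanning them reports index 8,
the entry marked "hit EEH" in the dump.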

[   46.264312] ipr: Entering ipr_reset_restore_cfg_space
[   46.267439] ipr: Entering ipr_fail_all_ops
[   46.267447] ipr: Leaving ipr_fail_all_ops
[   46.267451] ipr: Leaving ipr_reset_restore_cfg_space
[   46.267454] ipr: Entering ipr_ioa_bringdown_done
[   46.267458] ipr: Leaving ipr_ioa_bringdown_done
[   46.267467] ipr: Entering ipr_worker_thread
[   46.267470] ipr: Leaving ipr_worker_thread

The irqbalance daemon triggers this path:
In __pci_read_msi_msg(),
void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
{
    struct pci_dev *dev = msi_desc_to_pci_dev(entry);

    BUG_ON(dev->current_state != PCI_D0);

    if (entry->pci.msi_attrib.is_msix) {
        void __iomem *base = pci_msix_desc_addr(entry);

        if (WARN_ON_ONCE(entry->pci.msi_attrib.is_virtual))
            return;

        msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
			-> it is 0 before calling pci_restore_state()

        msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
			-> it is 0 before calling pci_restore_state()

        msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
...
...
}

Then pseries_msi_write_msg() is called, which stores the zeroed message in
entry->msg.

static void pseries_msi_write_msg(struct irq_data *data,...)
{
    struct msi_desc *entry = irq_data_get_msi_desc(data);

    entry->msg = *msg;
}

Later the ipr driver calls pci_restore_state(pdev)
-> __pci_restore_msix_state()

__pci_restore_msix_state(struct pci_dev *dev)
  -> pci_write_msg_msix()

static inline void pci_write_msg_msix()
{
..   writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
			-> already cleared to 0 by irqbalance daemon

    writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
			-> already cleared to 0 by irqbalance daemon
    writel(msg->data, base + PCI_MSIX_ENTRY_DATA);

}
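
The ordering described above can be modelled in plain userspace C. This is
only a sketch of the sequence as reported (reset clears the table, the
affinity path reads back and caches what it read, the restore path writes the
cached copy). All names are hypothetical and the message values are
illustrative; it is not the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model types; none of these are kernel structures. */
struct model_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

struct model_vector {
    struct model_msg hw;     /* stands in for the MSI-X table entry on the device */
    struct model_msg cached; /* stands in for entry->msg kept in memory */
};

/* Adapter reset: the table entry on the device reads back as zero. */
static void model_device_reset(struct model_vector *v)
{
    memset(&v->hw, 0, sizeof(v->hw));
}

/* Affinity change: read the entry from the device, then cache what was read. */
static void model_set_affinity(struct model_vector *v)
{
    struct model_msg msg = v->hw; /* models the read in __pci_read_msi_msg() */
    v->cached = msg;              /* models entry->msg = *msg */
}

/* Restore: write the cached message back to the device. */
static void model_restore_state(struct model_vector *v)
{
    v->hw = v->cached;            /* models pci_write_msg_msix() */
}

/* Returns 1 if the vector address is zero after restore, 0 otherwise. */
static int model_run(int affinity_in_window)
{
    struct model_vector v;
    struct model_msg good = { 0xc880u, 0x10000000u, 0 }; /* illustrative values */

    v.hw = good;
    v.cached = good;
    model_device_reset(&v);
    if (affinity_in_window)
        model_set_affinity(&v);   /* lands between reset and restore */
    model_restore_state(&v);
    return v.hw.address_lo == 0 && v.hw.address_hi == 0;
}
```

In this model, model_run(0) leaves the vector intact, while model_run(1)
leaves it zeroed, because the cached copy was overwritten with the zeros read
back during the window.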

With the following patch applied, we no longer hit the issue. If you are
familiar with the MSI domain code, please suggest a better solution.

Thanks,
Wendy

Signed-off-by: Wen Xiong <wenxiong@...ux.ibm.com>
---
 kernel/irq/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 396a067a8a56..fcde35efb64c 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -671,7 +671,8 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
 	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
 		BUG_ON(irq_chip_compose_msi_msg(irq_data, msg));
 		msi_check_level(irq_data->domain, msg);
-		irq_chip_write_msi_msg(irq_data, msg);
+		if ((msg->address_lo != 0) && (msg->address_hi != 0))
+			irq_chip_write_msi_msg(irq_data, msg);
 	}
 
 	return ret;
-- 
2.43.5
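
To make the behaviour of the guard in the patch easier to reason about, here
is a small standalone sketch of the condition. The struct and function names
are hypothetical. guard_as_posted() mirrors the condition in the patch, and
guard_any_nonzero() is only an assumed alternative for comparison, not
something proposed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the address/data triple of an MSI message. */
struct sketch_msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Condition as written in the patch: write only if both halves are non-zero. */
static bool guard_as_posted(const struct sketch_msi_msg *m)
{
    return (m->address_lo != 0) && (m->address_hi != 0);
}

/* Hypothetical alternative: skip the write only if the whole address is zero. */
static bool guard_any_nonzero(const struct sketch_msi_msg *m)
{
    return (m->address_lo != 0) || (m->address_hi != 0);
}
```

Both variants skip an all-zero message. They differ for a message whose
address fits in 32 bits (address_hi == 0): the posted condition skips that
write as well, while the alternative lets it through.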

