[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 21 Nov 2021 09:50:38 +0800
From: luanshi <zhangliguang@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Amey Narkhede <ameynarkhede03@...il.com>,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in
polling mode
Hi,thanks for your comments.
在 2021/11/19 20:00, Lukas Wunner 写道:
> On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote:
>> Both the CCIE and HPIE bits are masked in pcie_disable_notification(),
>> when we issue a hotplug command, pcie_wait_cmd() will polling the
>> Command Completed bit instead of waiting for an interrupt. But cmd_busy
>> bit was not cleared when Command Completed which results in timeouts
>> like this in pciehp_power_off_slot() and pcie_init_notification():
>>
>> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0
>> (issued 2264 msec ago)
>> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0
>> (issued 2288 msec ago)
> The first timeout occurs with the following bits set in ctrl->slot_ctrl:
> PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF
After some debug, the first timeout occurs:
pciehp_power_off_slot
pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC)
pcie_do_write_cmd
/*
* Always wait for any previous command that might
still be in progress
*/
pcie_wait_cmd(ctrl); // at the beginning
if (!ctrl->cmd_busy) // cmd_busy was not zero
return;
Why cmd_busy was not flase, because in function
pcie_disable_notification cmd_busy was not setting correctly:
pcie_disable_notification // Both the CCIE and HPIE bits are masked
pcie_write_cmd
pcie_do_write_cmd
ctrl->cmd_busy = 1 // cmd_busy was setting to 1
pcie_wait_cmd
pcie_poll_cmd // pcie_wait_cmd() will polling the
Command Completed bit instead of waiting for an interrupt
After debug we found Command Completed can be
signaled without cmd_busy was setting to 0.
If we use interrupt mode pciehp_isr:
pciehp_isr
if (events & PCI_EXP_SLTSTA_CC) {
ctrl->cmd_busy = 0; // cmd_busy was setting to zero,
this was left in pcie_poll_cmd.
Thanks,
Liguang
>
> Those bits are set by:
> board_added()
> pciehp_set_indicators()
>
>
> The second timeout occurs with:
> PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF |
> PCI_EXP_SLTCTL_PWR_OFF
>
> This might be triggered by:
> remove_board()
> pciehp_power_off_slot()
>
>
> So it seems Command Completed is not signaled when changing the
> Power Indicator, Attention Indicator and Power Controller Control
> bits in the Slot Control register. But apparently it works for
> the other bits.
>
> We know there are hotplug controllers out there which suffer from
> broken Command Completed support. They support it for the bits
> mentioned above but not the others. So the inverse behavior from
> your hotplug controller. See this code comment in pcie_do_write_cmd():
>
> /*
> * Controllers with the Intel CF118 and similar errata advertise
> * Command Completed support, but they only set Command Completed
> * if we change the "Control" bits for power, power indicator,
> * attention indicator, or interlock. If we only change the
> * "Enable" bits, they never set the Command Completed bit.
> */
>
>
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
>> if (slot_status & PCI_EXP_SLTSTA_CC) {
>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>> PCI_EXP_SLTSTA_CC);
>> + ctrl->cmd_busy = 0;
>> + smp_mb();
>> return 1;
>> }
>> msleep(10);
> I suspect that this patch merely papers over the problem and that
> the real solution would be to either apply quirk_cmd_compl or a
> similar quirk to your hotplug controller.
>
> Please open a bug on bugzilla.kernel.org and attach full output
> of lspci -vv and dmesg. Be sure to add the following to the
> command line:
> pciehp.pciehp_debug=1 dyndbg="file pciehp* +p"
>
> Once you've done that, please report the bugzilla link here
> so that we can analyze the issue properly.
>
> Thanks,
>
> Lukas
Powered by blists - more mailing lists