linux-kernel - Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMSpPPd3K6hPR=XpoAHXxNEUYbAxgQOc_Vh9T2owbkAFtH4KLw@mail.gmail.com>
Date:   Sun, 20 Aug 2017 00:22:56 +0530
From:   Oza Oza <oza.oza@...adcom.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>, Ray Jui <rjui@...adcom.com>,
        Scott Branden <sbranden@...adcom.com>,
        Jon Mason <jonmason@...adcom.com>,
        BCM Kernel Feedback <bcm-kernel-feedback-list@...adcom.com>,
        Andy Gospodarek <gospo@...adcom.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        linux-kernel@...r.kernel.org,
        Oza Pawandeep <oza.pawandeep@...il.com>
Subject: Re: [PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP

On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@...nel.org> wrote:
> On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote:
>> PCIe spec r3.1, sec 2.3.2
>> If CRS software visibility is not enabled, the RC must reissue the
>> config request as a new request.
>>
>> - If CRS software visibility is enabled,
>> - for a config read of Vendor ID, the RC must return 0x0001 data
>> - for all other config reads/writes, the RC must reissue the
>>   request
>>
>> iproc PCIe Controller spec:
>> 4.7.3.3. Retry Status On Configuration Cycle
>> Endpoints are allowed to generate retry status on configuration
>> cycles. In this case, the RC needs to re-issue the request. The IP
>> does not handle this because the number of configuration cycles needed
>> will probably be less than the total number of non-posted operations
>> needed.
>>
>> When a retry status is received on the User RX interface for a
>> configuration request that was sent on the User TX interface,
>> it will be indicated with a completion with the CMPL_STATUS field set
>> to 2=CRS, and the user will have to find the address and data values
>> and send a new transaction on the User TX interface.
>> When the internal configuration space returns a retry status during a
>> configuration cycle (user_cscfg = 1) on the Command/Status interface,
>> the pcie_cscrs will assert with the pcie_csack signal to indicate the
>> CRS status.
>> When the CRS Software Visibility Enable register in the Root Control
>> register is enabled, the IP will return the data value to 0x0001 for
>> the Vendor ID value and 0xffff  (all 1’s) for the rest of the data in
>> the request for reads of offset 0 that return with CRS status.  This
>> is true for both the User RX Interface and for the Command/Status
>> interface.  When CRS Software Visibility is enabled, the CMPL_STATUS
>> field of the completion on the User RX Interface will not be 2=CRS and
>> the pcie_cscrs signal will not assert on the Command/Status interface.
>>
>> Note that, neither PCIe host bridge nor PCIe core re-issues the
>> request for any configuration offset.
>>
>> As a result of the fact, PCIe RC driver (sw) should take care of
>> retrying.
>> This patch fixes the problem, and attempts to read config space again
>> in case of PCIe code forwarding the CRS back to CPU.
>
> I think "fixes the problem" is a little bit of an overstatement.  I
> think it covers up some cases of the problem, but if I understand
> correctly, we're still susceptible to data corruptions on both read
> and write:
>
>   Per PCIe r3.1, sec 2.3.2, config requests that receive completions
>   with Configuration Request Retry Status (CRS) should be reissued by
>   the hardware *except* reads of the Vendor ID when CRS Software
>   Visibility is enabled.
>
>   This hardware never reissues configuration requests when it receives
>   CRS completions.
>
>   For config writes, there is no way for this driver to learn about
>   CRS completions.  A write that receives a CRS completion will not be
>   retried and the data will be discarded.  This is a potential data
>   corruption.
>
>   For config reads, this hardware returns CFG_RETRY_STATUS data when
>   it receives a CRS completion for a config read, regardless of the
>   address of the read or the CRS Software Visibility Enable bit.  As a
>   partial workaround for this, we retry in software any read that
>   returns CFG_RETRY_STATUS.
>
>   CFG_RETRY_STATUS could be valid data for other config registers.  It
>   is impossible to read such registers correctly because we can't tell
>   whether the hardware (1) received a CRS completion and synthesized
>   data of CFG_RETRY_STATUS or (2) received a successful completion
>   with data of CFG_RETRY_STATUS.
>

this will be fixed in our SOC in next revision, we will have separate register
in host bridge space to tell us, whether it has to be retried.
so there will not be any data corruption.
it will be minor change (instead of checking for CFG_RETRY_STATUS, we
will directly check that register)
so that will be clean.

>   The only thing we can really do is return 0xffffffff data, which
>   indicates a config read failure in the first case but is a data
>   corruption in the second case.

Instead of returning PCIBIOS_DEVICE_NOT_FOUND,
you want me to return 0xffffffff data, and I suppose that you have
suggsted in the code at the end also.
Is my understanding right on this comment ?

And do you want me to add all the texts in commit description,
which you have written above regarding config read and write handling ?
instead of "fixes the problem"

It will be quiet long commit description :),
but that's okay.

>
> The write case is probably not that important because we have a
> similar situation on other systems.  On other systems, the root
> complex automatically reissues config writes with CRS completions, but
> it may limit the number of retries.  In that case, the failed write is
> probably logged as a Target Abort error, but nobody pays attention to
> that, so this isn't really much worse.
>

yes, we will not be handling write.

>> It implements iproc_pcie_config_read which gets called for Stingray,
>> Otherwise it falls back to PCI generic APIs.
>>
>> Signed-off-by: Oza Pawandeep <oza.oza@...adcom.com>
>> Reviewed-by: Ray Jui <ray.jui@...adcom.com>
>> Reviewed-by: Scott Branden <scott.branden@...adcom.com>
>>
>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>> index 0f39bd2..583cee0 100644
>> --- a/drivers/pci/host/pcie-iproc.c
>> +++ b/drivers/pci/host/pcie-iproc.c
>> @@ -68,6 +68,9 @@
>>  #define APB_ERR_EN_SHIFT             0
>>  #define APB_ERR_EN                   BIT(APB_ERR_EN_SHIFT)
>>
>> +#define CFG_RETRY_STATUS             0xffff0001
>> +#define CFG_RETRY_STATUS_TIMEOUT_US  500000 /* 500 milli-seconds. */
>> +
>>  /* derive the enum index of the outbound/inbound mapping registers */
>>  #define MAP_REG(base_reg, index)      ((base_reg) + (index) * 2)
>>
>> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
>>       }
>>  }
>>
>> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>> +{
>> +     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>> +     unsigned int ret;
>> +
>> +     /*
>> +      * As per PCIe spec r3.1, sec 2.3.2, CRS Software
>> +      * Visibility only affects config read of the Vendor ID.
>> +      * For config write or any other config read the Root must
>> +      * automatically re-issue configuration request again as a
>> +      * new request.
>> +      *
>> +      * Iproc based PCIe RC (hw) does not retry
>> +      * request on its own, so handle it here.
>> +      * Note that, AXI data 0xffff0001 returned by PAXB could be
>> +      * valid data in configuration space, for e.g. BAR requesting
>> +      * 64bit IO memory, so we retry till timeout.
>> +      * And leave to be handled by upper layers.
>> +      *
>> +      * Also, iproc PCIe RC does not have any effect on
>> +      * CRS Software visibility bit, irrespective if it
>> +      * set or unset, the RC will never re-issue any configuration
>> +      * access request.
>> +      */
>> +     do {
>> +             ret = readl(cfg_data_p);
>> +             if (ret == CFG_RETRY_STATUS)
>> +                     udelay(1);
>> +             else
>> +                     return PCIBIOS_SUCCESSFUL;
>> +     } while (timeout--);
>> +
>> +     return PCIBIOS_DEVICE_NOT_FOUND;
>> +}
>> +
>> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
>> +                                            unsigned int busno,
>> +                                            unsigned int slot,
>> +                                            unsigned int fn,
>> +                                            int where)
>> +{
>> +     u16 offset;
>> +     u32 val;
>> +
>> +     /* EP device access */
>> +     val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
>> +             (slot << CFG_ADDR_DEV_NUM_SHIFT) |
>> +             (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
>> +             (where & CFG_ADDR_REG_NUM_MASK) |
>> +             (1 & CFG_ADDR_CFG_TYPE_MASK);
>> +
>> +     iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
>> +     offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
>> +
>> +     if (iproc_pcie_reg_is_invalid(offset))
>> +             return NULL;
>> +
>> +     return (pcie->base + offset);
>> +}
>
> This is a duplicate of most of the code in iproc_pcie_map_cfg_bus().
> Can you use that directly, or factor this out somehow so it's not
> repeated?

this whole things has to be aligned to Lorenzo's patch, because I see
his patches are in.

>
>> +
>>  /**
>>   * Note access to the configuration registers are protected at the higher layer
>>   * by 'pci_lock' in drivers/pci/access.c
>> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
>>               return (pcie->base + offset);
>>  }
>>
>> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
>> +                                 int where, int size, u32 *val)
>> +{
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>> +     unsigned int slot = PCI_SLOT(devfn);
>> +     unsigned int fn = PCI_FUNC(devfn);
>> +     unsigned int busno = bus->number;
>> +     void __iomem *cfg_data_p;
>> +     int ret;
>> +
>> +     /* root complex access. */
>> +     if (busno == 0)
>> +             return pci_generic_config_read32(bus, devfn, where, size, val);
>> +
>> +     cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
>> +
>> +     if (!cfg_data_p)
>> +             return PCIBIOS_DEVICE_NOT_FOUND;
>> +
>> +     ret = iproc_pcie_cfg_retry(cfg_data_p);
>> +     if (ret)
>
> This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats
> the whole point of having a #define.  You could test for
> PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even
> better).
>
>> +             return ret;
>> +
>> +     *val = readl(cfg_data_p);
>
> This is an extra read.  iproc_pcie_cfg_retry() read it once and got
> valid data (something other than CFG_RETRY_STATUS), returned
> PCIBIOS_SUCCESSFUL, and now you're reading it again.
>
> I think you should do something like this instead so you don't do the
> MMIO read any more times than necessary:
>
>   static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
>   {
>     u32 data;
>     int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
>
>     data = readl(cfg_data_p);
>     while (data == CFG_RETRY_STATUS && timeout--) {
>       udelay(1);
>       data = readl(cfg_data_p);
>     }
>
>     if (data == CFG_RETRY_STATUS)
>       data = 0xffffffff;
>     return data;
>   }
>
>   static int iproc_pcie_config_read(...)
>   {
>     u32 data;
>
>     ...
>     data = iproc_pcie_cfg_retry(cfg_data_p);
>     if (size <= 2)
>       *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>
> In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and
> 0xffffffff data.  That's what most other platforms do, and most
> callers of the PCI config accessors check for that data instead of
> checking the return code to see whether the access was successful.
>
> For example, pci_flr_wait() assumes that if a read of PCI_COMMAND
> returns ~0, it's because the device isn't ready yet, and we should
> wait and retry.
>
>> +
>> +     if (size <= 2)
>> +             *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
>> +
>> +     return PCIBIOS_SUCCESSFUL;
>> +}
>> +
>>  static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
>>                                   int where, int size, u32 *val)
>>  {
>>       int ret;
>> +     struct iproc_pcie *pcie = iproc_data(bus);
>>
>>       iproc_pcie_apb_err_disable(bus, true);
>> -     ret = pci_generic_config_read32(bus, devfn, where, size, val);
>> +     if (pcie->type == IPROC_PCIE_PAXB_V2)
>> +             ret = iproc_pcie_config_read(bus, devfn, where, size, val);
>> +     else
>> +             ret = pci_generic_config_read32(bus, devfn, where, size, val);
>>       iproc_pcie_apb_err_disable(bus, false);
>>
>>       return ret;
>> --
>> 1.9.1
>>

ok I will take care of your comments. and change the code accordingly.
and re-submit the patch-set again.

Regards
Oza.