lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eea9b090-b938-b5eb-e8da-0bf9d15238a3@nvidia.com>
Date:   Wed, 4 Nov 2020 17:13:07 +0530
From:   Vidya Sagar <vidyas@...dia.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     <jingoohan1@...il.com>, <gustavo.pimentel@...opsys.com>,
        <lorenzo.pieralisi@....com>, <bhelgaas@...gle.com>,
        <amurray@...goodpenguin.co.uk>, <robh@...nel.org>,
        <treding@...dia.com>, <jonathanh@...dia.com>,
        <linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <kthota@...dia.com>, <mmaddireddy@...dia.com>, <sagar.tv@...il.com>
Subject: Re: [PATCH V3 2/2] PCI: dwc: Add support to configure for ECRC



On 11/4/2020 2:37 AM, Bjorn Helgaas wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Nov 03, 2020 at 08:57:01AM +0530, Vidya Sagar wrote:
>> On 11/3/2020 4:32 AM, Bjorn Helgaas wrote:
>>> On Thu, Oct 29, 2020 at 11:09:59AM +0530, Vidya Sagar wrote:
>>>> DesignWare core has a TLP digest (TD) override bit in one of the control
>>>> registers of ATU. This bit also needs to be programmed for proper ECRC
>>>> functionality. This is currently identified as an issue with DesignWare
>>>> IP version 4.90a. This patch does the required programming in ATU upon
>>>> querying the system policy for ECRC.
>>>
>>> I guess this is a hardware defect, right?
>> Yes. This is common across all DWC implementations (version 4.90 precisely)
>>
>>> How much of a problem would it be if we instead added a "no_ecrc"
>>> quirk for this hardware so we never enabled ECRC?
>> Well, on Tegra for some of the high fidelity use cases, ECRC is required to
>> be turned on and if it can be done safely with these patches, why shouldn't
>> we not enable ECRC at all?
>>
>>> IIUC, the current Linux support of ECRC is a single choice at
>>> boot-time: by default ECRC is not enabled, but if you boot with
>>> "pci=ecrc=on", we turn on ECRC for every device.
>>>
>>> That seems like the minimal support, but I think the spec allows ECRC
>>> to be enabled selectively, on individual devices.  I can imagine a
>>> sysfs knob that would allow us to enable/disable ECRC per-device at
>>> run-time.
>>>
>>> If we had such a sysfs knob, it would be pretty ugly and maybe
>>> impractical to work around this hardware issue.  So I'm a little bit
>>> hesitant to add functionality that might have to be removed in the
>>> future.
>>
>> Agree with this. But since it is a boot-time choice at this point, I think
>> we can still go ahead with this approach to have a working ECRC mechanism
>> right? I don't see any sysfs knob for AER controlling at this point.
> 
> I don't want to do anything that will prevent us from adding
> per-device ECRC control in the future.
> 
> My concern is that if we add a run-time sysfs knob in the future, the
> user experience on this hardware will be poor because there's no
> convenient path to twiddle the PCIE_ATU_TD bit when the generic code
> changes the AER Control bit.
Agree.
Can we add it to the documentation that run time changing of ECRC 
settings are not supported on this (i.e. Tegra194) platform (or for that 
matter on any SoC with PCIe based on DesignWare core version 4.90A). By 
'not supported', I meant that the ECRC digest part may not work but the 
normal functionality will continue to work without reporting any AER 
errors. I tried to emulate the following scenarios on Tegra194 silicon 
and here are my observations.
FWIW, I'm referring to the PCIe spec Rev 5.0 Ver 1.0 (22 May 2019)

> 
> What is the failure mode in these scenarios:
> 
>    - User boots with defaults, ECRC is disabled.
>    - User enables ECRC via sysfs.
>    - What happens here?  ECRC is enabled via AER Control but not via
>      DWC TD bit.  I assume ECRC doesn't work.  Does it cause PCIe
>      errors (malformed TLP, etc)?
Since DWC TD bit is not programmed, for all the transactions that go 
through ATU, TLP Digest won't get generated (although AER registers 
indicate that ECRC should get generated).
As per the spec section "2.7.1 ECRC Rules"

---
If a device Function is enabled to generate ECRC, it must calculate and 
apply ECRC for all TLPs originated by the Function
---

So the RP would be violating the PCIe spec, but it doesn't result in any 
error because the same section has the following rule as well

---
If a device Function is enabled to check ECRC, it must do so for all 
TLPs with ECRC where the device is the ultimate PCI Express Receiver
	Note that it is still possible for the Function to receive TLPs without 
ECRC, and these are processed normally - this is not an error
---

so, even if the EP has ECRC check enabled, because of the above rule, it 
just processes those packets without ECRC as normal packets.
Basically, whoever is enabling ECRC run time gets cheated as 
transactions routed through ATU don't really have ECRC digest

> 
> Or this one:
> 
>    - User boots with "pci=ecrc=on", ECRC is enabled.
>    - ECRC works fine.
>    - User disables ECRC via sysfs.
>    - What happens here?  ECRC is disabled via AER Control, but DWC TD
>      bit thinks it's still enabled.
In this case, the EP doesn't have ECRC check enabled but receives TLPs 
with ECRC digest. This again won't result in any error because of the 
section "2.2.3 TLP Digest Rules" which says

---
If an intermediate or ultimate PCI Express Receiver of the TLP does not 
support ECRC checking, the Receiver must ignore the TLP Digest
---

So the EP just ignores the TLP Digest and process the TLP normally.
Although functionality wise there is no issue here, there could be some 
impact on the perf because of the extra DWord data. This is again 
debatable as the perf/data path is typically from EP's DMA engine to 
host system memory and not config/BAR accesses.
> 
> If you enabled ECRC unconditionally on DWC and the sysfs knob had no
> effect, I'd be OK with that.  I'm more worried about what happens when
> the AER bit and the DWC TD bit are set so they don't match.  What is
> the failure mode there?
Based on the above experiments, we can as well keep the DWC TD bit 
programmed unconditionally and it won't lead to any errors. As mentioned 
before, the only downside of it is the extra DWord in each ATU routed 
TLP which may load the bus (in downstream direction) with no real 
benefit as such.

Please let me know where can we go from here.
I can push a different patch to keep DWC TD bit always programmed if 
that is the best approach to take at this point.
> 
>>>> Signed-off-by: Vidya Sagar <vidyas@...dia.com>
>>>> Reviewed-by: Jingoo Han <jingoohan1@...il.com>
>>>> ---
>>>> V3:
>>>> * Added 'Reviewed-by: Jingoo Han <jingoohan1@...il.com>'
>>>>
>>>> V2:
>>>> * Addressed Jingoo's review comment
>>>> * Removed saving 'td' bit information in 'dw_pcie' structure
>>>>
>>>>    drivers/pci/controller/dwc/pcie-designware.c | 8 ++++++--
>>>>    drivers/pci/controller/dwc/pcie-designware.h | 1 +
>>>>    2 files changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
>>>> index b5e438b70cd5..cbd651b219d2 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>>> @@ -246,6 +246,8 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
>>>>         dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_TARGET,
>>>>                                  upper_32_bits(pci_addr));
>>>>         val = type | PCIE_ATU_FUNC_NUM(func_no);
>>>> +     if (pci->version == 0x490A)
>>>> +             val = val | pcie_is_ecrc_enabled() << PCIE_ATU_TD_SHIFT;
>>>>         val = upper_32_bits(size - 1) ?
>>>>                 val | PCIE_ATU_INCREASE_REGION_SIZE : val;
>>>>         dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, val);
>>>> @@ -294,8 +296,10 @@ static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
>>>>                            lower_32_bits(pci_addr));
>>>>         dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET,
>>>>                            upper_32_bits(pci_addr));
>>>> -     dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
>>>> -                        PCIE_ATU_FUNC_NUM(func_no));
>>>> +     val = type | PCIE_ATU_FUNC_NUM(func_no);
>>>> +     if (pci->version == 0x490A)
>>>> +             val = val | pcie_is_ecrc_enabled() << PCIE_ATU_TD_SHIFT;
>>>> +     dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, val);
>>>>         dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE);
>>>>
>>>>         /*
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
>>>> index e7f441441db2..b01ef407fd52 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware.h
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.h
>>>> @@ -89,6 +89,7 @@
>>>>    #define PCIE_ATU_TYPE_IO             0x2
>>>>    #define PCIE_ATU_TYPE_CFG0           0x4
>>>>    #define PCIE_ATU_TYPE_CFG1           0x5
>>>> +#define PCIE_ATU_TD_SHIFT            8
>>>>    #define PCIE_ATU_FUNC_NUM(pf)           ((pf) << 20)
>>>>    #define PCIE_ATU_CR2                 0x908
>>>>    #define PCIE_ATU_ENABLE                      BIT(31)
>>>> --
>>>> 2.17.1
>>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ