Message-ID: <fee8b431-617f-3890-3ad2-67a61d3ffca2@v0yd.nl>
Date:   Tue, 12 Oct 2021 10:48:49 +0200
From:   Jonas Dreßler <verdre@...d.nl>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Amitkumar Karwar <amitkarwar@...il.com>,
        Ganapathi Bhat <ganapathi017@...il.com>,
        Xinming Hu <huxinming820@...il.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Tsuchiya Yuto <kitakar@...il.com>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
        Maximilian Luz <luzmaximilian@...il.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Pali Rohár <pali@...nel.org>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Johannes Berg <johannes@...solutions.net>,
        Brian Norris <briannorris@...omium.org>,
        David Laight <David.Laight@...LAB.COM>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface
 devices

On 10/11/21 18:53, Bjorn Helgaas wrote:
> [+cc Alex]
> 
> On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote:
>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>> reports a hardcoded LTR value to the system during initialization,
>> probably as an (unsuccessful) attempt of the developers to fix firmware
>> crashes. This LTR value prevents most of the Microsoft Surface devices
>> from entering deep powersaving states (either platform C-State 10 or
>> S0ix state), because the exit latency of that state would be higher than
>> what the card can tolerate.
> 
> S0ix and C-State 10 are ACPI concepts that don't mean anything in a
> PCIe context.
> 
> I think LTR is only involved in deciding whether to enter the ASPM
> L1.2 substate.  Maybe the system will only enter C-State 10 or S0ix
> when the link is in L1.2?

Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
(ctrl+f "IP LINK PM STATE").

> 
>> Turns out the card works just the same (including the firmware crashes)
>> no matter if that hardcoded LTR value is reported or not, so it's kind
>> of useless and only prevents us from saving power.
>>
>> To get rid of those hardcoded LTR requirements, it's possible to reset
>> the PCI bridge device after initializing the cards firmware. I'm not
>> exactly sure why that works, maybe the power management subsystem of the
>> PCH resets its stored LTR values when doing a function level reset of
>> the bridge device. Doing the reset once after starting the wifi firmware
>> works very well, probably because the firmware only reports that LTR
>> value a single time during firmware startup.
>>
>> Signed-off-by: Jonas Dreßler <verdre@...d.nl>
>> ---
>>   drivers/net/wireless/marvell/mwifiex/pcie.c   | 12 +++++++++
>>   .../wireless/marvell/mwifiex/pcie_quirks.c    | 26 +++++++++++++------
>>   .../wireless/marvell/mwifiex/pcie_quirks.h    |  1 +
>>   3 files changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> index c6ccce426b49..2506e7e49f0c 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
>>   static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
>>   {
>>   	struct pcie_service_card *card = adapter->card;
>> +	struct pci_dev *pdev = card->dev;
>> +	struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
>>   	const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
>>   	int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;
>>   
>> +	/* Trigger a function level reset of the PCI bridge device, this makes
>> +	 * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
>> +	 * card stop reporting a fixed LTR value that prevents the system from
>> +	 * entering package C10 and S0ix powersaving states.
> 
> I don't believe this.  Why would resetting the root port change what
> the downstream device reports via LTR messages?
> 
>  From PCIe r5.0, sec 5.5.1:
> 
>    The following rules define how the L1.1 and L1.2 substates are entered:
>      ...
>      * When in ASPM L1.0 and the ASPM L1.2 Enable bit is Set, the L1.2
>        substate must be entered when CLKREQ# is deasserted and all of
>        the following conditions are true:
> 
>        - The reported snooped LTR value last sent or received by this
> 	Port is greater than or equal to the value set by the
> 	LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
> 	snoop service latency requirement.
> 
>        - The reported non-snooped LTR last sent or received by this
> 	Port value is greater than or equal to the value set by the
> 	LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
> 	non-snoop service latency requirement.
> 
>  From the LTR Message format in sec 6.18:
> 
>    No-Snoop Latency and Snoop Latency: As shown in Figure 6-15, these
>    fields include a Requirement bit that indicates if the device has a
>    latency requirement for the given type of Request. If the
>    Requirement bit is Set, the LatencyValue and LatencyScale fields
>    describe the latency requirement. If the Requirement bit is Clear,
>    there is no latency requirement and the LatencyValue and
>    LatencyScale fields are ignored.
> 
> Resetting the root port might make it forget the LTR value it last
> received.  If that's equivalent to having no service latency
> requirement, it *might* enable L1.2 entry, although that doesn't seem
> equivalent to the downstream device having sent an LTR message with
> the Requirement bit cleared.
> 
> I think the endpoint is required to send a new LTR message before it
> goes to a non-D0 state (sec 6.18), so the bridge will capture the
> latency again, and we'll probably be back in the same state.

Indeed, that happens when suspending the device: after resuming, the LTR
value is back to its initial value. mwifiex_pcie_init_fw_port() is also
executed on resume, though (I should probably have mentioned this in the
commit message, will do in v2), so this is taken care of.

While suspended, the device goes into D3 anyway and S0ix is achieved
regardless of the LTR value.
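
For reference, the L1.2 entry rule quoted above can be sketched in plain
C. This is only an illustration of the comparison, not code from the
driver or the PCI core: the function names are made up, the threshold is
passed in as nanoseconds, and the field layout (Requirement bit 15,
LatencyScale bits 12:10, LatencyValue bits 9:0, scale in steps of 2^5 ns)
is my reading of the LTR message format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 16-bit LTR latency field layout:
 * bit 15 = Requirement, bits 12:10 = LatencyScale, bits 9:0 = LatencyValue.
 */
#define LTR_REQUIREMENT	0x8000u
#define LTR_SCALE_SHIFT	10
#define LTR_SCALE_MASK	0x7u
#define LTR_VALUE_MASK	0x3ffu

/* Decode a latency field to nanoseconds: value * 2^(5 * scale). */
static uint64_t ltr_to_ns(uint16_t field)
{
	unsigned int scale = (field >> LTR_SCALE_SHIFT) & LTR_SCALE_MASK;
	uint64_t value = field & LTR_VALUE_MASK;

	return value << (5 * scale);
}

/* One latency type passes if there is no requirement at all, or if the
 * reported tolerance is at least the L1.2 threshold.
 */
static bool ltr_allows_l12(uint16_t field, uint64_t threshold_ns)
{
	if (!(field & LTR_REQUIREMENT))
		return true;

	return ltr_to_ns(field) >= threshold_ns;
}

/* Both the snooped and the non-snooped value have to pass. */
static bool l12_ltr_conditions_met(uint16_t snoop, uint16_t no_snoop,
				   uint64_t threshold_ns)
{
	return ltr_allows_l12(snoop, threshold_ns) &&
	       ltr_allows_l12(no_snoop, threshold_ns);
}
```

With a made-up field of 0x8803 (requirement set, scale 2, value 3) the
tolerance decodes to 3072 ns, so L1.2 would only be allowed for a
threshold of 3072 ns or less. A field with the Requirement bit clear
passes for any threshold.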

> 
> This all seems fragile to me.  If we force the link to L1.2 without
> knowing accurate exit latencies and latency tolerance, the device is
> liable to drop packets.

Yeah, I'm not saying this patch isn't an ugly hack...

What I can say though is that this patch has been running in the
linux-surface (https://github.com/linux-surface/kernel/pull/72) kernel
for a few months now, and so far we've only received positive feedback.

There are two alternatives I can think of to deal with this issue:

1) Revert the card's firmware in linux-firmware back to the second-latest
version. That firmware didn't report a fixed LTR value and also doesn't
have any other obvious issues I know of compared to the latest one.

2) Somehow interact with the PMC Core driver to make it ignore the LTR
values reported by the card (I doubt that's possible from mwifiex).
It can be done manually via debugfs by writing to
/sys/kernel/debug/pmc_core/ltr_ignore.
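
As a rough sketch of the manual route in 2), a small userspace helper
could write the index to that file. I'm assuming here that the file
takes a decimal index of the IP whose LTR should be ignored. The helper
name is made up, and the right index for the wifi card's root port
depends on the platform, so no value is hardcoded.

```c
#include <errno.h>
#include <stdio.h>

/* Write a decimal IP index to an ltr_ignore-style file.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper; needs root and a mounted debugfs when pointed
 * at /sys/kernel/debug/pmc_core/ltr_ignore.
 */
static int write_ltr_ignore(const char *path, unsigned int ip_index)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "%u\n", ip_index) < 0)
		ret = -EIO;

	/* fclose() flushes, so a failed write can also show up here */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;

	return ret;
}
```

The path is a parameter so the helper can be tried against a scratch
file first. Unlike the bridge reset, this would have to be repeated
from userspace on every boot.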

> 
>> +	 * We need to do it here because it must happen after firmware
>> +	 * initialization and this function is called right after that is done.
>> +	 */
>> +	if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> +		pci_reset_function(parent_pdev);
> 
> PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
> supported by endpoints, so I guess this will actually do some other
> kind of reset.

Interesting, I briefly searched and it doesn't seem like there's
public documentation available from Intel that goes into the
specifics here; maybe someone working at Intel knows more?
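
To at least check what the bridge advertises, the relevant bits can be
decoded from its PCIe capability registers. A minimal sketch, assuming
the raw register values were read some other way (e.g. with setpci);
the macro and function names are made up, the bit positions are the
ones I know from the kernel's pci_regs.h (PCI_EXP_DEVCAP_FLR and
PCI_EXP_FLAGS_TYPE).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCIE_DEVCAP_FLR		0x10000000u	/* Device Capabilities, bit 28 */
#define PCIE_FLAGS_TYPE_MASK	0x00f0u		/* PCIe Capabilities, bits 7:4 */
#define PCIE_TYPE_ENDPOINT	0x0u
#define PCIE_TYPE_ROOT_PORT	0x4u

/* Extract the device/port type from the PCI Express Capabilities register. */
static unsigned int pcie_port_type(uint16_t pcie_flags)
{
	return (pcie_flags & PCIE_FLAGS_TYPE_MASK) >> 4;
}

/* True if the Device Capabilities register advertises Function Level Reset. */
static bool pcie_advertises_flr(uint32_t devcap)
{
	return (devcap & PCIE_DEVCAP_FLR) != 0;
}
```

If the bridge reports type PCIE_TYPE_ROOT_PORT and the FLR bit is clear,
then pci_reset_function() can't be doing an actual FLR there and must be
falling back to another reset method.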

> 
>>   	/* Write the RX ring read pointer in to reg->rx_rdptr */
>>   	if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
>>   			      tx_wrap)) {
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> index 0234cf3c2974..cbf0565353ae 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Pro 5",
>> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Pro 5 (LTE)",
>> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Pro 6",
>> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Book 1",
>> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Book 2",
>> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Laptop 1",
>> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{
>>   		.ident = "Surface Laptop 2",
>> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>>   			DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>>   			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
>>   		},
>> -		.driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> +		.driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> +					QUIRK_DO_FLR_ON_BRIDGE),
>>   	},
>>   	{}
>>   };
>> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
>>   		dev_info(&pdev->dev, "no quirks enabled\n");
>>   	if (card->quirks & QUIRK_FW_RST_D3COLD)
>>   		dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
>> +	if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> +		dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
>>   }
>>   
>>   static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> index 8ec4176d698f..f8d463f4269a 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> @@ -18,6 +18,7 @@
>>   #include "pcie.h"
>>   
>>   #define QUIRK_FW_RST_D3COLD	BIT(0)
>> +#define QUIRK_DO_FLR_ON_BRIDGE	BIT(1)
>>   
>>   void mwifiex_initialize_quirks(struct pcie_service_card *card);
>>   int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
>> -- 
>> 2.31.1
>>
