lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9158206-8ebe-c857-7533-47155a6464e1@gmail.com>
Date:   Fri, 9 Jul 2021 19:03:37 +0200
From:   Maximilian Luz <luzmaximilian@...il.com>
To:     Pali Rohár <pali@...nel.org>,
        Jonas Dreßler <verdre@...d.nl>
Cc:     Amitkumar Karwar <amitkarwar@...il.com>,
        Ganapathi Bhat <ganapathi.bhat@....com>,
        Xinming Hu <huxinming820@...il.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Tsuchiya Yuto <kitakar@...il.com>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v2 2/2] mwifiex: pcie: add reset_d3cold quirk for Surface
 gen4+ devices

On 7/9/21 6:12 PM, Pali Rohár wrote:

[...]

>>> Hello! Now I'm thinking loudly about this patch. Why this kind of reset
>>> is needed only for Surface devices? AFAIK these 88W8897 chips are same
>>> in all cards. Chip itself implements PCIe interface (and also SDIO) so
>>> for me looks very strange if this 88W8897 PCIe device needs DMI specific
>>> quirks. I cannot believe that Microsoft got some special version of
>>> these chips from Marvell which are different than version uses on cards
>>> in mPCIe form factor.
>>>
>>> And now when I'm reading comment below about PCIe bridge to which is
>>> this 88W8897 PCIe chip connected, is not this rather an issue in that
>>> PCIe bridge (instead of mwifiex/88W8897) or in ACPI firmware which
>>> controls this bridge?
>>>
>>> Or are having other people same issues on mPCIe form factor wifi cards
>>> with 88W8897 chips and then this quirk should not DMI dependent?
>>>
>>> Note that I'm seeing issues with reset and other things also on chip
>>> 88W8997 when is connected to system via SDIO. These chips have both PCIe
>>> and SDIO buses, it just depends which pins are used.
>>>
>>
>> Hi and thanks for the quick reply! Honestly I've no idea, this is just the
>> first method we found that allows for a proper reset of the chip. What I
>> know is that some Surface devices need that ACPI DSM call (the one that was
>> done in the commit I dropped in this version of the patchset) to reset the
>> chip instead of this method.
>>
>> Afaik other devices with this chip don't need this resetting method, at
>> least Marvell employees couldn't reproduce the issues on their testing
>> devices.
>>
>> So would you suggest we just try to match for the pci chip 88W8897 instead?
> 
> Hello! Such suggestion makes sense when we know that it is 88W8897
> issue. But if you got information that issue cannot be reproduced on
> other 88W8897 cards then matching 88W8897 is not correct.
> 
>  From all this information looks like that it is problem in (Microsoft?)
> PCIe bridge to which is card connected. Otherwise I do not reason how it
> can be 88W8897 affected. Either it is reproducible on 88W8897 cards also
> in other devices or issue is not on 88W8897 card.

I doubt that it's an issue with the PCIe bridge (itself at least). The
same type of bridge is used for both dGPU and NVME SSD on my device (see
lspci output below) and those work fine. Also if I'm seeing that right
it's from the Intel CPU, so my guess is that a lot more people would
have issues with that then.

I don't know about the hardware side, so it might be possible that it's
an issue with integrating both bridge and wifi chip, in which case it's
still probably best handled via DMI quirks unless we know more.

Also as Tsuchiya mentioned in his original submission, on Windows the
device is reset via this D3cold method. I've only skimmed that
errata.inf file mentioned, but I think this is what he's referring to:

   Controls whether ACPIDeviceEnableD3ColdOnSurpriseRemoval rule will be
   evaluated or not on a given platform. Currently
   ACPIDeviceEnableD3ColdOnSurpriseRemoval rule only needs to be
   evaluated on Surface platforms which contain the Marvell WiFi
   controller which depends on device going through D3Cold as part of
   surprise-removal.

and

   Starting with Windows releases *after* Blue, ACPI will not put
   surprise-removed devices into D3Cold automatically. Some known
   scenarios (viz. WiFi reset/recovery) rely on the device cycling
   through D3Cold on surprise-removal. This hack allows surprise-removed
   devices to be put into D3Cold (if supported by the stack).

So, as far as I can tell, the chip doesn't like to be surprise-removed
(which seems to happen during reset) and then needs to be power-cycled,
which I think is likely due to some issue with firmware state.

So the quirk on Windows seems very Surface specific.

There also seem a bunch of revisions of these chips around, for example
my SB2 is affected by a bug that we've tied to the specific hardware
revision which causes some issues with host-sleep (IIRC chip switches
rapidly between wake and sleep states without any external influence,
which is not how it should behave and how it does behave on a later
hardware revision).

>> Then we'd probably have to check if there are any laptops where multiple
>> devices are connected to the pci bridge as Amey suggested in a review
>> before.
> 
> Well, I do not know... But if this is issue with PCIe bridge then
> similar issue could be observed also for other PCIe devices with this
> PCIe bridge. But question is if there are other laptops with this PCIe
> bridge. And also it can be a problem in ACPI firmware on those Surface
> devices, which implements some PCIe bridge functionality. So it is
> possible that issue is with PCIe bridge, not in HW, but in SW/firmware
> part which can be Microsoft specific... So too many questions to which
> we do not know answers.
> 
> Could you provide output of 'lspci -nn -vv' and 'lspci -tvnn' on
> affected machines? If you have already sent it in some previous email,
> just send a link. At least I'm not able to find it right now and output
> may contain something useful...

 From my Surface Book 2 (with the same issue):

  - lspci -tvnn: https://paste.ubuntu.com/p/mm3YpcZJ8N/
  - lspci -vv -nn: https://paste.ubuntu.com/p/dctTDP738N/

Regards,
Max

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ