lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 28 Nov 2023 14:20:27 +1300
From:   Luke Jones <luke@...nes.dev>
To:     Armin Wolf <W_Armin@....de>
Cc:     Mario Limonciello <mario.limonciello@....com>, hdegoede@...hat.com,
        ilpo.jarvinen@...ux.intel.com, corentin.chary@...il.com,
        platform-driver-x86@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] platform/x86: asus-wmi: disable USB0 hub on ROG
 Ally before suspend



On Tue, Nov 28 2023 at 02:16:53 AM +01:00:00, Armin Wolf 
<W_Armin@....de> wrote:
> Am 28.11.23 um 01:54 schrieb Luke Jones:
> 
>> 
>> 
>> On Mon, Nov 27 2023 at 10:42:48 PM +01:00:00, Armin Wolf
>> <W_Armin@....de> wrote:
>>> Am 27.11.23 um 21:55 schrieb Mario Limonciello:
>>>> On 11/27/2023 14:46, Luke Jones wrote:
>>>>> 
>>>>> 
>>>>>  On Mon, Nov 27 2023 at 02:14:23 PM -06:00:00, Mario Limonciello
>>>>> <mario.limonciello@....com> wrote:
>>>>>> On 11/26/2023 17:05, Luke D. Jones wrote:
>>>>>>> ASUS have worked around an issue in XInput where it doesn't
>>>>>>> support USB
>>>>>>>  selective suspend, which causes suspend issues in Windows. They
>>>>>>> worked
>>>>>>>  around this by adjusting the MCU firmware to disable the USB0
>>>>>>> hub when
>>>>>>>  the screen is switched off during the Microsoft DSM suspend 
>>>>>>> path
>>>>>>> in ACPI.
>>>>>>> 
>>>>>>>  The issue we have with this however is one of timing - the call
>>>>>>> the tells
>>>>>>>  the MCU to this isn't able to complete before suspend is done 
>>>>>>> so
>>>>>>> we call
>>>>>>>  this in a prepare() and add a small msleep() to ensure it is
>>>>>>> done. This
>>>>>>>  must be done before the screen is switched off to prevent a
>>>>>>> variety of
>>>>>>>  possible races.
>>>>>> 
>>>>>>  Right now the way that Linux handles the LPS0 calls is that
>>>>>> they're all back to back.  Luke did try to inject a delay after
>>>>>> the LPS0 calls were done but before it went to sleep but this
>>>>>> wasn't sufficient.
>>>>>> 
>>>>>>  Another "potential" way to solve this problem from Linux may be
>>>>>> to actually glue the LPS0 screen off call to when DRM actually 
>>>>>> has
>>>>>> eDP turned off.
>>>>>> 
>>>>>>  Making such a change would essentially push back the "screen 
>>>>>> off"
>>>>>> LPS0 command to when the user has run 'systemctl suspend' (or an
>>>>>> action that did this) because the compositor usually turns it off
>>>>>> with DPMS at this time.
>>>>> 
>>>>>  I would be willing to test this if you want some concrete data.
>>>> 
>>>>  It would require some cross subsystem plumbing to evaluate
>>>> feasibility.
>>>>  I don't currently have any plans to do it.
>>>> 
>>>>  I think your patch makes sense; I just want to make it known that
>>>> "might" clean this up if it ever happens.
>>>> 
>>>>> See my big block of text below.
>>>>> 
>>>>>> 
>>>>>>  This is a much bigger change though and *much more ripe for
>>>>>> breakage*.
>>>>>> 
>>>>>>  So I think in may be worth leaving a TODO comment to look into
>>>>>> doing that in the future.
>>>>>  Do you mean add the TODO to a line in this patch?
>>>> 
>>>>  Yeah.  In case someone ever does it (me or otherwise) I think it
>>>> would be good to have some reference in the comments that the 
>>>> commit
>>>> 'might' be possible to revert.
>>>> 
>>>>> 
>>>>>> 
>>>>>>  If that ever happens; it's possible that this change could be
>>>>>> reverted too.
>>>>>> 
>>>>>>> 
>>>>>>>  Further to this the MCU powersave option must also be disabled
>>>>>>> as it can
>>>>>>>  cause a number of issues such as:
>>>>>>>  - unreliable resume connection of N-Key
>>>>>>>  - complete loss of N-Key if the power is plugged in while 
>>>>>>> suspended
>>>>>>>  Disabling the powersave option prevents this.
>>>>>>> 
>>>>>>>  Without this the MCU is unable to initialise itself correctly 
>>>>>>> on
>>>>>>> resume.
>>>>>> 
>>>>>>  initialize
>>>>> 
>>>>>  Are we forced to use USA spelling? I'm from NZ
>>>>>  "initialise is predominantly used in British English (used in
>>>>> UK/AU/NZ) ( en-GB )"
>>>>> 
>>>> 
>>>>  Ah I didn't realize it's an acceptable spelling for en-GB, and
>>>> thought it was just a typo; sorry.
>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>  Signed-off-by: Luke D. Jones <luke@...nes.dev>
>>>>>> 
>>>>>>  I think it would be good to add a Closes: tag to the AMD Gitlab
>>>>>> issue that this was discussed within as well as any other public
>>>>>> references you know about.
>>>>>> 
>>>>>>  Additionally as Phoenix APU support goes back as far as kernel
>>>>>> 6.1 and this is well contained to only run on the ROG I suggest 
>>>>>> to
>>>>>> CC stable so that people can use the ROG on that LTS kernel or 
>>>>>> later.
>>>>>> 
>>>>>>> ---
>>>>>>>  -SNIP-
>>>>>>>    @@ -4701,6 +4749,8 @@ static const struct dev_pm_ops
>>>>>>> asus_pm_ops = {
>>>>>>>        .thaw = asus_hotk_thaw,
>>>>>>>        .restore = asus_hotk_restore,
>>>>>>>        .resume = asus_hotk_resume,
>>>>>>>  +    .resume_early = asus_hotk_resume_early,
>>>>>>>  +    .prepare = asus_hotk_prepare,
>>>>>> 
>>>>>>  Have you experimented with only using the prepare() call or only
>>>>>> the resume_early() call?  Are both really needed?
>>>>> 
>>>>>  I have yes. Although the device comes back eventually in resume
>>>>> after only a prepare call it's not preferable as it tends to 
>>>>> change
>>>>> the device path. With resume_early we can get the device replugged
>>>>> super early (before anything notices it's gone in fact).
>>>>> 
>>>>>  This whole thing is a bit of a mess. It ends up being a race
>>>>> between various things to prevent a HUB0 disconnect being
>>>>> registered by the xhci subsystem, and adding the device back 
>>>>> before
>>>>> the xhci subsystem gets control.
>>>>> 
>>>>>  If I add a sleep longer than 1300ms in prepare then the xhci
>>>>> subsys registers a disconnect of the USB0 hub. If the sleep is
>>>>> under 250ms it isn't quite enough for the MCU to do its thing, and
>>>>> on battery it seems worse.
>>>>> 
>>>>>  I have asked the ASUS guys I'm in contact with for something to
>>>>> disable this MCU behaviour since it is purely a workaround for a
>>>>> broken Windows thing :( They are open to something, maybe an OS
>>>>> detect in ACPI or a WMI method addition similar to the MCU
>>>>> powersave method, from what I'm told it would require an MCU
>>>>> firmware update along with BIOS update. If this eventuates I'll
>>>>> submit an additional patch to check and set that plus disable 
>>>>> this.
>>>> 
>>>>  Don't let them do an OS detection in ACPI, it's going to be too
>>>> painful.
>>>>  I would instead suggest that they can have a bit that you can
>>>> program in via ACPI or WMI from the ASUS WMI driver that says to
>>>> skip the MCU disconnect behavior.
>>>> 
>>> I totally agree, we do not need another _OSI(Linux) type of problem.
>>> Maybe those guys at Asus could just implement a ACPI _DSM for the 
>>> USB
>>> controller in question which allows for disabling this workaround.
>>> This would be preferable to an additional WMI method, since the
>>> notebook would otherwise depend on the asus-wmi driver to suspend
>>> properly.
>>> With the ACPI _DSM, the USB controller driver can disable the
>>> workaround as soon as the USB controller probes.
>> 
>> Would you be so kind as to explain what this means? My knowledge of
>> ACPI is paper thin and generally revolves just around the ASUS WMI
>> part. From what i can find the XHC0 (the hub the MCU is attached to)
>> doesn't have any current _DSM. I understand it means Device Specific
>> Method, so I guess you mean adding a method to be used only if that
>> HUB is there and implements it?
>> 
>> The ROG Ally depends on the asus-wmi driver regardless, without it, 
>> it
>> is barely functional.
>> 
> An ACPI _DSM method is a "standardized" way for manufacturers to add 
> device specific methods to ACPI devices, see
> https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/09_ACPI-Defined_Devices_and_Device-Specific_Objects/ACPIdefined_Devices_and_DeviceSpecificObjects.html 
> for details.
> 
> Implementing via a ACPI _DSM would contain the necessary logic for 
> USB suspend inside the USB hub driver, instead of forcing asus-wmi 
> and the USB hub driver
> to collaborate on USB suspend. The USB hub driver would discover this 
> device specific method during probe and disable the workaround, 
> leaving no chance for
> race conditions between asus-wmi and the USB hub driver to develop.
> 
> Of course if the ROG Ally depends on asus-wmi regardless, then Asus 
> could as well use the WMI interface for this. But IMHO standard ACPI 
> interfaces should be
> preferred to custom ones like WMI.

Okay i understand now, thank you. I'll make a note of it in my next 
response to ASUS.

> 
>>>>> 
>>>>>  I may possibly write a new version of this patch as we've seen
>>>>> that enabling powersave reduces suspend power use by at least 
>>>>> half.
>>>>> And looking through my DSDT dumps, there are a few laptops with 
>>>>> the
>>>>> same feature as Ally. The patch for powersave being enabled
>>>>> requires also AC power state on suspend change detection, and a
>>>>> later forced reset in late resume (and the device paths change
>>>>> regardless when powersave is on).
>>>>> 
>>>>>  When I look at it objectively, the device path changing should be
>>>>> a non-issue really as it is fully handled by USB subsystem and
>>>>> behaves exactly like what it is - a USB hub disconnect. It's just
>>>>> that some userspace apps don't expect this. I will experiment some
>>>>> more.
>>>>> 
>>>>>  Regards,
>>>>>  Luke.
>>>>> 
>>>> 
>>>>  As another experiment - what happens if you "comment out" the LPS0
>>>> calls that do this problematic stuff?
>>>> 
>>>>  It's important to make sure the callback to amd-pmc stays in 
>>>> place,
>>>> but if you just skip those ACPI ones does it still get to the
>>>> deepest state and are there other problems?
>>>> 
>>>>>> 
>>>>>>>   };
>>>>>>>      /* Registration
>>>>>>> ***************************************************************/
>>>>>>>  diff --git a/include/linux/platform_data/x86/asus-wmi.h
>>>>>>> b/include/linux/platform_data/x86/asus-wmi.h
>>>>>>>  index 63e630276499..ab1c7deff118 100644
>>>>>>>  --- a/include/linux/platform_data/x86/asus-wmi.h
>>>>>>>  +++ b/include/linux/platform_data/x86/asus-wmi.h
>>>>>>>  @@ -114,6 +114,9 @@
>>>>>>>    /* Charging mode - 1=Barrel, 2=USB */
>>>>>>>    #define ASUS_WMI_DEVID_CHARGE_MODE    0x0012006C
>>>>>>>    +/* MCU powersave mode */
>>>>>>>  +#define ASUS_WMI_DEVID_MCU_POWERSAVE   0x001200E2
>>>>>>>  +
>>>>>>>    /* epu is connected? 1 == true */
>>>>>>>    #define ASUS_WMI_DEVID_EGPU_CONNECTED    0x00090018
>>>>>>>    /* egpu on/off */
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
>> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ