Message-ID: <e66973b9-29e5-4d77-a069-8bd7d264f402@kernel.org>
Date: Thu, 19 Dec 2024 15:35:08 -0600
From: Mario Limonciello <superm1@...nel.org>
To: Antheas Kapenekakis <lkml@...heas.dev>
Cc: Shyam Sundar S K <Shyam-sundar.S-k@....com>,
 "Rafael J . Wysocki" <rafael@...nel.org>, Hans de Goede
 <hdegoede@...hat.com>, Ilpo Järvinen
 <ilpo.jarvinen@...ux.intel.com>, "Luke D . Jones" <luke@...nes.dev>,
 Mark Pearson <mpearson-lenovo@...ebb.ca>,
 "open list:AMD PMF DRIVER" <platform-driver-x86@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>,
 "open list:ACPI" <linux-acpi@...r.kernel.org>,
 "Derek J . Clark" <derekjohn.clark@...il.com>, me@...egospodneti.ch,
 Denis Benato <benato.denis96@...il.com>,
 Mario Limonciello <mario.limonciello@....com>
Subject: Re: [RFC 2/2] platform/x86/amd: pmf: Add manual control support

On 12/19/2024 15:27, Antheas Kapenekakis wrote:
> On Thu, 19 Dec 2024 at 22:21, Mario Limonciello <superm1@...nel.org> wrote:
>>
>> On 12/19/2024 15:10, Antheas Kapenekakis wrote:
>>> On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@...nel.org> wrote:
>>>>
>>>> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
>>>>> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@...nel.org> wrote:
>>>>>>
>>>>>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
>>>>>>> Hi Mario,
>>>>>>> given that there is a Legion Go driver in the works, and Asus already
>>>>>>> has a driver, the only thing that would be left for locking down ACPI
>>>>>>> access is manufacturers w/o vendor APIs.
>>>>>>>
>>>>>>> So, can we restart the conversation about this driver? It would be
>>>>>>> nice to get to a place where we can lock down /dev/mem and ACPI by
>>>>>>> spring.
>>>>>>
>>>>>> As Shyam mentioned we don't have control for limits by the PMF driver
>>>>>> for this on PMF v2 (Strix) or later platforms.
>>>>>>
>>>>>> So if we were to revive this custom discussion it would only be for
>>>>>> Phoenix and Hawk Point platforms.
>>>>>
>>>>> That's unfortunate.
>>>>>
>>>>>>>
>>>>>>> Moreover, since the other two proposed drivers use the
>>>>>>> firmware_attributes API, should this be used here as well?
>>>>>>
>>>>>> I do feel that if we revive this conversation specifically for Phoenix
>>>>>> and Hawk Point platforms yes we should use the same API to expose it to
>>>>>> userspace as those other two drivers do.
>>>>>>
>>>>>> I'd like Shyam's temperature on this idea though before anyone spends
>>>>>> time on it.  If he's amenable would you want to work on it?
>>>>>
>>>>> We currently expect the 2025 lineup to include a lot of Strix Point
>>>>> handhelds, so I'd like a solution that works with that. OneXPlayer
>>>>> released a model already, and GPD is getting ready to ship as well.
>>>>>
>>>>> Yeah, I could throw some hours to it after I go through some overdue stuff.
>>>>>
>>>>>>>
>>>>>>> By the way, you were right about needing a taint for this. Strix Point
>>>>>>> fails to enter a lower power state during sleep if you set it to lower
>>>>>>> than 10W. This is not ideal, as hawk point could go down to 5 while
>>>>>>> still showing a power difference, but I am unsure where this bug
>>>>>>> should be reported. This is both through ryzenadj/ALIB
>>>>>>
>>>>>> Who is to say this is a bug?  Abusing a debugging interface with a
>>>>>> reverse engineered tool means you might be able to configure a platform
>>>>>> out of specifications.
>>>>>
>>>>> The spec being 10+W would be very undesirable for handhelds with Strix
>>>>> Point, so I'd hope somebody looks into it, esp. if it can be fixed
>>>>> with a BIOS fw update before more handhelds come out. I can raise the
>>>>> minimum TDP to 10W, with some user complaints.
>>>>>
>>>>> Asus and Lenovo use the same mailbox so they'd share the issue too.
>>>>>
>>>>> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
>>>>> results in around 20-22W total consumption which is around 2.5 hours.
>>>>> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
>>>>> and can go down to 8W with frame limiting etc. I do not have numbers
>>>>> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
>>>>> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
>>>>> at low wattages.
>>>>>
>>>>
>>>> I've got a fundamental question - why the fixation on PPT?
>>>>
>>>> This just sets "limits" for the package.  In Windows it's probably the
>>>> best knob to tune to adjust performance in an effort to extend battery
>>>> life, but in Linux we have a lot of other knobs:
>>>>
>>>> * the ability to tune EPP (energy_performance_preference)
>>>> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
>>>
>>> We use both of these.
>>>
>>>> * offline cores at will
>>>
>>> if a core is parked and you try to write into its sysfs entrypoints,
>>> we found that this might cause a userspace program to hang
>>> indefinitely. Since a lot of settings are per core that's problematic
>>> and since it does not help much most TDP programs don't offer it
>>> anymore.
>>
>> This sounds like a kernel bug if you're hanging programs when trying to
>> write to sysfs files of offlined cores.  If we can get that fixed having
>> that in your toolbelt is quite useful.  I'm sure there are plenty of
>> games that don't really need all the cores up and you can save some power.
>>
>> Can you get a simple reproducer for me into a bug report to look at next
>> year?
> 
> I will try to. This was relayed to me. 

Thanks! If whoever relayed it to you opens the bug report that's totally 
fine with me too.  Just ping me after the new year if I miss it because 
it will be lost in a giant pile of other stuff.

> Disabling SMT also causes a
> crash on the Ally when going to sleep.

Yes; SMT is required to be enabled for s0i3 to work.

That's why the s2idle debugging script flags it.

https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1207
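
A minimal sketch of that kind of check, assuming the standard
/sys/devices/system/cpu/smt/control file. This is not the code from
amd_s2idle.py; the function names and the sysfs_root parameter are
hypothetical and only there so the logic can be exercised without real
hardware.

```python
from pathlib import Path

# States of /sys/devices/system/cpu/smt/control that mean SMT was turned off.
SMT_DISABLED_STATES = {"off", "forceoff"}


def read_smt_control(sysfs_root="/sys"):
    """Return the SMT control state as a string, or None if unreadable."""
    path = Path(sysfs_root) / "devices/system/cpu/smt/control"
    try:
        return path.read_text().strip()
    except OSError:
        return None


def smt_blocks_s0i3(sysfs_root="/sys"):
    """True if SMT has been switched off, which per this thread breaks s0i3."""
    return read_smt_control(sysfs_root) in SMT_DISABLED_STATES


if __name__ == "__main__":
    state = read_smt_control()
    if smt_blocks_s0i3():
        print(f"SMT is '{state}': expect s0i3 entry to fail")
    else:
        print(f"SMT control state: {state}")
```

A TDP tool could run a check like this before suspend and re-enable SMT
rather than letting the system attempt s0i3 with it off.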
> 
>>>
>>>> * change DPM setting in the GPU driver (power_dpm_force_performance_level)
>>>
>>> I think we played with this mostly to try to get lower than 800 MHz.
>>> However, going lower than 800 MHz in these APUs causes issues.
>>>
>>>> All the core related knobs can be changed on a per-core basis.  So for
>>>> example even on a non-heterogeneous design you could potentially make it
>>>> perform "like" a hetero design where you set it so that some cores don't
>>>> go above nominal frequency or the EPP value is tuned less aggressively
>>>> on some cores.
>>>
>>>> These knobs can have just as drastic of a result on battery life as
>>>> adjusting the various power limiting knobs.  Most importantly these
>>>> knobs have architectural limits that you won't be able to override so
>>>> you can safely change them to min/max and see what happens.
>>>
>>> I feel like we are discussing different targets here. When it comes to
>>> computing tasks, you have a certain block of work that needs to be
>>> done and after that the CPU is free. In this case, programs like tuned
>>> (allegedly) optimize these settings so that they take the minimum
>>> amount of power to complete that block of work.
>>>
>>> However, games are different. Games have no problem burning power if
>>> you let them and they are also playable at a variety of power levels.
>>> Typically, unless the user caps the framerate and video quality of the
>>> game it will use the full slow temp limit value. Even if they do set
>>> that, the game will typically burn 3-4W more than what is needed
>>> depending on TDP, EPP etc.
>>
>> Part of what I'm wondering is if our 4 levels of EPP values "aren't
>> enough" for optimization on a per game basis.
>>
>> IMO They're incredibly rigid.  I do have a patch that can expose "raw"
>> numbers for amd-pstate like intel-pstate does, but I haven't brought it
>> on the lists yet because I'm still discussing it with others internal to
>> AMD.
>>
>> EPP is really about responsiveness in games.
> 
> EPP performance is so detrimental we hide it. It destroys performance
> by sucking power from the GPU. EPP balance_performance is only useful
> in certain emulators that need a lot of CPU. Only balance_power is
> useful. Then, for TDPs lower than 10, setting EPP to power milks
> another 1-2W

Yes; this is a different conversation once you're talking about the 
power share between CPU cores and GPU.  It's part of why I was talking 
about core parking when you don't need all the cores.
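
The per-core knobs discussed earlier in the thread (EPP and
scaling_max_freq) can be sketched as below. This is an illustration
under assumptions, not a recommended policy: the core range, the
2.0 GHz cap and the helper names are hypothetical. Offline cores are
skipped because of the reported hang when writing to their sysfs files.

```python
from pathlib import Path


def cpu_is_online(cpu, sysfs_root="/sys"):
    """cpu0 often has no 'online' file; a missing file is treated as online."""
    path = Path(sysfs_root) / f"devices/system/cpu/cpu{cpu}/online"
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return True


def plan_core_limits(cpus, epp, max_khz, sysfs_root="/sys"):
    """Build (path, value) writes for the given cores, skipping offline ones."""
    plan = []
    for cpu in cpus:
        if not cpu_is_online(cpu, sysfs_root):
            continue  # avoid touching sysfs files of offlined cores
        base = Path(sysfs_root) / f"devices/system/cpu/cpu{cpu}/cpufreq"
        plan.append((base / "energy_performance_preference", epp))
        plan.append((base / "scaling_max_freq", str(max_khz)))
    return plan


def apply_plan(plan):
    """Write each value; this needs root on a real system."""
    for path, value in plan:
        path.write_text(value)


if __name__ == "__main__":
    # Hypothetical policy: cores 4-7 capped at 2.0 GHz with EPP "power".
    for path, value in plan_core_limits(range(4, 8), "power", 2_000_000):
        print(f"{value} -> {path}")
```

Building the list of writes separately from applying it makes it
possible to print or log what would change before touching the system.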

