Message-ID: <caa27d52-d668-4320-b40f-0e1fde8c0a9b@amd.com>
Date: Wed, 5 Nov 2025 18:45:18 +0530
From: Shyam Sundar S K <Shyam-sundar.S-k@....com>
To: Antheas Kapenekakis <lkml@...heas.dev>
Cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Mario Limonciello <mario.limonciello@....com>,
Alex Deucher <alexander.deucher@....com>, Perry Yuan <perry.yuan@....com>,
amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
LKML <linux-kernel@...r.kernel.org>, platform-driver-x86@...r.kernel.org,
Sanket Goswami <Sanket.Goswami@....com>
Subject: Re: [PATCH v1 1/3] platform/x86/amd/pmc: Add support for Van Gogh SoC
On 11/5/2025 17:04, Antheas Kapenekakis wrote:
> On Wed, 5 Nov 2025 at 12:28, Shyam Sundar S K <Shyam-sundar.S-k@....com> wrote:
>>
>> Hi Ilpo,
>>
>> On 11/5/2025 16:43, Ilpo Järvinen wrote:
>>> On Mon, 27 Oct 2025, Antheas Kapenekakis wrote:
>>>
>>>> On Mon, 27 Oct 2025 at 09:36, Shyam Sundar S K <Shyam-sundar.S-k@....com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 10/27/2025 13:52, Shyam Sundar S K wrote:
>>>>>>
>>>>>>
>>>>>> On 10/24/2025 22:02, Mario Limonciello wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 10/24/2025 11:08 AM, Antheas Kapenekakis wrote:
>>>>>>>> On Fri, 24 Oct 2025 at 17:43, Mario Limonciello
>>>>>>>> <mario.limonciello@....com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/24/2025 10:21 AM, Antheas Kapenekakis wrote:
>>>>>>>>>> The ROG Xbox Ally (non-X) SoC features a similar architecture to the
>>>>>>>>>> Steam Deck. While the Steam Deck supports S3 (s2idle causes a crash),
>>>>>>>>>> this support was dropped by the Xbox Ally, which only supports S0ix suspend.
>>>>>>>>>>
>>>>>>>>>> Since the handler is missing here, this causes the device to not suspend
>>>>>>>>>> and the AMD GPU driver to crash while trying to resume afterwards due to
>>>>>>>>>> a power hang.
>>>>>>>>>>
>>>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
>>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@...heas.dev>
>>>>>>>>>> ---
>>>>>>>>>> drivers/platform/x86/amd/pmc/pmc.c | 3 +++
>>>>>>>>>> drivers/platform/x86/amd/pmc/pmc.h | 1 +
>>>>>>>>>> 2 files changed, 4 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>>>> index bd318fd02ccf..cae3fcafd4d7 100644
>>>>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>>>> @@ -106,6 +106,7 @@ static void amd_pmc_get_ip_info(struct amd_pmc_dev *dev)
>>>>>>>>>> switch (dev->cpu_id) {
>>>>>>>>>> case AMD_CPU_ID_PCO:
>>>>>>>>>> case AMD_CPU_ID_RN:
>>>>>>>>>> + case AMD_CPU_ID_VG:
>>>>>>>>>> case AMD_CPU_ID_YC:
>>>>>>>>>> case AMD_CPU_ID_CB:
>>>>>>>>>> dev->num_ips = 12;
>>>>>>>>>> @@ -517,6 +518,7 @@ static int amd_pmc_get_os_hint(struct amd_pmc_dev *dev)
>>>>>>>>>> case AMD_CPU_ID_PCO:
>>>>>>>>>> return MSG_OS_HINT_PCO;
>>>>>>>>>> case AMD_CPU_ID_RN:
>>>>>>>>>> + case AMD_CPU_ID_VG:
>>>>>>>>>> case AMD_CPU_ID_YC:
>>>>>>>>>> case AMD_CPU_ID_CB:
>>>>>>>>>> case AMD_CPU_ID_PS:
>>>>>>>>>> @@ -717,6 +719,7 @@ static const struct pci_device_id pmc_pci_ids[] = {
>>>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_RV) },
>>>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SP) },
>>>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SHP) },
>>>>>>>>>> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_VG) },
>>>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M20H_ROOT) },
>>>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M60H_ROOT) },
>>>>>>>>>> { }
>>>>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.h b/drivers/platform/x86/amd/pmc/pmc.h
>>>>>>>>>> index 62f3e51020fd..fe3f53eb5955 100644
>>>>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.h
>>>>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.h
>>>>>>>>>> @@ -156,6 +156,7 @@ void amd_mp2_stb_deinit(struct amd_pmc_dev *dev);
>>>>>>>>>> #define AMD_CPU_ID_RN 0x1630
>>>>>>>>>> #define AMD_CPU_ID_PCO AMD_CPU_ID_RV
>>>>>>>>>> #define AMD_CPU_ID_CZN AMD_CPU_ID_RN
>>>>>>>>>> +#define AMD_CPU_ID_VG 0x1645
>>>>>>>>>
>>>>>>>>> Can you see if 0xF14 gives you a reasonable value for the idle mask if
>>>>>>>>> you add it to amd_pmc_idlemask_read()? Make a new define for it though;
>>>>>>>>> it shouldn't use the same define as the 0x1a platforms.
>>>>>>>>
>>>>>>>> It does not work. Reports 0. I also tested the other ones, but the
>>>>>>>> 0x1a was the same as you said. All report 0x0.
>>>>>>>
>>>>>>> It's possible the platform doesn't report an idle mask.
>>>>>>>
>>>>>>> 0xF14 is where I would have expected it to report.
>>>>>>>
>>>>>>> Shyam - can you look into this to see if it's in a different place
>>>>>>> than 0xF14 for Van Gogh?
>>>>>>
>>>>>> Van Gogh is before Cezanne? I am a bit surprised that pmc is getting
>>>>>> loaded there.
>>>>>>
>>>>>> Antheas - what is the output of
>>>>>>
>>>>>> #lspci -s 00:00.0
>>>>>
>>>>> OK. I get it from the diff.
>>>>>
>>>>> +#define AMD_CPU_ID_VG 0x1645
>>>>>
>>>>> So it's 0x1645, which indicates the SoC is family 17h, model 90h.
>>>>>
>>>>> What is the PMFW version running on your system?
>>>>> amd_pmc_get_smu_version() tells you that information.
>>>>
>>>> cat /sys/devices/platform/AMDI0005:00/smu_fw_version
>>>> 63.18.0
>>>> cat /sys/devices/platform/AMDI0005:00/smu_program
>>>> 7
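For scripted checks, the version string shown above can be parsed into its three fields. The following is a minimal userspace sketch, assuming the attribute always prints "major.minor.rev" as in the output quoted above; struct smu_version, parse_smu_version() and read_smu_version() are hypothetical names for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three fields of smu_fw_version. */
struct smu_version {
	unsigned int major, minor, rev;
};

/* Parse "major.minor.rev"; returns 0 on success, -1 on malformed input. */
static int parse_smu_version(const char *s, struct smu_version *v)
{
	assert(s && v);
	if (sscanf(s, "%u.%u.%u", &v->major, &v->minor, &v->rev) != 3)
		return -1;
	return 0;
}

/* Read the first line of a sysfs attribute file and parse it. */
static int read_smu_version(const char *path, struct smu_version *v)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_smu_version(buf, v);
}
```

Calling read_smu_version() with the smu_fw_version path shown above would fill in the three fields on a system where the amd_pmc driver is bound; on any other system it simply returns -1.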
>>>>
>>>>> Can you try using the same scratch register as Cezanne and see if
>>>>> that works? i.e.
>>>>>
>>>>> AMD_PMC_SCRATCH_REG_CZN(0x94) instead of AMD_PMC_SCRATCH_REG_1AH(0xF14)
>>>>
>>>> I tried all idle masks and they return 0
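The experiment discussed above amounts to choosing a scratch offset per CPU ID. Below is a rough standalone sketch of that selection, using only the IDs and offsets quoted in this thread. AMD_PMC_SCRATCH_REG_VG_CANDIDATE and idlemask_offset() are hypothetical names, and since every offset tried on this platform read back 0, the Van Gogh value is only the candidate under test, not a known-good register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDs and offsets quoted in this thread. */
#define AMD_CPU_ID_RN            0x1630
#define AMD_CPU_ID_CZN           AMD_CPU_ID_RN
#define AMD_CPU_ID_VG            0x1645
#define AMD_PMC_SCRATCH_REG_CZN  0x94
#define AMD_PMC_SCRATCH_REG_1AH  0xF14

/*
 * Hypothetical name: a separate define for the Van Gogh probe, so it does
 * not reuse the 1Ah define. The value is an assumption under test.
 */
#define AMD_PMC_SCRATCH_REG_VG_CANDIDATE  AMD_PMC_SCRATCH_REG_CZN

/* Pick the scratch offset to read for cpu_id; false means none is known. */
static bool idlemask_offset(uint16_t cpu_id, uint32_t *offset)
{
	assert(offset);

	switch (cpu_id) {
	case AMD_CPU_ID_CZN:
		*offset = AMD_PMC_SCRATCH_REG_CZN;
		return true;
	case AMD_CPU_ID_VG:
		*offset = AMD_PMC_SCRATCH_REG_VG_CANDIDATE;
		return true;
	default:
		return false;
	}
}
```

Keeping the Van Gogh candidate behind its own define means the probe offset can be swapped (for example to 0xF14) without touching the defines used by other platforms.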
>>>
>>> Hi Shyam & Antheas,
>>>
>>> This discussion seems to have died down without a clear indication of
>>> what the best course of action is here. Should I still wait?
>>>
>>> There's no particular hurry from my side, but it seems Mario gave his
>>> Reviewed-by already and there haven't been any follow-ups between you two,
>>> so I'm left a bit unsure how to interpret that.
>>>
>>
>> The thought process was to understand how we debug the remaining 5% of
>> failures when we do not have the idlemask concept, which was introduced
>> some time later. But both patches should work independently, so I
>> am OK with both patch 1/3 and 2/3.
>>
>> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@....com>
>>
>>
>>>
>>> In addition, is patch 3/3 entirely independent from these two PMC ones?
>>> (If yes, I don't know why they were submitted as a series as that just
>>> manages to add a little bit of uncertainty when combined into a series.)
>>
>> I see a note from Mario on the cover letter that the patch 3/3 can be
>> dropped from this series and a newer approach is being planned.
>
> To be more specific, patch 3 became two separate patches that went through drm.
>
> For the rare failure, it would be an additional patch (if appropriate)
> that does not affect 1 and 2.
>
> Do you have any idea of where the failure for the other 5% of cases
> comes from? I noticed that after I hibernated my device and it booted
> up, it would never go into LPS0; the OS hint stopped working. Would
> that be a hint?
Possibly. If the PMC driver did send the hint but the PMFW didn’t act
on it, that could explain it. However, your earlier logs don’t
indicate that, and the PMFW response register shows success, so I am
unsure about it.
Thanks,
Shyam
>
> Antheas
>
>> So, 1/3 and 2/3 of this series can be taken.
>>
>> Thanks,
>> Shyam
>>>
>>> Thanks in advance,
>>>
>>> --
>>> i.
>>>
>>>> Antheas
>>>>
>>>>> Thanks,
>>>>> Shyam
>>>>>
>>>>>
>>>>>>
>>>>>> 0xF14 index is meant for 1Ah (i.e. Strix and above)
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Any idea why the OS hint only works 90% of the time?
>>>>>>
>>>>>> What is the output of amd_pmc_dump_registers() in the 10% of cases
>>>>>> where the OS_HINT is not working?
>>>>>>
>>>>>> What I can surmise is that, although the pmc driver is sending the hint,
>>>>>> the PMFW is not taking any action (since the support in FW is missing).
>>>>>>
>>>>>>>
>>>>>>> If we get the idle mask reporting working we would have a better idea
>>>>>>> if that is what is reported wrong.
>>>>>>>
>>>>>>
>>>>>> IIRC, the concept of idlemask came only after Cezanne, and even then only
>>>>>> after a certain PMFW version. So I am not sure if idlemask actually exists.
>>>>>>
>>>>>>
>>>>>>> If I was to guess though; maybe GFX is still active.
>>>>>>>
>>>>>>> Depending upon what's going wrong smu_fw_info might have some more
>>>>>>> information too.
>>>>>>
>>>>>> That's a good point to try it out.
>>>>>>
>>>>>> Thanks,
>>>>>> Shyam
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>