[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <295e7882-21a2-f50f-6bfa-b0bae1d0fa12@molgen.mpg.de>
Date: Wed, 20 Apr 2022 22:48:31 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Alex Deucher <alexdeucher@...il.com>
Cc: Dave Airlie <airlied@...ux.ie>,
Richard Gong <richard.gong@....com>,
Xinhui Pan <xinhui.pan@....com>,
LKML <linux-kernel@...r.kernel.org>,
amd-gfx list <amd-gfx@...ts.freedesktop.org>,
Maling list - DRI developers
<dri-devel@...ts.freedesktop.org>, Daniel Vetter <daniel@...ll.ch>,
Alexander Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
Mario Limonciello <mario.limonciello@....com>
Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based
systems
Dear Alex,
Am 20.04.22 um 22:40 schrieb Alex Deucher:
> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@...gen.mpg.de> wrote:
>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>
>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>> [Cc: -kernel test robot <lkp@...el.com>]
>>
>> […]
>>
>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>
>>>>>> Thank you for sending out v4.
>>>>>>
>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>> kernel 5.14.
>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>>>> work
>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
>>>>>>> cards as
>>>>>>> video/display output, Intel Alder Lake based systems will hang during
>>>>>>> suspend/resume.
>>
>> [Your email program wraps lines in cited text for some reason, making
>> the citation harder to read.]
>>
>>>>>>
>>>>>> I am still not clear, what “hang during suspend/resume” means. I guess
>>>>>> suspending works fine? During resume (S3 or S0ix?), where does it hang?
>>>>>> The system is functional, but there are only display problems?
>>> System freeze after suspend/resume.
>>
>> But you see certain messages still? At what point does it freeze
>> exactly? In the bug report you posted Linux messages.
>>
>>>>>>> The issue was initially reported on one system (Dell Precision 3660
>>>>>>> with
>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
>>>>>>> Alder
>>>>>>> Lake based systems.
>>>>>>>
>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>
>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>> Link:
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&data=04%7C01%7Crichard.gong%40amd.com%7Ce7febed5d6a441c3a58008da1debb99c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637855195670542145%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7cEnE%2BSM9e5IGFxSLloCLtCOxovBpaPz0Ns0Ta2vVlc%3D&reserved=0
>>
>> Thank you Microsoft Outlook for keeping us safe. :(
>>
>>>>>>>
>>>>>>> Reported-by: kernel test robot <lkp@...el.com>
>>>>>>
>>>>>> This tag is a little confusing. Maybe clarify that it was for an issue
>>>>>> in a previous patch iteration?
>>>
>>> I did describe in change-list version 3 below, which corrected the build
>>> error with W=1 option.
>>>
>>> It is not good idea to add the description for that to the commit
>>> message, this is why I add descriptions on change-list version 3.
>>
>> Do as you wish, but the current style is confusing, and readers of the
>> commit are going to think, the kernel test robot reported the problem
>> with AMD VI ASICs and Intel Alder Lake systems.
>>
>>>>>>
>>>>>>> Signed-off-by: Richard Gong <richard.gong@....com>
>>>>>>> ---
>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>> enhanced check logic
>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>> correct build error with W=1 option
>>>>>>> v2: correct commit description
>>>>>>> move the check from chip family to problematic platform
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> @@ -81,6 +81,10 @@
>>>>>>> #include "mxgpu_vi.h"
>>>>>>> #include "amdgpu_dm.h"
>>>>>>>
>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>> +#include <asm/intel-family.h>
>>>>>>> +#endif
>>>>>>> +
>>>>>>> #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>>>>>>> #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>> 0x00000001L
>>>>>>> #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>> 0x00000002L
>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>> amdgpu_device *adev)
>>>>>>> WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>> }
>>>>>>>
>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>> +{
>>>>>>> + if (IS_ENABLED(CONFIG_X86)) {
>>>>>>> + struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>> +
>>>>>>> + return !(c->x86 == 6 && c->x86_model ==
>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>> + }
>>>>>>> +
>>>>>>> + return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>> {
>>>>>>> u32 data, data1, orig;
>>>>>>> bool bL1SS = false;
>>>>>>> bool bClkReqSupport = true;
>>>>>>>
>>>>>>> - if (!amdgpu_device_should_use_aspm(adev))
>>>>>>> + if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>> !aspm_support_quirk_check())
>>>>>>> return;
>>>>>>
>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>> `amdgpu.aspm`?
>>>>>>
>>> As Mario mentioned in a separate reply, we can't forcefully enable ASPM
>>> with the parameter 'amdgpu.aspm'.
>>
>> That would be a regression on systems where ASPM used to work. Hmm. I
>> guess, you could say, there are no such systems.
>>
>>>>>>>
>>>>>>> if (adev->flags & AMD_IS_APU ||
>>>>>>
>>>>>> If I remember correctly, there were also newer cards, where ASPM worked
>>>>>> with Intel Alder Lake, right? Can only the problematic generations for
>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>
>>>>> This patch only disables it for the generatioaon that was problematic.
>>>>
>>>> Could that please be made clear in the commit message summary, and
>>>> message?
>>>
>>> Are you ok with the commit messages below?
>>
>> Please change the commit message summary. Maybe:
>>
>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>
>>> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
>>>
>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
>>> video/display output, Intel Alder Lake based systems will freeze after
>>> suspend/resume.
>>
>> Something like:
>>
>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>> when resuming from S0ix(?).
>>
>>
>>> The issue was initially reported on one system (Dell Precision 3660 with
>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
>>> Lake based systems.
>>
>> Which ones?
>>
>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>> problematic generation GFX cards.
>>
>> … with the problematic Volcanic Islands GFX cards.
>>
>>>>
>>>> Loosely related, is there a public (or internal issue) to analyze how
>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
>>>
>>> As Alex mentioned, we need support from Intel. We don't have any update
>>> on that.
>>
>> It’d be great to get that fixed properly.
>>
>> Last thing, please don’t hate me, does Linux log, that ASPM is disabled?
>
> I'm not sure what gets logged at the platform level with respect to
> ASPM, but whether or not the driver enables ASPM is tied to whether
> ASPM is allowed at the platform level or not so if the platform
> indicates that ASPM is not supported, the driver won't enable it. The
> driver does not log whether ASPM is enabled or not if that is what you
> are asking. As to whether or not it should, it comes down to how much
> stuff is worth indiciating in the log. The driver is already pretty
> chatty by driver standards.
I specifically mean, Linux should log the quirks it applies. (As a
normal user, I’d also expect ASPM to work nowadays, so a message, that
it’s disabled would help a lot.)
Kind regards,
Paul
Powered by blists - more mailing lists