linux-kernel - Re: [PATCH] drm/amdgpu: add mb for si

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4adc40d1-e775-c480-bb35-23fe9c63e05e@kylinos.cn>
Date:   Fri, 18 Nov 2022 17:24:49 +0800
From:   李真能 <lizhenneng@...inos.cn>
To:     Michel Dänzer <michel.daenzer@...lbox.org>,
        Christian König <christian.koenig@....com>,
        Alex Deucher <alexander.deucher@....com>
Cc:     Xinhui.Pan@....com, linux-kernel@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, amd-gfx@...ts.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add mb for si


在 2022/11/18 17:18, Michel Dänzer 写道:
> On 11/18/22 09:01, Christian König wrote:
>> Am 18.11.22 um 08:48 schrieb Zhenneng Li:
>>> During reboot test on arm64 platform, it may failure on boot,
>>> so add this mb in smc.
>>>
>>> The error message are as follows:
>>> [    6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR*
>>>                  late_init of IP block <si_dpm> failed -22
>>> [    7.006919][ 7] [  T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
>>> [    7.014224][ 7] [  T295] amdgpu 0000:04:00.0: Fatal error during GPU init
>> Memory barries are not supposed to be sprinkled around like this, you need to give a detailed explanation why this is necessary.
>>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Zhenneng Li <lizhenneng@...inos.cn>
>>> ---
>>>    drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>> index 8f994ffa9cd1..c7656f22278d 100644
>>> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>> @@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
>>>        u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
>>>        u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
>>>    +    mb();
>>> +
>>>        if (!(rst & RST_REG) && !(clk & CK_DISABLE))
>>>            return true;
> In particular, it makes no sense in this specific place, since it cannot directly affect the values of rst & clk.

I thinks so too.

But when I do reboot test using nine desktop machines,  there maybe 
report this error on one or two machines after Hundreds of times or 
Thousands of times reboot test, at the beginning, I use msleep() instead 
of mb(), these two methods are all works, but I don't know what is the 
root case.

I use this method on other verdor's oland card, this error message are 
reported again.

What could be the root reason?

test environmen:

graphics card: OLAND 0x1002:0x6611 0x1642:0x1869 0x87

driver: amdgpu

os: ubuntu 2004

platform: arm64

kernel: 5.4.18

>