Message-ID: <e0680f5f-c8fe-2bbb-dfee-cf9c148d60da@molgen.mpg.de>
Date:   Tue, 29 Mar 2022 08:01:25 +0200
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     李真能 <lizhenneng@...inos.cn>
Cc:     Andrey Grodzovsky <andrey.grodzovsky@....com>,
        Pan Xinhui <Xinhui.Pan@....com>,
        Guchun Chen <guchun.chen@....com>,
        David Airlie <airlied@...ux.ie>,
        Lijo Lazar <lijo.lazar@....com>, linux-kernel@...r.kernel.org,
        amd-gfx@...ts.freedesktop.org,
        Christian König <christian.koenig@....com>,
        linaro-mm-sig@...ts.linaro.org, dri-devel@...ts.freedesktop.org,
        Daniel Vetter <daniel@...ll.ch>,
        Kevin Wang <kevin1.wang@....com>,
        Alex Deucher <alexander.deucher@....com>,
        Evan Quan <evan.quan@....com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        linux-media@...r.kernel.org
Subject: Re: Reply: Re: [PATCH] drm/amdgpu: resolve s3 hang for r7340

Dear 李真,


[Your mailer formatted the message oddly. Maybe configure it to send
plain-text email only, with no HTML parts, as is common in the Linux
kernel community, or, if that is not possible, switch to another mail
client (Mozilla Thunderbird, …).]


On 29.03.22 at 04:54, 李真能 wrote:

[…]

> *Date:* 2022-03-28 15:38
> *From:* Paul Menzel

[…]

> On 28.03.22 at 09:36, Paul Menzel wrote:
>   > Dear Zhenneng,
>   >
>   >
>   > Thank you for your patch.
>   >
>   > On 28.03.22 at 06:05, Zhenneng Li wrote:
>   >> This is a workaround for s3 hang for r7340(amdgpu).
>   >
>   > Is it hanging when resuming from S3?
> 
> Yes, this function is delayed work that runs after the graphics card is
> initialized.

Thank you for clarifying it.

>   > Maybe also use the line below for
>   > the commit message summary:
>   >
>   > drm/amdgpu: Add 1 ms delay to init handler to fix s3 resume hang
>   >
>   > Also, please add a space before the ( in “r7340(amdgpu)”.
>   >
>   >> When we test s3 with r7340 on an arm64 platform, the graphics card hangs.
>   >> The error messages are as follows:
>   >> Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [    1.599374][ 7] [  T291] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
>   >> Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [    1.612869][ 7] [  T291] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP blockfailed -22
>   >> Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [    1.623392][ 7] [  T291] amdgpu 0000:02:00.0: amdgpu_device_ip_late_init failed
>   >> Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [    1.630696][ 7] [  T291] amdgpu 0000:02:00.0: Fatal error during GPU init
>   >> Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [    1.637477][ 7] [  T291] [drm] amdgpu: finishing device.
>   >
>   > The prefix at the beginning is not really needed; only the part after
>   > `kernel: ` is.
>   >
>   > Maybe also add the output of `lspci -nn -s …` for that r7340 device.
>   >
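
[A minimal sketch of the log trimming suggested above, for preparing the
commit message. It is not part of the original thread: the choice of
Python and the helper name `strip_syslog_prefix` are assumptions made
for illustration only. It keeps everything after the first `kernel: `
marker and leaves lines without that marker untouched.]

```python
# Hypothetical helper, not from the patch or the thread.
MARKER = "kernel: "


def strip_syslog_prefix(line: str) -> str:
    """Return the text after the first 'kernel: ' marker in a syslog line.

    Lines that do not contain the marker are returned unchanged.
    """
    _, sep, rest = line.partition(MARKER)
    return rest if sep else line


if __name__ == "__main__":
    sample = (
        "Mar  4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: "
        "[    1.637477][ 7] [  T291] [drm] amdgpu: finishing device."
    )
    # prints "[    1.637477][ 7] [  T291] [drm] amdgpu: finishing device."
    print(strip_syslog_prefix(sample))
```

[Splitting on the first occurrence of the marker keeps any later
`kernel: ` text inside the message itself intact.]
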
>   >> Change-Id: I5048b3894c0ca9faf2f4847ddab61f9eb17b4823
>   >
>   > Without the Gerrit instance this belongs to, the Change-Id is of no use
>   > in the public.
>   >
>   >> Signed-off-by: Zhenneng Li
>   >> ---
>   >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>   >>   1 file changed, 2 insertions(+)
>   >>
>   >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>   >> index 3987ecb24ef4..1eced991b5b2 100644
>   >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>   >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>   >> @@ -2903,6 +2903,8 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)
>   >>           container_of(work, struct amdgpu_device, delayed_init_work.work);
>   >>       int r;
>   >> +    mdelay(1);
>   >> +
>   >
> 
>   > Wow, I wonder how long it took you to find that workaround.
> 
> About 3 months. I tried increasing the delay of this delayed work
> (amdgpu_device_delayed_init_work_handler) from 2000 ms to 2500 ms, and I
> tried using mb() instead of mdelay(1), but neither helped. I don't know the
> reason. The bug occurs roughly once in ten thousand times. Do you know the
> possible reasons?

Oh, it’s not even always reproducible. That is hard. Did you try another
graphics card or another ARM board to rule out hardware-specific issues?

Sorry, I do not know the possible reasons. Maybe the developers with
access to non-public datasheets and errata know.

>   >>       r = amdgpu_ib_ring_tests(adev);
>   >>       if (r)
>   >>           DRM_ERROR("ib ring test failed (%d).\n", r);


Kind regards,

Paul
