lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b4ebdbce2f44c06806a650e72b1b6eb9a16dffe6.camel@icenowy.me>
Date: Mon, 17 Jun 2024 21:03:13 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Christian König <christian.koenig@....com>, Alex
 Deucher <alexander.deucher@....com>, Pan Xinhui <Xinhui.Pan@....com>, David
 Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, Pierre-Eric
 Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>, Huacai Chen
 <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev
Subject: Re: [PATCH 1/2] drm/amdgpu: make duplicated EOP packet for GFX7/8
 have real content

在 2024-06-17星期一的 14:35 +0200,Christian König写道:
> Am 17.06.24 um 12:58 schrieb Icenowy Zheng:
> > The duplication of EOP packets for GFX7/8, with the former one have
> > seq-1 written and the latter one have seq written, seems to confuse
> > some
> > hardware platform (e.g. Loongson 7A series PCIe controllers).
> > 
> > Make the content of the duplicated EOP packet the same with the
> > real
> > one, only masking any possible interrupts.
> 
> Well completely NAK to that, exactly that disables the workaround.
> 
> The CPU needs to see two different values written here.

Why do the CPU need to see two different values here? Only the second
packet will raise an interrupt before and after applying this patch,
and the first packet's result should just be overriden on ordinary
platforms. The CPU won't see the first one, until it's polling for the
address for a very short interval, so short that the GPU CP couldn't
execute 2 commands.

Or do you mean the GPU needs to see two different values being written,
or they will be merged into only one write request?

Please give out more information about this workaround, otherwise the
GPU hang problem on Loongson platforms will persist.

> 
> Regards,
> Christian.
> 
> > 
> > Fixes: bf26da927a1c ("drm/amdgpu: add cache flush workaround to
> > gfx8 emit_fence")
> > Fixes: a2e73f56fa62 ("drm/amdgpu: Add support for CIK parts")
> > Signed-off-by: Icenowy Zheng <uwu@...nowy.me>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 12 +++++-------
> >   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 12 ++++--------
> >   2 files changed, 9 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > index 541dbd70d8c75..778f27f1a34fe 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > @@ -2117,9 +2117,8 @@ static void
> > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
> >   {
> >         bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
> >         bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
> > -       /* Workaround for cache flush problems. First send a dummy
> > EOP
> > -        * event down the pipe with seq one below.
> > -        */
> > +
> > +       /* Workaround for cache flush problems, send EOP twice. */
> >         amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP,
> > 4));
> >         amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> >                                  EOP_TC_ACTION_EN |
> > @@ -2127,11 +2126,10 @@ static void
> > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
> >                                  EVENT_INDEX(5)));
> >         amdgpu_ring_write(ring, addr & 0xfffffffc);
> >         amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) |
> > -                               DATA_SEL(1) | INT_SEL(0));
> > -       amdgpu_ring_write(ring, lower_32_bits(seq - 1));
> > -       amdgpu_ring_write(ring, upper_32_bits(seq - 1));
> > +                               DATA_SEL(write64bit ? 2 : 1) |
> > INT_SEL(0));
> > +       amdgpu_ring_write(ring, lower_32_bits(seq));
> > +       amdgpu_ring_write(ring, upper_32_bits(seq));
> >   
> > -       /* Then send the real EOP event down the pipe. */
> >         amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP,
> > 4));
> >         amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> >                                  EOP_TC_ACTION_EN |
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > index 2f0e72caee1af..39a7d60f1fd69 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > @@ -6153,9 +6153,7 @@ static void
> > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
> >         bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
> >         bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
> >   
> > -       /* Workaround for cache flush problems. First send a dummy
> > EOP
> > -        * event down the pipe with seq one below.
> > -        */
> > +       /* Workaround for cache flush problems, send EOP twice. */
> >         amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP,
> > 4));
> >         amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> >                                  EOP_TC_ACTION_EN |
> > @@ -6164,12 +6162,10 @@ static void
> > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,
> >                                  EVENT_INDEX(5)));
> >         amdgpu_ring_write(ring, addr & 0xfffffffc);
> >         amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) |
> > -                               DATA_SEL(1) | INT_SEL(0));
> > -       amdgpu_ring_write(ring, lower_32_bits(seq - 1));
> > -       amdgpu_ring_write(ring, upper_32_bits(seq - 1));
> > +                         DATA_SEL(write64bit ? 2 : 1) |
> > INT_SEL(0));
> > +       amdgpu_ring_write(ring, lower_32_bits(seq));
> > +       amdgpu_ring_write(ring, upper_32_bits(seq));
> >   
> > -       /* Then send the real EOP event down the pipe:
> > -        * EVENT_WRITE_EOP - flush caches, send int */
> >         amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP,
> > 4));
> >         amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> >                                  EOP_TC_ACTION_EN |
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ