[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <73597116d4f004c5f75cf4f13da1af405ea8da8b.camel@icenowy.me>
Date: Tue, 18 Jun 2024 00:09:42 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Christian König <christian.koenig@....com>, Alex
Deucher <alexander.deucher@....com>, Pan Xinhui <Xinhui.Pan@....com>, David
Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, Pierre-Eric
Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>, Huacai Chen
<chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev
Subject: Re: [PATCH 1/2] drm/amdgpu: make duplicated EOP packet for GFX7/8
have real content
在 2024-06-17星期一的 15:59 +0200,Christian König写道:
> Am 17.06.24 um 15:43 schrieb Icenowy Zheng:
> > 在 2024-06-17星期一的 15:09 +0200,Christian König写道:
> > > Am 17.06.24 um 15:03 schrieb Icenowy Zheng:
> > > > 在 2024-06-17星期一的 14:35 +0200,Christian König写道:
> > > > > Am 17.06.24 um 12:58 schrieb Icenowy Zheng:
> > > > > > The duplication of EOP packets for GFX7/8, with the former
> > > > > > one
> > > > > > have
> > > > > > seq-1 written and the latter one have seq written, seems to
> > > > > > confuse
> > > > > > some
> > > > > > hardware platform (e.g. Loongson 7A series PCIe
> > > > > > controllers).
> > > > > >
> > > > > > Make the content of the duplicated EOP packet the same with
> > > > > > the
> > > > > > real
> > > > > > one, only masking any possible interrupts.
> > > > > Well completely NAK to that, exactly that disables the
> > > > > workaround.
> > > > >
> > > > > The CPU needs to see two different values written here.
> > > > Why do the CPU need to see two different values here? Only the
> > > > second
> > > > packet will raise an interrupt before and after applying this
> > > > patch,
> > > > and the first packet's result should just be overriden on
> > > > ordinary
> > > > platforms. The CPU won't see the first one, until it's polling
> > > > for
> > > > the
> > > > address for a very short interval, so short that the GPU CP
> > > > couldn't
> > > > execute 2 commands.
> > > Yes exactly that. We need to make two writes, one with the old
> > > value
> > > (seq - 1) and a second with the real value (seq).
> > >
> > > Otherwise it is possible that a polling CPU would see the
> > > sequence
> > > before the second EOP is issued with results in incoherent view
> > > of
> > > memory.
> > In this case shouldn't we write seq-1 before any work, and then
> > write
> > seq after work, like what is done in Mesa?
>
> No. This hw workaround requires that two consecutive write operations
> happen directly behind each other on the PCIe bus with two different
> values.
>
> To make the software logic around that work without any changes we
> use
> the values seq - 1 and seq because those are guaranteed to be
> different
> and not trigger any unwanted software behavior.
>
> Only then we can guarantee that we have a coherent view of system
> memory.
BTW is there any operation that could be taken to examine this specific
workaround?
Is there any case possible to reproduce?
>
> > As what I see, Mesa uses another command buffer to emit a
> > EVENT_WRITE_EOP writing 0, and commit this command buffer before
> > the
> > real command buffer.
> >
> > > > Or do you mean the GPU needs to see two different values being
> > > > written,
> > > > or they will be merged into only one write request?
> > > >
> > > > Please give out more information about this workaround,
> > > > otherwise
> > > > the
> > > > GPU hang problem on Loongson platforms will persist.
> > > Well if Loongson can't handle two consecutive write operations to
> > > the
> > > same address with different values then you have a massive
> > > platform
> > > bug.
> > I think the issue is triggered when two consecutive write
> > operations
> > and one IRQ is present, which is exactly the case of this function.
>
> Well then you have a massive platform bug.
>
> Two consecutive writes to the same bus address are perfectly legal
> from
> the PCIe specification and can happen all the time, even without this
> specific hw workaround.
>
> Regards,
> Christian.
>
> >
> > > That is something which can happen all the time throughout the
> > > operation.
> > >
> > > Regards,
> > > Christian.
> > >
> > > > > Regards,
> > > > > Christian.
> > > > >
> > > > > > Fixes: bf26da927a1c ("drm/amdgpu: add cache flush
> > > > > > workaround to
> > > > > > gfx8 emit_fence")
> > > > > > Fixes: a2e73f56fa62 ("drm/amdgpu: Add support for CIK
> > > > > > parts")
> > > > > > Signed-off-by: Icenowy Zheng <uwu@...nowy.me>
> > > > > > ---
> > > > > > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 12 +++++-------
> > > > > > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 12 ++++--------
> > > > > > 2 files changed, 9 insertions(+), 15 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > > > > > index 541dbd70d8c75..778f27f1a34fe 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > > > > > @@ -2117,9 +2117,8 @@ static void
> > > > > > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64
> > > > > > addr,
> > > > > > {
> > > > > > bool write64bit = flags &
> > > > > > AMDGPU_FENCE_FLAG_64BIT;
> > > > > > bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
> > > > > > - /* Workaround for cache flush problems. First send
> > > > > > a
> > > > > > dummy
> > > > > > EOP
> > > > > > - * event down the pipe with seq one below.
> > > > > > - */
> > > > > > +
> > > > > > + /* Workaround for cache flush problems, send EOP
> > > > > > twice.
> > > > > > */
> > > > > > amdgpu_ring_write(ring,
> > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP,
> > > > > > 4));
> > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> > > > > > EOP_TC_ACTION_EN |
> > > > > > @@ -2127,11 +2126,10 @@ static void
> > > > > > gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64
> > > > > > addr,
> > > > > > EVENT_INDEX(5)));
> > > > > > amdgpu_ring_write(ring, addr & 0xfffffffc);
> > > > > > amdgpu_ring_write(ring, (upper_32_bits(addr) &
> > > > > > 0xffff)
> > > > > > >
> > > > > > - DATA_SEL(1) | INT_SEL(0));
> > > > > > - amdgpu_ring_write(ring, lower_32_bits(seq - 1));
> > > > > > - amdgpu_ring_write(ring, upper_32_bits(seq - 1));
> > > > > > + DATA_SEL(write64bit ? 2 :
> > > > > > 1) |
> > > > > > INT_SEL(0));
> > > > > > + amdgpu_ring_write(ring, lower_32_bits(seq));
> > > > > > + amdgpu_ring_write(ring, upper_32_bits(seq));
> > > > > >
> > > > > > - /* Then send the real EOP event down the pipe. */
> > > > > > amdgpu_ring_write(ring,
> > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP,
> > > > > > 4));
> > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> > > > > > EOP_TC_ACTION_EN |
> > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > > > > > index 2f0e72caee1af..39a7d60f1fd69 100644
> > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > > > > > @@ -6153,9 +6153,7 @@ static void
> > > > > > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64
> > > > > > addr,
> > > > > > bool write64bit = flags &
> > > > > > AMDGPU_FENCE_FLAG_64BIT;
> > > > > > bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
> > > > > >
> > > > > > - /* Workaround for cache flush problems. First send
> > > > > > a
> > > > > > dummy
> > > > > > EOP
> > > > > > - * event down the pipe with seq one below.
> > > > > > - */
> > > > > > + /* Workaround for cache flush problems, send EOP
> > > > > > twice.
> > > > > > */
> > > > > > amdgpu_ring_write(ring,
> > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP,
> > > > > > 4));
> > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> > > > > > EOP_TC_ACTION_EN |
> > > > > > @@ -6164,12 +6162,10 @@ static void
> > > > > > gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64
> > > > > > addr,
> > > > > > EVENT_INDEX(5)));
> > > > > > amdgpu_ring_write(ring, addr & 0xfffffffc);
> > > > > > amdgpu_ring_write(ring, (upper_32_bits(addr) &
> > > > > > 0xffff)
> > > > > > >
> > > > > > - DATA_SEL(1) | INT_SEL(0));
> > > > > > - amdgpu_ring_write(ring, lower_32_bits(seq - 1));
> > > > > > - amdgpu_ring_write(ring, upper_32_bits(seq - 1));
> > > > > > + DATA_SEL(write64bit ? 2 : 1) |
> > > > > > INT_SEL(0));
> > > > > > + amdgpu_ring_write(ring, lower_32_bits(seq));
> > > > > > + amdgpu_ring_write(ring, upper_32_bits(seq));
> > > > > >
> > > > > > - /* Then send the real EOP event down the pipe:
> > > > > > - * EVENT_WRITE_EOP - flush caches, send int */
> > > > > > amdgpu_ring_write(ring,
> > > > > > PACKET3(PACKET3_EVENT_WRITE_EOP,
> > > > > > 4));
> > > > > > amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
> > > > > > EOP_TC_ACTION_EN |
>
Powered by blists - more mailing lists