[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <aca00cb25b813da4fd2f215829f02337f05642f3.camel@mailbox.org>
Date: Fri, 11 Apr 2025 14:44:18 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Christian König <christian.koenig@....com>,
phasta@...nel.org, Lyude Paul <lyude@...hat.com>, Danilo Krummrich
<dakr@...nel.org>, David Airlie <airlied@...il.com>, Simona Vetter
<simona@...ll.ch>, Sabrina Dubroca <sd@...asysnail.net>, Sumit Semwal
<sumit.semwal@...aro.org>
Cc: dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
stable@...r.kernel.org
Subject: Re: [PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list
On Fri, 2025-04-11 at 13:05 +0200, Christian König wrote:
> Am 11.04.25 um 11:29 schrieb Philipp Stanner:
>
> > [SNIP]
> >
> > It could be, however, that at the same moment
> > nouveau_fence_signal() is
> > removing that entry, holding the appropriate lock.
> >
> > So we have a race. Again.
> >
>
> Ah, yes of course. If signaled is called with or without the lock is
> actually undetermined.
>
>
> >
> > You see, fixing things in Nouveau is difficult :)
> > It gets more difficult if you want to clean it up "properly", so it
> > conforms to rules such as those from dma_fence.
> >
> > I have now provided two fixes that both work, but you are not
> > satisfied
> > with from the dma_fence-maintainer's perspective. I understand
> > that,
> > but please also understand that it's actually not my primary task
> > to
> > work on Nouveau. I just have to fix this bug to move on with my
> > scheduler work.
> >
>
> Well I'm happy with whatever solution as long as it works, but as
> far as I can see the approach with the callback simply doesn't.
>
> You just can't drop the fence reference for the list from the
> callback.
>
>
> >
> > So if you have another idea, feel free to share it. But I'd like to
> > know how we can go on here.
> >
>
> Well the fence code actually works, doesn't it? The problem is
> rather that setting the error throws a warning because it doesn't
> expect signaled fences on the pending list.
>
> Maybe we should fix that instead.
The fence code works as the author intended, but I would be happy if it
were more explicitly documented.
Regarding the WARN_ON: It occurs in dma_fence_set_error() because there
is an attempt to set an error code on a signaled fence. I don't think
that should be "fixed", it works as intended: You must not set an error
code of a fence that was already signaled.
The reason seems to be that once a fence is signaled, a third party
might evaluate the error code.
But I think this wasn't wat you meant with "fix".
In any case, there must not be signaled fences in nouveau's pending-
list. They must be removed immediately once they signal, and this must
not race.
>
>
> >
> > I'm running out of ideas. What I'm wondering if we couldn't just
> > remove
> > performance hacky fastpath functions such as
> > nouveau_fence_is_signaled() completely. It seems redundant to me.
> >
>
> That would work for me as well.
I'll test this approach. Seems a bit like the nuclear approach, but if
it works we'd at least clean up a lot of this mess.
P.
>
>
> >
> >
> > Or we might add locking to it, but IDK what was achieved with RCU
> > here.
> > In any case it's definitely bad that Nouveau has so many redundant
> > and
> > half-redundant mechanisms.
> >
>
> Yeah, agree messing with the locks even more won't help us here.
>
> Regards,
> Christian.
>
>
> >
> >
> >
> > P.
> >
> >
> > >
> > >
> > > P.
> > >
> > >
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > >
> > > > >
> > > > > P.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Christian.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Replace the call to dma_fence_is_signaled() with
> > > > > > > nouveau_fence_base_is_signaled().
> > > > > > >
> > > > > > > Cc: <stable@...r.kernel.org> # 4.10+, precise commit not
> > > > > > > to
> > > > > > > be
> > > > > > > determined
> > > > > > > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
> > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > index 7cc84472cece..33535987d8ed 100644
> > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > > > > > > @@ -274,7 +274,7 @@ nouveau_fence_done(struct
> > > > > > > nouveau_fence
> > > > > > > *fence)
> > > > > > > nvif_event_block(&fctx->event);
> > > > > > > spin_unlock_irqrestore(&fctx->lock,
> > > > > > > flags);
> > > > > > > }
> > > > > > > - return dma_fence_is_signaled(&fence->base);
> > > > > > > + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > > > > &fence-
> > > > > > >
> > > > > > > >
> > > > > > > > base.flags);
> > > > > > > >
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > static long
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
Powered by blists - more mailing lists