[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251219215344.170852-2-lyude@redhat.com>
Date: Fri, 19 Dec 2025 16:52:02 -0500
From: Lyude Paul <lyude@...hat.com>
To: dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Cc: stable@...r.kernel.org,
"Timur Tabi" <ttabi@...dia.com>,
"Dave Airlie" <airlied@...hat.com>,
"Maarten Lankhorst" <maarten.lankhorst@...ux.intel.com>,
"Ben Skeggs" <bskeggs@...dia.com>,
"Simona Vetter" <simona@...ll.ch>,
"Ben Skeggs" <bskeggs@...hat.com>,
"David Airlie" <airlied@...il.com>,
"Thomas Zimmermann" <tzimmermann@...e.de>,
"Maxime Ripard" <mripard@...nel.org>,
"Danilo Krummrich" <dakr@...nel.org>,
"Lyude Paul" <lyude@...hat.com>
Subject: [PATCH 1/2] drm/nouveau/disp/nv50-: Set lock_core in curs507a_prepare
For a while, I've been seeing a strange issue where some (usually not all)
of the display DMA channels will suddenly hang, particularly when there is
a visible cursor on the screen that is being frequently updated, and
especially when said cursor happens to go between two screens. While this
brings back lovely memories of fixing Intel Skylake bugs, I would quite
like to fix it :).
It turns out the problem that's happening here is that we're managing to
reach nv50_head_flush_set() in our atomic commit path without actually
holding nv50_disp->mutex. This means that cursor updates happening in
parallel (along with any other atomic updates that need to use the core
channel) will race with eachother, which eventually causes us to corrupt
the pushbuffer - leading to a plethora of various GSP errors, usually:
nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 00000218 00102680 00000004 00800003
nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 0000021c 00040509 00000004 00000001
nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 00000000 00000000 00000001 00000001
The reason this is happening is because generally we check whether we need
to set nv50_atom->lock_core at the end of nv50_head_atomic_check().
However, curs507a_prepare is called from the fb_prepare callback, which
happens after the atomic check phase. As a result, this can lead to commits
that both touch the core channel but also don't grab nv50_disp->mutex.
So, fix this by making sure that we set nv50_atom->lock_core in
cus507a_prepare().
Signed-off-by: Lyude Paul <lyude@...hat.com>
Fixes: 1590700d94ac ("drm/nouveau/kms/nv50-: split each resource type into their own source files")
Cc: <stable@...r.kernel.org> # v4.18+
---
drivers/gpu/drm/nouveau/dispnv50/curs507a.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/curs507a.c b/drivers/gpu/drm/nouveau/dispnv50/curs507a.c
index a95ee5dcc2e39..1a889139cb053 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/curs507a.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/curs507a.c
@@ -84,6 +84,7 @@ curs507a_prepare(struct nv50_wndw *wndw, struct nv50_head_atom *asyh,
asyh->curs.handle = handle;
asyh->curs.offset = offset;
asyh->set.curs = asyh->curs.visible;
+ nv50_atom(asyh->state.state)->lock_core = true;
}
}
--
2.52.0
Powered by blists - more mailing lists