Message-ID: <20250410092418.135258-2-phasta@kernel.org>
Date: Thu, 10 Apr 2025 11:24:16 +0200
From: Philipp Stanner <phasta@...nel.org>
To: Lyude Paul <lyude@...hat.com>,
Danilo Krummrich <dakr@...nel.org>,
David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Sabrina Dubroca <sd@...asysnail.net>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>
Cc: dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org,
netdev@...r.kernel.org,
linux-media@...r.kernel.org,
linaro-mm-sig@...ts.linaro.org,
Philipp Stanner <phasta@...nel.org>
Subject: [PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()
This series contains two patches that improve nouveau_fence_done() and
one that fixes an actual bug (a race), which produces this warning:
[ 39.848463] WARNING: CPU: 21 PID: 1734 at drivers/gpu/drm/nouveau/nouveau_fence.c:509 nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848551] Modules linked in: snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_ine
t nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc snd_sof_pci_intel_
tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof snd_sof_utils snd
_soc_acpi_intel_match snd_soc_acpi snd_soc_acpi_intel_sdca_quirks snd_sof_intel_hda_mlink snd_soc_sdca snd_soc_avs snd_ctl_led snd_soc_hda_codec intel_rapl_msr snd_hda_
codec_realtek snd_hda_ext_core intel_rapl_common snd_hda_codec_generic snd_soc_core snd_hda_scodec_component intel_uncore_frequency intel_uncore_frequency_common snd_hd
a_codec_hdmi intel_ifs snd_compress i10nm_edac skx_edac_common nfit snd_hda_intel snd_intel_dspcfg libnvdimm snd_hda_codec binfmt_misc snd_hwdep snd_hda_core snd_seq sn
d_seq_device dell_wmi
[ 39.848575] dell_pc x86_pkg_temp_thermal spi_nor platform_profile sparse_keymap intel_powerclamp dax_hmem snd_pcm cxl_acpi coretemp cxl_port iTCO_wdt mtd rapl intel
_pmc_bxt pmt_telemetry cxl_core dell_wmi_sysman pmt_class iTCO_vendor_support snd_timer isst_if_mmio vfat intel_cstate dell_smbios dcdbas fat dell_wmi_ddv dell_smm_hwmo
n dell_wmi_descriptor firmware_attributes_class wmi_bmof intel_uncore einj pcspkr isst_if_mbox_pci atlantic snd isst_if_common intel_vsec e1000e macsec mei_me i2c_i801
spi_intel_pci soundcore i2c_smbus spi_intel mei joydev loop nfnetlink zram nouveau drm_ttm_helper ttm polyval_clmulni iaa_crypto gpu_sched polyval_generic rtsx_pci_sdmm
c ghash_clmulni_intel i2c_algo_bit mmc_core drm_gpuvm sha512_ssse3 nvme drm_exec drm_display_helper sha256_ssse3 idxd sha1_ssse3 cec nvme_core idxd_bus rtsx_pci nvme_au
th pinctrl_alderlake ip6_tables ip_tables fuse
[ 39.848603] CPU: 21 UID: 42 PID: 1734 Comm: gnome-shell Tainted: G W 6.14.0-rc4+ #11
[ 39.848605] Tainted: [W]=WARN
[ 39.848606] Hardware name: Dell Inc. Precision 7960 Tower/01G0M6, BIOS 2.7.0 12/17/2024
[ 39.848607] RIP: 0010:nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848688] Code: db 74 17 48 8d 7b 38 b8 ff ff ff ff f0 0f c1 43 38 83 f8 01 74 29 85 c0 7e 17 31 c0 5b 5d c3 cc cc cc cc e8 76 b2 c5 f0 eb 96 <0f> 0b e9 67 ff ff f
f be 03 00 00 00 e8 83 76 33 f1 31 c0 eb dd e8
[ 39.848690] RSP: 0018:ff1cc1ffc5c039f0 EFLAGS: 00010046
[ 39.848691] RAX: 0000000000000001 RBX: ff175a3b504da980 RCX: ff175a3b4801e008
[ 39.848692] RDX: ff175a3b43e7bad0 RSI: ffffffffc09d3fda RDI: ff175a3b504da980
[ 39.848693] RBP: ff175a3b504da9c0 R08: ffffffffc09e39df R09: 0000000000000001
[ 39.848694] R10: 0000000000000001 R11: 0000000000000000 R12: ff175a3b6d97de00
[ 39.848695] R13: 0000000000000246 R14: ff1cc1ffc5c03c60 R15: 0000000000000001
[ 39.848696] FS: 00007fc5477846c0(0000) GS:ff175a5a50280000(0000) knlGS:0000000000000000
[ 39.848698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.848699] CR2: 000055cb7613d1a8 CR3: 000000012e5ce004 CR4: 0000000000f71ef0
[ 39.848700] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.848701] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 39.848702] PKRU: 55555554
[ 39.848703] Call Trace:
[ 39.848704] <TASK>
[ 39.848705] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848782] ? __warn.cold+0x93/0xfa
[ 39.848785] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848861] ? report_bug+0xff/0x140
[ 39.848863] ? handle_bug+0x58/0x90
[ 39.848865] ? exc_invalid_op+0x17/0x70
[ 39.848866] ? asm_exc_invalid_op+0x1a/0x20
[ 39.848870] ? nouveau_fence_no_signaling+0xac/0xd0 [nouveau]
[ 39.848943] nouveau_fence_enable_signaling+0x32/0x80 [nouveau]
[ 39.849016] ? __pfx_nouveau_fence_cleanup_cb+0x10/0x10 [nouveau]
[ 39.849088] __dma_fence_enable_signaling+0x33/0xc0
[ 39.849090] dma_fence_add_callback+0x4b/0xd0
[ 39.849093] nouveau_fence_emit+0xa3/0x260 [nouveau]
[ 39.849166] nouveau_fence_new+0x7d/0xf0 [nouveau]
[ 39.849242] nouveau_gem_ioctl_pushbuf+0xe8f/0x1300 [nouveau]
[ 39.849338] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849431] drm_ioctl_kernel+0xad/0x100
[ 39.849433] drm_ioctl+0x288/0x550
[ 39.849435] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849526] nouveau_drm_ioctl+0x57/0xb0 [nouveau]
[ 39.849620] __x64_sys_ioctl+0x94/0xc0
[ 39.849621] do_syscall_64+0x82/0x160
[ 39.849623] ? drm_ioctl+0x2b7/0x550
[ 39.849625] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
[ 39.849719] ? ktime_get_mono_fast_ns+0x38/0xd0
[ 39.849721] ? __pm_runtime_suspend+0x69/0xc0
[ 39.849724] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[ 39.849726] ? syscall_exit_to_user_mode+0x10/0x200
[ 39.849729] ? do_syscall_64+0x8e/0x160
[ 39.849730] ? exc_page_fault+0x7e/0x1a0
[ 39.849733] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 39.849735] RIP: 0033:0x7fc5576fe0ad
[ 39.849736] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 39.849737] RSP: 002b:00007ffc002688a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 39.849739] RAX: ffffffffffffffda RBX: 000055cb74e316c0 RCX: 00007fc5576fe0ad
[ 39.849740] RDX: 00007ffc00268960 RSI: 00000000c0406481 RDI: 000000000000000e
[ 39.849741] RBP: 00007ffc002688f0 R08: 0000000000000000 R09: 000055cb74e35560
[ 39.849742] R10: 0000000000000014 R11: 0000000000000246 R12: 00007ffc00268960
[ 39.849744] R13: 00000000c0406481 R14: 000000000000000e R15: 000055cb74e3cd10
[ 39.849746] </TASK>
[ 39.849746] ---[ end trace 0000000000000000 ]---
[ 39.849776] ------------[ cut here ]------------
This is the first WARN_ON() in dma_fence_set_error(), called by
nouveau_fence_context_kill().
It's rare, but it is a bug, or rather the archetype of a race: as
Christian pointed out, nouveau_fence_update() will at some later point
remove the signaled fence from the pending list (by signaling it again).
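To illustrate the kind of race described above, here is a minimal
user-space sketch. It is a toy model only: every name in it (toy_fence,
toy_ctx, toy_context_kill(), ...) is hypothetical and is neither nouveau
nor dma_fence API, and it does not claim to mirror how the patches
implement the fix. It only shows why a fence that is already signaled but
still linked in a pending list trips a check in a later "kill" pass, and
why unlinking at signal time avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: hypothetical names, not the nouveau or dma_fence API. */
struct toy_fence {
	bool signaled;
	bool on_pending;	/* true while linked into ctx->pending */
	int error;
	struct toy_fence *next;
};

struct toy_ctx {
	struct toy_fence *pending;	/* singly linked pending list */
};

/* Emit: reset the fence and push it onto the pending list. */
static void toy_emit(struct toy_ctx *ctx, struct toy_fence *f)
{
	assert(!f->on_pending);
	f->signaled = false;
	f->error = 0;
	f->on_pending = true;
	f->next = ctx->pending;
	ctx->pending = f;
}

/* Unlink a fence from the pending list if it is on it. */
static void toy_unlink(struct toy_ctx *ctx, struct toy_fence *f)
{
	struct toy_fence **pp = &ctx->pending;

	while (*pp && *pp != f)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = f->next;
		f->next = NULL;
		f->on_pending = false;
	}
}

/* Racy variant: marks the fence signaled but leaves it on the list. */
static void toy_signal_racy(struct toy_fence *f)
{
	f->signaled = true;
}

/* Safe variant: signaling and unlinking happen together. */
static void toy_signal_and_unlink(struct toy_ctx *ctx, struct toy_fence *f)
{
	f->signaled = true;
	toy_unlink(ctx, f);
}

/*
 * Kill pass: set an error on every pending fence, then signal it.
 * Returns how many fences were already signaled when the error was
 * about to be set; in this model each one stands for a warning.
 */
static int toy_context_kill(struct toy_ctx *ctx, int error)
{
	int warnings = 0;

	while (ctx->pending) {
		struct toy_fence *f = ctx->pending;

		if (f->signaled)
			warnings++;	/* error set on a signaled fence */
		else
			f->error = error;
		toy_signal_and_unlink(ctx, f);
	}
	return warnings;
}
```

In this model, a fence signaled via toy_signal_racy() is still found by
toy_context_kill() and counted as a warning, while one signaled via
toy_signal_and_unlink() is never seen by the kill pass.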
P.
Philipp Stanner (3):
drm/nouveau: Prevent signaled fences in pending list
drm/nouveau: Remove surplus if-branch
drm/nouveau: Add helper to check base fence
drivers/gpu/drm/nouveau/nouveau_fence.c | 32 ++++++++++++++-----------
1 file changed, 18 insertions(+), 14 deletions(-)
--
2.48.1