Message-ID: <e4dcf5497e3840706b2e13d10c7efff1819c14a6.camel@decadent.org.uk>
Date: Wed, 21 Jan 2026 13:39:27 +0100
From: Ben Hutchings <ben@...adent.org.uk>
To: Salvatore Bonaccorso <carnil@...ian.org>, linux-net-drivers@....com,
Edward Cree <ecree.xilinx@...il.com>
Cc: 1126015@...s.debian.org, Damir Mansurov <damir.mansurov@...etlabs.ru>,
netdev <netdev@...r.kernel.org>
Subject: Re: Bug#1126015: linux-image-6.17.13+deb14-rt-amd64: ethtool -x
<sfc-net-driver-ifname> causes: rtmutex deadlock detected
Control: tag -1 - moreinfo
Control: tag -1 upstream
On Wed, 2026-01-21 at 06:01 +0100, Salvatore Bonaccorso wrote:
> Control: severity -1 important
> Control: tags -1 + moreinfo
>
> Hi Damir,
>
> On Tue, Jan 20, 2026 at 03:12:12PM +0300, Damir Mansurov wrote:
> > Package: src:linux
> > Version: 6.17.13-1
> > Severity: critical
> > Justification: breaks the whole system
> > X-Debbugs-Cc: debian-amd64@...ts.debian.org, damir.mansurov@...etlabs.ru
> > User: debian-amd64@...ts.debian.org
> > Usertags: amd64
> >
> > I am trying to read the receive flow hash indirection table from a NIC serviced by the sfc driver and I get a hung system.
> >
> > $ ethtool -x enp1s0f0np0
> > Jan 20 14:30:30 kernel: ------------[ cut here ]------------
> > Jan 20 14:30:30 kernel: rtmutex deadlock detected
> > Jan 20 14:30:30 kernel: WARNING: CPU: 2 PID: 1194 at kernel/locking/rtmutex.c:1674 __rt_mutex_slowlock_locked.constprop.0+0x1e8/0x220
> > Jan 20 14:30:30 kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace netfs 8021q garp stp llc mrp binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm jc42 dell_smbios ppdev dell_wmi_descriptor platform_profile dcdbas irqbypass ghash_clmulni_intel mgag200 aesni_intel drm_client_lib rapl drm_shmem_helper at24 drm_kms_helper intel_cstate intel_uncore vga16fb i2c_algo_bit vgastate pcspkr acpi_cpufreq parport_pc parport intel_vbtn sparse_keymap joydev ipmi_ssif evdev button ie31200_edac sg onload(OE) acpi_ipmi ipmi_si ipmi_watchdog sfc_resource(OE) ipmi_devintf ipmi_msghandler drm efi_pstore configfs auth_rpcgss sunrpc nfnetlink autofs4 ext4 crc16 mbcache jbd2 crc32c_cryptoapi hid_generic usbhid hid dm_mod sd_mod iTCO_wdt ahci intel_pmc_bxt iTCO_vendor_support watchdog tg3 libahci xhci_pci ehci_pci video sfc xhci_hcd libphy ehci_hcd wmi battery libata usbcore mdio_bus mtd scsi_mod i2c_i801 fan usb_common i2c_smbus scsi_common
> > Jan 20 14:30:30 kernel: lpc_ich
> > Jan 20 14:30:30 kernel: CPU: 2 UID: 1100 PID: 1194 Comm: ethtool Tainted: G W OE 6.17.13+deb14-rt-amd64 #1 PREEMPT_{RT,(full)} Debian 6.17.13-1
> > Jan 20 14:30:30 kernel: Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> > Jan 20 14:30:30 kernel: Hardware name: Dell Inc. PowerEdge R220/05Y15N, BIOS 1.4.0 10/23/2014
> > Jan 20 14:30:30 kernel: RIP: 0010:__rt_mutex_slowlock_locked.constprop.0+0x1e8/0x220
> > Jan 20 14:30:30 kernel: Code: 00 4c 89 e6 48 89 ef e8 b6 35 ca 00 41 83 fd dd 0f 85 d7 fe ff ff 48 89 ef e8 f4 9c ca 00 48 c7 c7 6c 51 5d a9 e8 a8 00 f6 ff <0f> 0b 66 90 b8 01 00 00 00 87 43 18 e8 c7 3e fb ff eb ef bf 01 00
> > Jan 20 14:30:30 kernel: RSP: 0018:ffffcdc24208f5a0 EFLAGS: 00010246
> > Jan 20 14:30:30 kernel: RAX: 0000000000000000 RBX: ffff8b14d3888000 RCX: 0000000000000027
> > Jan 20 14:30:30 kernel: RDX: ffff8b1617f1ce88 RSI: 0000000000000001 RDI: ffff8b1617f1ce80
> > Jan 20 14:30:30 kernel: RBP: ffff8b14ca3d9150 R08: 0000000000000000 R09: ffffcdc24208f378
> > Jan 20 14:30:30 kernel: R10: ffffffffa9ee3e48 R11: 00000000ffffefff R12: ffffcdc24208f5a0
> > Jan 20 14:30:30 kernel: R13: 00000000ffffffdd R14: ffffcdc24208f648 R15: ffffffffa9168880
> > Jan 20 14:30:30 kernel: FS: 00007f1bb6609b80(0000) GS:ffff8b166d501000(0000) knlGS:0000000000000000
> > Jan 20 14:30:30 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jan 20 14:30:30 kernel: CR2: 000055f2ef901700 CR3: 000000013b2c4001 CR4: 00000000001726f0
> > Jan 20 14:30:30 kernel: Call Trace:
> > Jan 20 14:30:30 kernel: <TASK>
> > Jan 20 14:30:30 kernel: rt_mutex_slowlock.constprop.0+0x4d/0xc0
> > Jan 20 14:30:30 kernel: efx_mcdi_rx_pull_rss_config+0x28/0x60 [sfc]
> > Jan 20 14:30:30 kernel: efx_ethtool_get_rxfh+0x3b/0xd0 [sfc]
> > Jan 20 14:30:30 kernel: rss_prepare.isra.0+0x1c9/0x330
[...]
>
> Furthermore, the kernel is tainted with an OOT module; afaics this might
> be sfc_resource. That is, you would still need to check whether the
> problem is triggerable without loading the OOT module.
sfc_resource isn't involved in this (and nor is RT). The deadlock seems
to be quite straightforward:

1. In net/ethtool/rss.c, rss_prepare_get() locks the net device's
   rss_lock and calls its driver's get_rxfh operation
2. In drivers/net/ethernet/sfc/ethtool_common.c, efx_ethtool_get_rxfh()
   calls the chip's rx_pull_rss_config operation
3. In drivers/net/ethernet/sfc/mcdi_filters.c,
   efx_mcdi_rx_pull_rss_config() locks the net device's rss_lock

Step 3 seems to be a workaround for missing locking in the ethtool core.
Since that locking was added to the ethtool core in 6.17, it needs to be
removed from the sfc driver from that version onward.
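
The three steps above can be modelled in user space. What follows is only
a rough sketch, not kernel code: it assumes a POSIX error-checking mutex
as a stand-in for rss_lock, and every model_* name and the
driver_takes_lock flag are hypothetical, made up for illustration. The
point is just that a non-recursive lock taken twice on the same call path
reports a deadlock, and that dropping the inner lock avoids it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the net device's rss_lock. */
static pthread_mutex_t rss_lock;

/* Create the lock as an error-checking mutex, so that relocking it from
 * the owning thread returns EDEADLK instead of hanging. */
static void model_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&rss_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Step 3: the driver helper optionally takes rss_lock itself. */
static int model_pull_rss_config(int driver_takes_lock)
{
    if (driver_takes_lock) {
        int rc = pthread_mutex_lock(&rss_lock);
        if (rc != 0)
            return rc;      /* EDEADLK: the caller already holds it */
    }

    /* ... the RSS configuration would be read here ... */

    if (driver_takes_lock)
        pthread_mutex_unlock(&rss_lock);
    return 0;
}

/* Step 2: the driver's get_rxfh operation calls the helper. */
static int model_get_rxfh(int driver_takes_lock)
{
    return model_pull_rss_config(driver_takes_lock);
}

/* Step 1: the core takes rss_lock, then calls into the driver. */
static int model_rss_prepare_get(int driver_takes_lock)
{
    int rc = pthread_mutex_lock(&rss_lock);
    if (rc != 0)
        return rc;
    rc = model_get_rxfh(driver_takes_lock);
    pthread_mutex_unlock(&rss_lock);
    return rc;
}
```

After model_init(), calling model_rss_prepare_get(1) models the current
driver behaviour and returns EDEADLK; model_rss_prepare_get(0) models the
driver with its own locking removed and returns 0.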
Ben.
--
Ben Hutchings
I'm always amazed by the number of people who take up solipsism because
they heard someone else explain it. - E*Borg on alt.fan.pratchett