Message-ID: <CAPpAL=zn7ZQ_bVBML5no3ifkBNgd2d-uhx5n0RUTn-DXWyPxKQ@mail.gmail.com>
Date: Mon, 15 Sep 2025 18:50:15 +0800
From: Lei Yang <leiyang@...hat.com>
To: Breno Leitao <leitao@...ian.org>
Cc: Andrew Lunn <andrew@...n.ch>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, kuba@...nel.org,
Simon Horman <horms@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez <eperezma@...hat.com>,
Andrew Lunn <andrew+netdev@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, virtualization@...ts.linux.dev,
kernel-team@...a.com
Subject: Re: [PATCH net-next v2 0/7] net: ethtool: add dedicated GRXRINGS
driver callbacks
Hi Breno,
This patch series introduces a kernel panic. My tests are based on
linux-next commit [1]. Over several attempts, the issue never
triggered without this series applied. With the series applied, the
panic reproduced in 3 out of 3 attempts.
Steps to reproduce:
1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
2. Apply this patch series
3. Compile and install the kernel
4. Reboot the server (the kernel panic occurs at this step)
[1] commit 590b221ed4256fd6c34d3dea77aa5bd6e741bbc1 (tag:
next-20250912, origin/master, origin/HEAD)
Author: Stephen Rothwell <sfr@...b.auug.org.au>
Date: Fri Sep 12 15:15:12 2025 +1000
Add linux-next specific files for 20250912
Signed-off-by: Stephen Rothwell <sfr@...b.auug.org.au>
Kernel panic messages:
[ 13.769667] systemd[1]: bpf-restrict-fs: LSM BPF program attached
[ 13.840778] systemd-rc-local-generator[1084]: /etc/rc.d/rc.local is not marked executable, skipping.
[ 13.892736] slab kmalloc-64 start ffff8b3784459940 pointer offset 40 size 64
[ 13.899909] list_add corruption. prev->next should be next (ffffffffb5a91608), but was dead000000000100. (prev=ffff8b3784459968).
[ 13.911570] ------------[ cut here ]------------
[ 13.916185] kernel BUG at lib/list_debug.c:32!
[ 13.920637] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 13.925708] CPU: 26 UID: 0 PID: 325 Comm: kworker/26:1 Tainted: G S E 6.17.0-rc5-next-20250912+ #2 PREEMPT(voluntary)
[ 13.937692] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
[ 13.943437] Hardware name: Dell Inc. PowerEdge R740xd/01YM03, BIOS 2.2.11 06/13/2019
[ 13.951176] Workqueue: cgroup_free css_free_rwork_fn
[ 13.956150] RIP: 0010:__list_add_valid_or_report+0x94/0xa0
[ 13.961635] Code: cf 88 ff 0f 0b 48 89 f7 48 89 34 24 e8 45 ba c7 ff 48 8b 34 24 48 c7 c7 e8 fe e7 b4 48 8b 16 48 89 f1 48 89 de e8 8c cf 88 ff <0f> 0b 90 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
[ 13.980381] RSP: 0018:ffffcdcb07397dc0 EFLAGS: 00010246
[ 13.985605] RAX: 0000000000000075 RBX: ffffffffb5a91608 RCX: 0000000000000003
[ 13.992740] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffb59e45c8
[ 13.999870] RBP: ffff8b3008a6e5b0 R08: 0000000000000000 R09: ffffcdcb07397c48
[ 14.007001] R10: ffffffffb5924588 R11: 0000000000000003 R12: ffffffffb5a91600
[ 14.014135] R13: ffff8b3784459968 R14: ffff8b3008a6e040 R15: ffff8b3004e5b468
[ 14.021267] FS: 0000000000000000(0000) GS:ffff8b37a9be9000(0000) knlGS:0000000000000000
[ 14.029352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.035098] CR2: 00007fb1ef5862f4 CR3: 0000000ed3a24006 CR4: 00000000007706f0
[ 14.042229] PKRU: 55555554
[ 14.044942] Call Trace:
[ 14.047395] <TASK>
[ 14.049501] mem_cgroup_css_free+0x52/0x150
[ 14.053688] css_free_rwork_fn+0x4e/0x1f0
[ 14.057701] process_one_work+0x18b/0x340
[ 14.061714] worker_thread+0x256/0x3a0
[ 14.065465] ? __pfx_worker_thread+0x10/0x10
[ 14.069737] kthread+0xfc/0x240
[ 14.072882] ? __pfx_kthread+0x10/0x10
[ 14.076633] ? __pfx_kthread+0x10/0x10
[ 14.080388] ret_from_fork+0xf0/0x110
[ 14.084053] ? __pfx_kthread+0x10/0x10
[ 14.087807] ret_from_fork_asm+0x1a/0x30
[ 14.091733] </TASK>
[ 14.093924] Modules linked in: xfs(E) sd_mod(E) ahci(E) libahci(E)
ghash_clmulni_intel(E) wdat_wdt(E) megaraid_sas(E) tg3(E) libata(E)
wmi(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
be2iscsi(E) iscsi_boot_sysfs(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E)
cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) iscsi_tcp(E) libiscsi_tcp(E)
libiscsi(E) scsi_transport_iscsi(E) nfnetlink(E)
[ 14.093947] Unloaded tainted modules: fjes(E):1
[ 14.132310] ---[ end trace 0000000000000000 ]---
[ 14.177186] RIP: 0010:__list_add_valid_or_report+0x94/0xa0
[ 14.182685] Code: cf 88 ff 0f 0b 48 89 f7 48 89 34 24 e8 45 ba c7 ff 48 8b 34 24 48 c7 c7 e8 fe e7 b4 48 8b 16 48 89 f1 48 89 de e8 8c cf 88 ff <0f> 0b 90 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
[ 14.201432] RSP: 0018:ffffcdcb07397dc0 EFLAGS: 00010246
[ 14.206666] RAX: 0000000000000075 RBX: ffffffffb5a91608 RCX: 0000000000000003
[ 14.213803] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffb59e45c8
[ 14.220934] RBP: ffff8b3008a6e5b0 R08: 0000000000000000 R09: ffffcdcb07397c48
[ 14.228067] R10: ffffffffb5924588 R11: 0000000000000003 R12: ffffffffb5a91600
[ 14.235199] R13: ffff8b3784459968 R14: ffff8b3008a6e040 R15: ffff8b3004e5b468
[ 14.242331] FS: 0000000000000000(0000) GS:ffff8b37a9be9000(0000) knlGS:0000000000000000
[ 14.250419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.256164] CR2: 00007fb1ef5862f4 CR3: 0000000ed3a24006 CR4: 00000000007706f0
[ 14.263296] PKRU: 55555554
[ 14.266011] Kernel panic - not syncing: Fatal exception
[ 14.271323] Kernel Offset: 0x32600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 14.323149] ---[ end Kernel panic - not syncing: Fatal exception ]---
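A quick sanity check on the addresses in the log, as a minimal sketch rather than a diagnosis. It assumes x86_64 defaults, where POISON_POINTER_DELTA is 0xdead000000000000 and LIST_POISON1 (the value list_del() writes into entry->next) is 0x100 plus that delta. The helper names are hypothetical and only illustrate the arithmetic.

```python
# Values copied from the panic log above.
SLAB_START = 0xFFFF8B3784459940   # "slab kmalloc-64 start ffff8b3784459940"
POINTER_OFFSET = 40               # "pointer offset 40" (decimal)
OBJECT_SIZE = 64                  # "size 64"
PREV = 0xFFFF8B3784459968         # "prev=ffff8b3784459968"
BAD_NEXT = 0xDEAD000000000100     # "but was dead000000000100"

# Assumption: x86_64 default CONFIG_ILLEGAL_POINTER_VALUE.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # see include/linux/poison.h


def prev_matches_slab_object() -> bool:
    """True if 'prev' is the pointer the slab line describes."""
    inside = 0 <= POINTER_OFFSET < OBJECT_SIZE
    return inside and SLAB_START + POINTER_OFFSET == PREV


def is_list_poison1(value: int) -> bool:
    """True if 'value' is what list_del() leaves in entry->next."""
    return value == LIST_POISON1


if __name__ == "__main__":
    print("prev lies in the reported kmalloc-64 object:",
          prev_matches_slab_object())
    print("prev->next holds LIST_POISON1:", is_list_poison1(BAD_NEXT))
```

Both checks hold for the values in this log. That suggests the list entry at prev had already been removed with list_del() while the list headed at ffffffffb5a91608 still pointed at it. The log alone does not show which code path removed it.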
Thanks
Lei
On Fri, Sep 12, 2025 at 11:59 PM Breno Leitao <leitao@...ian.org> wrote:
>
> This patchset introduces a new dedicated ethtool_ops callback,
> .get_rx_ring_count, which enables drivers to provide the number of RX
> rings directly, improving efficiency and clarity in RX ring queries and
> RSS configuration.
>
> A number of drivers implement the .get_rxnfc callback just to report the ring
> count, so having a dedicated callback makes sense and simplifies .get_rxnfc
> (in some cases removing it completely).
>
> This was suggested by Jakub, and follows the same idea as the RXFH
> driver callbacks [1].
>
> This series also ports virtio_net to the new callback. Once there is consensus
> on this approach, I can start moving the drivers to this new callback.
>
> Link: https://lore.kernel.org/all/20250611145949.2674086-1-kuba@kernel.org/ [1]
>
> Suggested-by: Jakub Kicinski <kuba@...nel.org>
> Signed-off-by: Breno Leitao <leitao@...ian.org>
> Tested-by: Lei Yang <leiyang@...hat.com>
> ---
> Changes in v2:
> - rename get_num_rxrings() to ethtool_get_rx_ring_count() (Jakub)
> - initialize struct ethtool_rxnfc() (Jakub)
> - Link to v1: https://lore.kernel.org/r/20250909-gxrings-v1-0-634282f06a54@debian.org
> ---
> Changes v1 from RFC:
> - Renaming and changing the return type of .get_rxrings() callback (Jakub)
> - Add the docstring format for the new callback (Simon)
> - Remove the unnecessary WARN_ONCE() (Jakub)
> - Link to RFC: https://lore.kernel.org/r/20250905-gxrings-v1-0-984fc471f28f@debian.org
>
> ---
> Breno Leitao (7):
> net: ethtool: pass the num of RX rings directly to ethtool_copy_validate_indir
> net: ethtool: add support for ETHTOOL_GRXRINGS ioctl
> net: ethtool: remove the duplicated handling from ethtool_get_rxrings
> net: ethtool: add get_rx_ring_count callback to optimize RX ring queries
> net: ethtool: update set_rxfh to use ethtool_get_rx_ring_count helper
> net: ethtool: update set_rxfh_indir to use ethtool_get_rx_ring_count helper
> net: virtio_net: add get_rxrings ethtool callback for RX ring queries
>
> drivers/net/virtio_net.c | 15 ++-------
> include/linux/ethtool.h | 2 ++
> net/ethtool/ioctl.c | 81 +++++++++++++++++++++++++++++++++++++-----------
> 3 files changed, 68 insertions(+), 30 deletions(-)
> ---
> base-commit: 1f24a240974589ce42f70502ccb3ff3f5189d69a
> change-id: 20250905-gxrings-a2ec22ee2aec
>
> Best regards,
> --
> Breno Leitao <leitao@...ian.org>
>