Message-ID: <CAPrAcgN=RwA5j_hCwHAdSTwyfj5jignvaZ0nNidFqYaY_VeRxA@mail.gmail.com>
Date: Fri, 21 Nov 2025 23:18:51 +0530
From: I Viswanath <viswanathiyyappan@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, horms@...nel.org, sdf@...ichev.me, kuniyu@...gle.com,
skhawaja@...gle.com, aleksander.lobakin@...el.com, mst@...hat.com,
jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kernel-mentees@...ts.linux.dev
Subject: Re: [PATCH net-next v5 0/2] net: Split ndo_set_rx_mode into snapshot
On Thu, 20 Nov 2025 at 20:47, Jakub Kicinski <kuba@...nel.org> wrote:
> Running
>
> make -C tools/testing/selftests TARGETS="drivers/net/virtio_net" run_tests
This bug seems to be caused by a probe() call followed by remove()
without dev_open() ever being called, as dev->rx_mode_ctx is
allocated there. Modifying netif_rx_mode_flush_work() to call
flush_work() only when netif_running() is true seems to fix this
specific bug.
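
Roughly the guard I have in mind (just a sketch; I'm assuming the
function takes the net_device and that the work struct lives in
dev->rx_mode_ctx as rx_mode_work, which may not match the series
exactly):

static void netif_rx_mode_flush_work(struct net_device *dev)
{
	/* dev->rx_mode_ctx is only allocated in dev_open(), so a
	 * probe() + remove() sequence that never opened the device
	 * has nothing to flush.
	 */
	if (!netif_running(dev))
		return;

	flush_work(&dev->rx_mode_ctx->rx_mode_work);
}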
However, I found the following deadlock while trying to reproduce that:
dev_close():
	rtnl_lock();
	cancel_work_sync();	// wait for netif_rx_mode_write_active() to complete

netif_rx_mode_write_active():	// from the work item
	rtnl_lock();		// wait for the rtnl lock to be released
I can't find a good way to solve this without moving part of the
allocation logic into alloc_netdev_mqs(), since we need the work
struct to stay alive after closing. Does the following look good, if
that is really the most reasonable solution:
struct netif_rx_mode_ctx *rx_mode_ctx;

struct netif_rx_mode_ctx {
	struct work_struct rx_mode_work;
	struct netif_rx_mode_active_ctx *active_ctx;
	int state;
};

struct netif_rx_mode_active_ctx {
	struct net_device *dev;
	struct netif_rx_mode_config *ready;
	struct netif_rx_mode_config *pending;
};
rx_mode_ctx will be handled in alloc_netdev_mqs()/free_netdev(),
while active_ctx will be handled in dev_open()/dev_close().
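
A rough sketch of how I picture the split (netif_rx_mode_work_fn is
just a placeholder name for the work handler, and this glosses over
error handling and how the work item synchronizes against active_ctx
going away):

	/* alloc_netdev_mqs(): the work struct outlives dev_close() */
	dev->rx_mode_ctx = kzalloc(sizeof(*dev->rx_mode_ctx), GFP_KERNEL);
	INIT_WORK(&dev->rx_mode_ctx->rx_mode_work, netif_rx_mode_work_fn);

	/* dev_open(): allocate only the per-open state */
	dev->rx_mode_ctx->active_ctx =
		kzalloc(sizeof(*dev->rx_mode_ctx->active_ctx), GFP_KERNEL);
	dev->rx_mode_ctx->active_ctx->dev = dev;

	/* dev_close(): drop the active context instead of flushing the work */
	kfree(dev->rx_mode_ctx->active_ctx);
	dev->rx_mode_ctx->active_ctx = NULL;

	/* free_netdev(): the work struct goes away with the netdev */
	kfree(dev->rx_mode_ctx);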
Core should never call flush_work()/cancel_work_sync() for this work,
as that is a guaranteed deadlock because of how everything is
serialized.