Message-ID: <CAPrAcgN=RwA5j_hCwHAdSTwyfj5jignvaZ0nNidFqYaY_VeRxA@mail.gmail.com>
Date: Fri, 21 Nov 2025 23:18:51 +0530
From: I Viswanath <viswanathiyyappan@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, horms@...nel.org, sdf@...ichev.me, kuniyu@...gle.com,
skhawaja@...gle.com, aleksander.lobakin@...el.com, mst@...hat.com,
jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kernel-mentees@...ts.linux.dev
Subject: Re: [PATCH net-next v5 0/2] net: Split ndo_set_rx_mode into snapshot
On Thu, 20 Nov 2025 at 20:47, Jakub Kicinski <kuba@...nel.org> wrote:
> Running
>
> make -C tools/testing/selftests TARGETS="drivers/net/virtio_net" run_tests
This bug seems to be caused by a probe() call followed by remove()
without dev_open() ever being called, as dev->rx_mode_ctx is
allocated there. Modifying netif_rx_mode_flush_work() to call
flush_work() only when netif_running() is true seems to fix this
specific bug.
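
Roughly the guard I have in mind (just a sketch; I'm assuming the
function takes the net_device and that the work struct lives in
dev->rx_mode_ctx as rx_mode_work, which may not match the series
exactly):

static void netif_rx_mode_flush_work(struct net_device *dev)
{
	/* dev->rx_mode_ctx is only allocated in dev_open(), so a
	 * probe() + remove() sequence that never opened the device
	 * has nothing to flush.
	 */
	if (!netif_running(dev))
		return;

	flush_work(&dev->rx_mode_ctx->rx_mode_work);
}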
However, I found the following deadlock while trying to reproduce that:
dev_close():
	rtnl_lock();
	cancel_work_sync();	// wait for netif_rx_mode_write_active() to complete

netif_rx_mode_write_active():	// from the work item
	rtnl_lock();		// wait for the rtnl lock to be released
I can't find a good way to solve this without moving part of the
allocation logic into alloc_netdev_mqs(), since we need the work
struct to stay alive after closing. Does the following look good, if
that is really the most reasonable solution:
struct netif_rx_mode_ctx *rx_mode_ctx;

struct netif_rx_mode_ctx {
	struct work_struct rx_mode_work;
	struct netif_rx_mode_active_ctx *active_ctx;
	int state;
};

struct netif_rx_mode_active_ctx {
	struct net_device *dev;
	struct netif_rx_mode_config *ready;
	struct netif_rx_mode_config *pending;
};
rx_mode_ctx will be handled in alloc_netdev_mqs()/free_netdev(),
while active_ctx will be handled in dev_open()/dev_close().
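
A rough sketch of how I picture the split (netif_rx_mode_work_fn is
just a placeholder name for the work handler, and this glosses over
error handling and how the work item synchronizes against active_ctx
going away):

	/* alloc_netdev_mqs(): the work struct outlives dev_close() */
	dev->rx_mode_ctx = kzalloc(sizeof(*dev->rx_mode_ctx), GFP_KERNEL);
	INIT_WORK(&dev->rx_mode_ctx->rx_mode_work, netif_rx_mode_work_fn);

	/* dev_open(): allocate only the per-open state */
	dev->rx_mode_ctx->active_ctx =
		kzalloc(sizeof(*dev->rx_mode_ctx->active_ctx), GFP_KERNEL);
	dev->rx_mode_ctx->active_ctx->dev = dev;

	/* dev_close(): drop the active context instead of flushing the work */
	kfree(dev->rx_mode_ctx->active_ctx);
	dev->rx_mode_ctx->active_ctx = NULL;

	/* free_netdev(): the work struct goes away with the netdev */
	kfree(dev->rx_mode_ctx);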
Core should never call flush_work()/cancel_work_sync() for this work,
as that is a guaranteed deadlock because of how everything is
serialized.