Message-ID: <20251227174225.699975-1-viswanathiyyappan@gmail.com>
Date: Sat, 27 Dec 2025 23:12:23 +0530
From: I Viswanath <viswanathiyyappan@...il.com>
To: kuba@...nel.org,
	pabeni@...hat.com,
	horms@...nel.org,
	andrew+netdev@...n.ch,
	edumazet@...gle.com,
	xuanzhuo@...ux.alibaba.com,
	mst@...hat.com,
	jasowang@...hat.com,
	eperezma@...hat.com
Cc: netdev@...r.kernel.org,
	virtualization@...ts.linux.dev,
	I Viswanath <viswanathiyyappan@...il.com>
Subject: [PATCH net-next v6 0/2] net: Split ndo_set_rx_mode into snapshot and deferred write

This series implements the idea proposed by Jakub here:

https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/

ndo_set_rx_mode is problematic because it is called with netif_addr_lock held and therefore cannot sleep.

To address this, this series proposes splitting the setting of rx_mode
into two stages: snapshot and deferred I/O. To achieve this, we
reinterpret set_rx_mode and add a new ndo, write_rx_mode, as
explained below:

The new set_rx_mode is responsible for customizing the rx_mode
snapshot, which write_rx_mode then uses to update the hardware.

In brief, the new flow looks something like:

prepare_rx_mode():
    ndo_set_rx_mode();
    prepare_snapshot();

write_rx_mode():
    use_ready_snapshot();
    ndo_write_rx_mode();
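The two stages can be modelled in user space with the minimal sketch below. It assumes a single-threaded caller; a C11 atomic_flag spinlock stands in for netif_addr_lock, and the snapshot fields (promisc, allmulti, n_mc) and the helper model_write_rx_mode() are hypothetical, not the patch's actual structures. What it illustrates is that the lock is held only while the snapshot is produced or copied, never while the write callback runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rx_mode snapshot; the real layout is defined by the patch. */
struct rx_mode_snapshot {
	bool promisc;
	bool allmulti;
	int n_mc;
};

/* Stand-in for netif_addr_lock: a C11 spinlock built on atomic_flag. */
static atomic_flag addr_lock = ATOMIC_FLAG_INIT;
static bool addr_lock_held; /* bookkeeping so the callback can check it */

static void addr_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&addr_lock))
		; /* spin */
	addr_lock_held = true;
}

static void addr_lock_release(void)
{
	addr_lock_held = false;
	atomic_flag_clear(&addr_lock);
}

static struct rx_mode_snapshot ready;   /* published by prepare_rx_mode() */
static struct rx_mode_snapshot applied; /* what the "hardware" last received */

/* Stage 1: runs under the lock, must not sleep, only fills the snapshot. */
static void prepare_rx_mode(bool promisc, bool allmulti, int n_mc)
{
	addr_lock_acquire();
	ready.promisc = promisc; /* the role of ndo_set_rx_mode() */
	ready.allmulti = allmulti;
	ready.n_mc = n_mc;
	addr_lock_release();
}

/* Hypothetical driver write callback: runs without the lock, may sleep. */
static void model_write_rx_mode(const struct rx_mode_snapshot *s)
{
	assert(!addr_lock_held);
	applied = *s; /* stands in for slow, possibly sleeping device I/O */
}

/* Stage 2: copy the snapshot under the lock, then write without it. */
static void write_rx_mode(void)
{
	struct rx_mode_snapshot local;

	addr_lock_acquire();
	local = ready;
	addr_lock_release();
	model_write_rx_mode(&local);
}
```

Calling prepare_rx_mode() leaves the "hardware" untouched; only a later write_rx_mode() applies the snapshot, and it does so with the lock already released.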

write_rx_mode() runs from a work item and does not hold netif_addr_lock
while it calls ndo_write_rx_mode(), so ndo_write_rx_mode() is allowed
to sleep.

This model should work correctly if the following conditions hold:

1. write_rx_mode should use the rx_mode snapshot published by the most
    recent call to make_snapshot_ready that precedes its execution.

2. If a make_snapshot_ready call happens during execution of write_rx_mode,
    write_rx_mode should be rescheduled.

3. All calls that modify rx_mode should go through the prepare_rx_mode +
    schedule write_rx_mode flow. netif_rx_mode_schedule_work
    has been implemented in the core for this purpose.

Conditions 1 and 2 are implemented in the core.

Drivers need to ensure condition 3 by using netif_rx_mode_schedule_work.

To use this model, a driver needs to implement the
ndo_write_rx_mode callback, change its set_rx_mode callback so that
it only customizes the snapshot, and replace every call that modifies
the rx mode with netif_rx_mode_schedule_work.
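From the driver's side, the conversion could look roughly like the sketch below. Everything here is a hypothetical user-space model: model_ops, model_dev, the MODEL_IFF_* flags, model_schedule_work() and model_run_work() only mimic the roles of set_rx_mode, ndo_write_rx_mode, netif_rx_mode_schedule_work and the work item, and do not reproduce their real signatures.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IFF_PROMISC  0x1u /* hypothetical flag values for the model */
#define MODEL_IFF_ALLMULTI 0x2u

struct model_snapshot {
	bool promisc;
	bool allmulti;
};

struct model_dev;

/* Hypothetical ops table mirroring the two callbacks in this series. */
struct model_ops {
	void (*set_rx_mode)(struct model_dev *dev, struct model_snapshot *snap);
	void (*write_rx_mode)(struct model_dev *dev,
			      const struct model_snapshot *snap);
};

struct model_dev {
	unsigned int flags;
	const struct model_ops *ops;
	struct model_snapshot snap;   /* filled in atomic context */
	bool work_queued;
	bool hw_promisc, hw_allmulti; /* state the "device" ended up with */
	int commands;                 /* control commands issued so far */
};

/* Atomic context: only translate device state into the snapshot. */
static void drv_set_rx_mode(struct model_dev *dev, struct model_snapshot *snap)
{
	snap->promisc = !!(dev->flags & MODEL_IFF_PROMISC);
	snap->allmulti = !!(dev->flags & MODEL_IFF_ALLMULTI);
}

/* Process context: free to issue slow control commands that may sleep. */
static void drv_write_rx_mode(struct model_dev *dev,
			      const struct model_snapshot *snap)
{
	dev->hw_promisc = snap->promisc;
	dev->hw_allmulti = snap->allmulti;
	dev->commands += 2; /* one command per rx mode bit */
}

static const struct model_ops drv_ops = {
	.set_rx_mode = drv_set_rx_mode,
	.write_rx_mode = drv_write_rx_mode,
};

/* Stand-in for netif_rx_mode_schedule_work(): snapshot now, write later. */
static void model_schedule_work(struct model_dev *dev)
{
	dev->ops->set_rx_mode(dev, &dev->snap);
	dev->work_queued = true;
}

/* Stand-in for the work item running in process context. */
static void model_run_work(struct model_dev *dev)
{
	struct model_snapshot local;

	if (!dev->work_queued)
		return;
	dev->work_queued = false;
	local = dev->snap;
	dev->ops->write_rx_mode(dev, &local);
}
```

A driver path that previously programmed the rx mode directly would call model_schedule_work() instead; no control command is issued until the work runs.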

Signed-off-by: I Viswanath <viswanathiyyappan@...il.com>
---

In v5, apart from the bug in netif_rx_mode_flush_work, the following line in netif_free_rx_mode_ctx
was problematic:

cancel_work_sync(&dev->rx_mode_ctx->rx_mode_work);

The problem was that this function ran as part of dev_close(), so it held the RTNL lock while waiting
for netif_rx_mode_write_active(), which itself needs to take the RTNL lock.

If the work function was already running and waiting for the RTNL lock when dev_close() called
cancel_work_sync(), the result was a guaranteed deadlock.
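The deadlock is a wait-for cycle: cancel_work_sync() has to wait for a work item that is already executing, whereas a work item that is merely pending is dequeued without waiting. The simplified model below encodes that reasoning; enum work_state, struct model_work and cancel_sync_would_deadlock() are hypothetical names, and the model is a reasoning aid rather than a description of the real workqueue implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states of a work item. */
enum work_state {
	WORK_IDLE,    /* not queued, not running */
	WORK_PENDING, /* queued but not yet started */
	WORK_RUNNING, /* work function is executing */
};

struct model_work {
	enum work_state state;
	bool still_needs_rtnl; /* work function has yet to take the RTNL lock */
};

/*
 * Would a cancel_work_sync()-style call never return?  The caller blocks
 * only on a running work item; that item blocks only if it still needs
 * RTNL and the caller is the one holding it.
 */
static bool cancel_sync_would_deadlock(const struct model_work *w,
				       bool caller_holds_rtnl)
{
	switch (w->state) {
	case WORK_IDLE:
	case WORK_PENDING:
		return false; /* nothing to wait for; pending work is dequeued */
	case WORK_RUNNING:
		return caller_holds_rtnl && w->still_needs_rtnl;
	}
	return false;
}
```

The running-work case returns false as soon as the caller does not hold RTNL, which is the property the fix relies on.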

The solution is to cancel the work from a context that does not hold the RTNL lock. The only existing
function in the teardown path that runs in such a context is free_netdev(), and it is not an ideal place for this cleanup.

My solution is to introduce a new struct, netif_deferred_work_cleanup, and a new net_device member,
deferred_work_cleanup.

deferred_work_cleanup is a work item (together with a pointer to dev) that dev_close() schedules; it
runs the cleanup functions that must not be called with the RTNL lock held.
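The deferred cleanup can be sketched with the usual kernel idiom of embedding a work item in a larger structure and recovering the container with container_of(). This is a single-threaded user-space model: the work_struct and model_net_device types, the rtnl_held flag, model_dev_close(), model_run_work() and the field layout of struct netif_deferred_work_cleanup are assumptions for illustration; only the struct's name comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for the kernel's work item (hypothetical). */
struct work_struct {
	void (*func)(struct work_struct *work);
	bool pending;
};

struct model_net_device;

/* Name taken from the cover letter; the field layout is an assumption. */
struct netif_deferred_work_cleanup {
	struct work_struct work;
	struct model_net_device *dev;
};

struct model_net_device {
	struct netif_deferred_work_cleanup deferred_work_cleanup;
	bool rx_mode_work_cancelled;
};

static bool rtnl_held; /* models the RTNL lock */

/* Work function: recover the container, then do the RTNL-free cleanup. */
static void deferred_cleanup_fn(struct work_struct *work)
{
	struct netif_deferred_work_cleanup *c =
		container_of(work, struct netif_deferred_work_cleanup, work);

	assert(!rtnl_held); /* runs from the workqueue, not under RTNL */
	c->dev->rx_mode_work_cancelled = true; /* e.g. cancel_work_sync() */
}

/* dev_close() runs under RTNL, so it only schedules the cleanup. */
static void model_dev_close(struct model_net_device *dev)
{
	rtnl_held = true;
	dev->deferred_work_cleanup.dev = dev;
	dev->deferred_work_cleanup.work.func = deferred_cleanup_fn;
	dev->deferred_work_cleanup.work.pending = true; /* schedule, don't wait */
	rtnl_held = false;
}

/* Later, the modelled workqueue runs the item without RTNL held. */
static void model_run_work(struct work_struct *work)
{
	if (work->pending) {
		work->pending = false;
		work->func(work);
	}
}
```

Because dev_close() only queues the item, it never waits on work that needs RTNL; the blocking cleanup happens afterwards in a context where RTNL is free.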

v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/

v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
Link: https://lore.kernel.org/netdev/20251026175445.1519537-1-viswanathiyyappan@gmail.com/

v3:
- Added RFT tag
- Corrected mangled patch
Link: https://lore.kernel.org/netdev/20251028174222.1739954-1-viswanathiyyappan@gmail.com/

v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag
Link: https://lore.kernel.org/netdev/20251118164333.24842-1-viswanathiyyappan@gmail.com/

v5:
- Fix broken code and titles
- Remove RFT tag
Link: https://lore.kernel.org/netdev/20251120141354.355059-1-viswanathiyyappan@gmail.com/

v6:
- Added struct netif_deferred_work_cleanup and the net_device members needs_deferred_cleanup and deferred_work_cleanup
- Moved the ctrl bits out of netif_rx_mode_config into netif_rx_mode_work_ctx

I Viswanath (2):
  net: refactor set_rx_mode into snapshot and deferred I/O
  virtio-net: Implement ndo_write_rx_mode callback

 drivers/net/virtio_net.c  |  55 +++-----
 include/linux/netdevice.h | 113 +++++++++++++++-
 net/core/dev.c            | 270 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 396 insertions(+), 42 deletions(-)

-- 
2.47.3

