[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251120141354.355059-1-viswanathiyyappan@gmail.com>
Date: Thu, 20 Nov 2025 19:43:52 +0530
From: I Viswanath <viswanathiyyappan@...il.com>
To: andrew+netdev@...n.ch,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
horms@...nel.org,
sdf@...ichev.me,
kuniyu@...gle.com,
skhawaja@...gle.com,
aleksander.lobakin@...el.com,
mst@...hat.com,
jasowang@...hat.com,
xuanzhuo@...ux.alibaba.com,
eperezma@...hat.com
Cc: virtualization@...ts.linux.dev,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-kernel-mentees@...ts.linux.dev,
I Viswanath <viswanathiyyappan@...il.com>
Subject: [PATCH net-next v5 0/2] net: Split ndo_set_rx_mode into snapshot
This is an implementation of the idea provided by Jakub here
https://lore.kernel.org/netdev/20250923163727.5e97abdb@kernel.org/
ndo_set_rx_mode is problematic because it cannot sleep.
To address this, this series proposes dividing the concept of setting
rx_mode into 2 stages: snapshot and deferred I/O. To achieve this, we
reinterpret set_rx_mode and add create a new ndo write_rx_mode as
explained below:
The new set_rx_mode will be responsible for customizing the rx_mode
snapshot which will be used by write_rx_mode to update the hardware
In brief, the new flow looks something like:
prepare_rx_mode():
ndo_set_rx_mode();
ready_snapshot();
write_rx_mode():
commit_and_use_snapshot();
ndo_write_rx_mode();
write_rx_mode() is called from a work item and doesn't hold the
netif_addr_lock lock during ndo_write_rx_mode() making it sleepable
in that section.
This model should work correctly if the following conditions hold:
1. write_rx_mode should use the rx_mode set by the most recent
call to prepare_rx_mode before its execution.
2. If a prepare_rx_mode call happens during execution of write_rx_mode,
write_rx_mode should be rescheduled.
3. All calls to modify rx_mode should pass through the prepare_rx_mode +
schedule write_rx_mode execution flow. netif_rx_mode_schedule_work
has been implemented in core for this.
1 and 2 are guaranteed because of the properties of work queues
Drivers need to ensure 3
To use this model, a driver needs to implement the
ndo_write_rx_mode callback, change the set_rx_mode callback
appropriately and replace all calls to modify rx mode with
netif_rx_mode_schedule_work
Signed-off-by: I Viswanath <viswanathiyyappan@...il.com>
---
Questions I have:
1) Would there ever be a situation in which you will have to wait for I/O
to complete in a call to set_rx_mode before proceeding further? That is,
Does netif_rx_mode_schedule_work need the flush argument?
2) Does priv_ptr in netif_rx_mode_config make sense? For virtio_net, I
can get the vi pointer by doing netdev_priv(dev) and am wondering
if this would be a common thing
3) From a previous discussion:
https://lore.kernel.org/netdev/417c677f-268a-4163-b07e-deea8f9b9b40@intel.com/
On Thu, 23 Oct 2025 at 05:16, Jacob Keller wrote:
> Is there any mechanism to make this guarantee either implemented or at
> least verified by the core? If not that, what about some sort of way to
> lint driver code and make sure its correct?
I am not sure how to automate this but basically we need warnings to be
generated when the the set_rx_mode implementations are called statically in
code (From my understanding, usually in the open callback or the timeout function)
but not when they are called through ops->set_rx_mode.
Can Coccinelle do something like this?
v1:
Link: https://lore.kernel.org/netdev/20251020134857.5820-1-viswanathiyyappan@gmail.com/
v2:
- Exported set_and_schedule_rx_config as a symbol for use in modules
- Fixed incorrect cleanup for the case of rx_work alloc failing in alloc_netdev_mqs
- Removed the locked version (cp_set_rx_mode) and renamed __cp_set_rx_mode to cp_set_rx_mode
Link: https://lore.kernel.org/netdev/20251026175445.1519537-1-viswanathiyyappan@gmail.com/
v3:
- Added RFT tag
- Corrected mangled patch
Link: https://lore.kernel.org/netdev/20251028174222.1739954-1-viswanathiyyappan@gmail.com/
v4:
- Completely reworked the snapshot mechanism as per v3 comments
- Implemented the callback for virtio-net instead of 8139cp driver
- Removed RFC tag
Link: https://lore.kernel.org/netdev/20251118164333.24842-1-viswanathiyyappan@gmail.com/
v5:
- Fix broken code and titles
- Remove RFT tag
Here’s an enumeration of all the cases I have tested that should be exhaustive:
RX behaviour verification:
1) Dest is UC/MC addr X in UC/MC list:
no mode: Recv
allmulti: Recv
promisc: Recv
2) Dest is UC addr X not in UC list:
no_mode: Drop
allmulti: Drop
promisc: Recv
3) Dest is MC addr X not in MC list:
no_mode: Drop
allmulti: Recv
promisc: Recv
Packets injected from host using scapy on a TAP device as follows:
sendp(Ether(src=tap0_mac, dst=X) / IP() / UDP() / "test", iface="tap0")
And on the VM side, rx was checked via cat /proc/net/dev
Teardown path:
Relevant as they flush the work item. ens4 is the virtio-net interface.
virtnet_remove:
ip maddr add 01:00:5e:00:03:02 dev ens4; echo 1 > /sys/bus/pci/devices/0000:00:04.0/remove
virtnet_freeze_down:
ip maddr add 01:00:5e:00:03:02 dev ens4; echo mem > /sys/power/state
---
I Viswanath (2):
net: refactor set_rx_mode into snapshot and deferred I/O
virtio-net: Implement ndo_write_rx_mode callback
drivers/net/virtio_net.c | 58 +++++------
include/linux/netdevice.h | 104 ++++++++++++++++++-
net/core/dev.c | 208 +++++++++++++++++++++++++++++++++++++-
3 files changed, 330 insertions(+), 40 deletions(-)
--
2.34.1
Powered by blists - more mailing lists