Message-ID: <20240307151849.394962-1-amorenoz@redhat.com>
Date: Thu, 7 Mar 2024 16:18:44 +0100
From: Adrian Moreno <amorenoz@...hat.com>
To: netdev@...r.kernel.org,
dev@...nvswitch.org
Cc: Adrian Moreno <amorenoz@...hat.com>,
cmi@...dia.com,
yotam.gi@...il.com,
i.maximets@....org,
aconole@...hat.com,
echaudro@...hat.com,
horms@...nel.org
Subject: [RFC PATCH 0/4] net: openvswitch: Add sample multicasting.
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.
The fact that sampled traffic shares netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
rendering this solution unfit for highly loaded production systems.
Users are left with few options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.
** Proposal **
In this RFC, I'd like to request feedback on an attempt to fix this
situation by adding a flag to the userspace action to indicate the
upcall should be sent to a netlink multicast group instead of being
unicast to ovs-vswitchd.
This would allow for other processes to read samples directly, freeing
the netlink sockets and handler threads to process packet upcalls.
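To make the intended behaviour concrete, here is a minimal userspace
sketch of the dispatch decision described above. It is not the code in
this series: the flag name, the struct and the helper are all
hypothetical, and the "skip when nobody listens" branch is only my
reading of the "Avoid extra copy if no listeners" patch title.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real name and value are set by the patches. */
#define SKETCH_USERSPACE_F_MULTICAST (1u << 0)

/* Where a userspace-action upcall ends up in this sketch. */
enum sketch_dest {
    SKETCH_DEST_UNICAST,   /* delivered to ovs-vswitchd's socket */
    SKETCH_DEST_MULTICAST, /* delivered to the multicast group */
    SKETCH_DEST_NONE       /* multicast requested, nobody listening */
};

/* Hypothetical, simplified stand-in for the per-upcall information. */
struct sketch_upcall {
    uint32_t flags;  /* flags carried by the userspace action */
    uint32_t portid; /* unicast destination (ovs-vswitchd handler) */
};

/*
 * Pick the destination: flagged upcalls go to the multicast group (or
 * are skipped when there are no listeners, so no copy is made), and
 * everything else keeps using the unicast path.
 */
static enum sketch_dest sketch_upcall_dest(const struct sketch_upcall *u,
                                           bool has_listeners)
{
    if (u->flags & SKETCH_USERSPACE_F_MULTICAST)
        return has_listeners ? SKETCH_DEST_MULTICAST : SKETCH_DEST_NONE;
    return SKETCH_DEST_UNICAST;
}
```

The point of keeping the decision in one place is that regular packet
upcalls never touch the multicast path, so handler threads only see the
traffic they actually have to process.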
** Notes on tc-offloading **
I am aware of the efforts being made to offload the sample action with
the help of psample.
I did consider using psample to multicast the samples. However, I
found a limitation that I'd like to discuss:
I would like to support OVN-driven per-flow (IPFIX) sampling because
it allows OVN to insert two 32-bit values (obs_domain_id and
obs_point_id) that can be used to enrich the sample with "high level"
controller metadata (see debug_drop_domain_id NBDB option in ovn-nb(5)).
The existing fields in psample_metadata are not enough to carry this
information. Would it be possible to extend this struct to make room for
some extra "application-specific" metadata?
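As an illustration of what such "application-specific" metadata could
look like, the sketch below packs the two 32-bit values into an opaque
8-byte blob and decodes it on the collector side. The struct, the
function names and the big-endian layout are assumptions made for this
example only; they are neither the existing OVS userdata format nor a
proposed psample ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cookie: the two 32-bit values OVN attaches to a sample. */
struct sample_cookie {
    uint32_t obs_domain_id;
    uint32_t obs_point_id;
};

#define SAMPLE_COOKIE_LEN 8

/* Store a 32-bit value in big-endian order, independent of host order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Producer side: serialize the cookie into an opaque byte blob. */
static void cookie_encode(const struct sample_cookie *c,
                          uint8_t out[SAMPLE_COOKIE_LEN])
{
    put_be32(out, c->obs_domain_id);
    put_be32(out + 4, c->obs_point_id);
}

/* Collector side: returns 0 on success, -1 if the blob is too short. */
static int cookie_decode(const uint8_t *buf, size_t len,
                         struct sample_cookie *c)
{
    if (len < SAMPLE_COOKIE_LEN)
        return -1;
    c->obs_domain_id = get_be32(buf);
    c->obs_point_id = get_be32(buf + 4);
    return 0;
}
```

An opaque, length-checked blob like this would keep the sampling
infrastructure unaware of what the values mean, while still letting a
collector that knows the layout recover them.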
** Alternatives **
An alternative approach that I'm considering (apart from using psample
as explained above) is to use a brand-new action. This would lead to a
cleaner separation of concerns with the existing userspace action (used
for slow paths and OFP_CONTROLLER actions) and cleaner statistics.
Also, ovs-vswitchd could more easily make the layout of this
new userdata part of the public API, allowing third party sample
collectors to decode it.
I am currently exploring this alternative but wanted to send the RFC to
get some early feedback, guidance or ideas.
Adrian Moreno (4):
net:openvswitch: Support multicasting userspace ...
openvswitch:trace: Add ovs_dp_monitor tracepoint.
net:openvswitch: Avoid extra copy if no listeners.
net:openvswitch: Add multicasted packets to stats
include/uapi/linux/openvswitch.h | 8 +++-
net/openvswitch/actions.c | 5 ++
net/openvswitch/datapath.c | 33 ++++++++++++--
net/openvswitch/datapath.h | 1 +
net/openvswitch/flow_netlink.c | 6 ++-
net/openvswitch/openvswitch_trace.h | 71 +++++++++++++++++++++++++++++
net/openvswitch/vport.c | 8 ++++
net/openvswitch/vport.h | 1 +
8 files changed, 125 insertions(+), 8 deletions(-)
--
2.44.0