Message-ID: <166BBD38-24D4-491C-BA62-1E41BE8C2F84@nutanix.com>
Date: Fri, 21 Nov 2025 17:52:19 +0000
From: Jon Kohler <jon@...anix.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Investigating masked_flow_lookup and skb_get_hash overheads
Hi netdev,
Looking for some advice on two overheads that I noticed while using
tun on top of vhost-net in a UDP TX-heavy workload:
1) openvswitch masked_flow_lookup
2) figuring out the SKB hash for tun produced SKBs (siphash)
The reproduction is quite basic: two Ubuntu 25.10 (6.17) guests on
a host running 6.18-rc6, with one VM doing iperf3 TX and the other doing
RX. No other VMs/endpoints generate any appreciable traffic
during the test. Each guest has a single virtio-net device, backed by
a single NIC queue.
The TX VM is pushing a fair amount of traffic: 6.39 Gbits/sec and
551,707 UDP datagrams per second. The vhost worker thread that backs this
virtio-net device is at 100% CPU during the test.
For point 1 (masked_flow_lookup):
I've created a GitHub gist that has the screenshot of a flamegraph from
the vhost-net worker thread on the TX side, and the disassembly of
masked_flow_lookup from the perf top perspective:
https://gist.github.com/JonKohler/02ef2c49a176dc30bea75f689887da16
Even though their share in perf top is much smaller than, let's say,
memcpy's, both masked_flow_lookup and the siphash work are directly in
the critical path for netif_receive_skb, and occur prior to the
tun_net_xmit call, so I suspect optimizing them should produce some
non-zero gains.
Output of perf top -t (vhost-net-thread)
Samples: 81K of event 'cycles:P', 4000 Hz, Event count (approx.): 32571035216 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
31.04% [kernel] [k] _copy_from_iter
6.73% [kernel] [k] __get_user_nocheck_2
4.92% [vhost] [k] vhost_copy_from_user.constprop.0
4.91% [kernel] [k] memcpy
3.59% [kernel] [k] __siphash_unaligned
3.41% [openvswitch] [k] masked_flow_lookup
masked_flow_lookup runs as part of the following call chain:
ovs_dp_process_packet
ovs_flow_tbl_lookup_stats
flow_lookup
masked_flow_lookup
The key comparisons look fairly well optimized at first blush, so I
suspect I'm missing something broader.
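For reference, my mental model of the masked lookup is roughly the
following: for each active mask, AND the flow key with the mask, hash
the masked key, and probe that bucket. A minimal, self-contained
userspace sketch of that idea (all the mini_* names and the toy hash
are mine, not the kernel's; the real code iterates masks and uses
jhash over the masked range):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_WORDS 4
#define BUCKETS   16

struct mini_key  { uint32_t w[KEY_WORDS]; };
struct mini_flow { struct mini_key masked_key; int id; int in_use; };

static struct mini_flow table[BUCKETS];

/* Toy hash standing in for the real jhash over the masked key range. */
static uint32_t toy_hash(const struct mini_key *k)
{
	uint32_t h = 0;
	for (int i = 0; i < KEY_WORDS; i++)
		h = h * 31 + k->w[i];
	return h;
}

static void apply_mask(struct mini_key *dst, const struct mini_key *src,
		       const struct mini_key *mask)
{
	for (int i = 0; i < KEY_WORDS; i++)
		dst->w[i] = src->w[i] & mask->w[i];
}

/* One mask's lookup: mask the key, hash it, compare in the bucket. */
static struct mini_flow *masked_lookup(const struct mini_key *key,
				       const struct mini_key *mask)
{
	struct mini_key masked;
	struct mini_flow *f;

	apply_mask(&masked, key, mask);
	f = &table[toy_hash(&masked) % BUCKETS];
	if (f->in_use && !memcmp(&f->masked_key, &masked, sizeof(masked)))
		return f;
	return NULL;
}
```

If that model is right, the per-packet cost is dominated by the mask
walk plus the hash over the masked key, which matches what I see in
the disassembly.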
For point 2 (tun created SKB hash):
I don't see a correctness issue here, but I'm wondering if there is a
way to generate a hash for the SKB that is cheaper than leveraging
the default siphash that happens when the code hits skb_get_hash here:
ovs_dp_process_packet ->
flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
&n_mask_hit, &n_cache_hit);
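My understanding is that skb_get_hash only takes the expensive flow
dissect + siphash path when no hash is already cached on the skb. A
hedged, self-contained model of that fast path (toy_* names and the
multiplicative mix are illustrative, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

static int expensive_calls;	/* counts the "dissect + siphash" path */

struct toy_skb {
	uint32_t hash;
	int	 hash_valid;	/* stands in for skb->sw_hash/l4_hash */
	uint32_t flow_bytes;	/* stand-in for the dissected 5-tuple */
};

/* Stand-in for the __skb_get_hash() slow path. */
static uint32_t dissect_and_siphash(const struct toy_skb *skb)
{
	expensive_calls++;
	return skb->flow_bytes * 2654435761u;	/* toy mix, not siphash */
}

static uint32_t toy_skb_get_hash(struct toy_skb *skb)
{
	if (skb->hash_valid)
		return skb->hash;	/* cached: cheap path */
	skb->hash = dissect_and_siphash(skb);
	skb->hash_valid = 1;
	return skb->hash;
}
```

In my trace the tun-produced SKBs arrive with no cached hash, so every
packet pays the slow path exactly once, in this call site.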
In my reproduction there is a single queue, so the hashing that tun
normally does (for rxhash steering) doesn't get hit by default.
I'm curious whether, in the single-queue case, we could use a cheaper
hash generation method of some sort?
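One shape this could take, sketched very loosely: if tun could seed
the skb hash once at SKB creation with something cheaper than siphash
(or from a guest-supplied hash, along the lines of skb_set_hash), the
later skb_get_hash in ovs_dp_process_packet would hit the cached path
and never dissect. All the toy_* names and the cheap mix below are my
own stand-ins, not a proposed kernel API:

```c
#include <assert.h>
#include <stdint.h>

static int siphash_calls;

struct toy_skb2 {
	uint32_t hash;
	int	 hash_valid;
	uint32_t hdr_words[3];	/* stand-in for parsed header words */
};

/* Cheap per-packet mix, standing in for something lighter than
 * siphash (e.g. a jhash over an already-parsed tuple). */
static uint32_t cheap_hash(const uint32_t *w, int n)
{
	uint32_t h = 0x9e3779b9;
	for (int i = 0; i < n; i++)
		h = (h ^ w[i]) * 0x85ebca6b;
	return h ? h : 1;
}

/* Stand-in for skb_set_hash(): tun would seed this at SKB creation. */
static void toy_set_hash(struct toy_skb2 *skb, uint32_t h)
{
	skb->hash = h;
	skb->hash_valid = 1;
}

/* Stand-in for skb_get_hash(): siphash path only when unseeded. */
static uint32_t toy_get_hash(struct toy_skb2 *skb)
{
	if (!skb->hash_valid) {
		siphash_calls++;	/* expensive dissect + siphash */
		toy_set_hash(skb, cheap_hash(skb->hdr_words, 3));
	}
	return skb->hash;
}
```

Whether a weaker hash is acceptable here presumably depends on how
much the OVS flow table cares about hash-flooding resistance, which is
exactly the kind of input I'm hoping for.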
Would appreciate any help I can get here, happy to collaborate!
Cheers,
Jon