Message-ID: <166BBD38-24D4-491C-BA62-1E41BE8C2F84@nutanix.com>
Date: Fri, 21 Nov 2025 17:52:19 +0000
From: Jon Kohler <jon@...anix.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Investigating masked_flow_lookup and skb_get_hash overheads
Hi netdev,
Looking for some advice on two overheads that I noticed while using
tun on top of vhost-net in a UDP TX-heavy workload:
1) openvswitch masked_flow_lookup
2) figuring out the SKB hash for tun produced SKBs (siphash)
The reproduction is quite basic: two Ubuntu 25.10 (6.17) guests on
a host running 6.18-rc6, with one VM doing iperf3 TX and the other doing
RX. No other VMs/endpoints generate any appreciable traffic
during the test. Each guest has a single virtio-net device, backed by
a single NIC queue.
The TX VM is pushing a fair amount of traffic: 6.39 Gbits/sec and
551,707 UDP datagrams per second. The vhost worker thread that backs this
virtio-net device is at 100% CPU during the test.
For point 1 (masked_flow_lookup):
I've created a GitHub gist that has the screenshot of a flamegraph from
the vhost-net worker thread on the TX side, and the disassembly of
masked_flow_lookup from the perf top perspective:
https://gist.github.com/JonKohler/02ef2c49a176dc30bea75f689887da16
Even though their share in perf top is much smaller than, let's say,
memcpy's, both masked_flow_lookup and the siphash work are directly in
the critical path for netif_receive_skb, and occur prior to the
tun_net_xmit call, so I suspect optimizing them should produce some
non-zero gains.
Output of perf top -t (vhost-net-thread)
Samples: 81K of event 'cycles:P', 4000 Hz, Event count (approx.): 32571035216 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
31.04% [kernel] [k] _copy_from_iter
6.73% [kernel] [k] __get_user_nocheck_2
4.92% [vhost] [k] vhost_copy_from_user.constprop.0
4.91% [kernel] [k] memcpy
3.59% [kernel] [k] __siphash_unaligned
3.41% [openvswitch] [k] masked_flow_lookup
masked_flow_lookup runs as part of the following call chain:
ovs_dp_process_packet
ovs_flow_tbl_lookup_stats
flow_lookup
masked_flow_lookup
The key comparisons look fairly well optimized at first blush, so I
suspect I'm missing something broader.
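For reference, my mental model of the masked lookup is roughly the
following: for each active mask, AND the flow key with the mask, hash
the masked key, and probe that bucket. A minimal, self-contained
userspace sketch of that idea (all the mini_* names and the toy hash
are mine, not the kernel's; the real code iterates masks and uses
jhash over the masked range):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_WORDS 4
#define BUCKETS   16

struct mini_key  { uint32_t w[KEY_WORDS]; };
struct mini_flow { struct mini_key masked_key; int id; int in_use; };

static struct mini_flow table[BUCKETS];

/* Toy hash standing in for the real jhash over the masked key range. */
static uint32_t toy_hash(const struct mini_key *k)
{
	uint32_t h = 0;
	for (int i = 0; i < KEY_WORDS; i++)
		h = h * 31 + k->w[i];
	return h;
}

static void apply_mask(struct mini_key *dst, const struct mini_key *src,
		       const struct mini_key *mask)
{
	for (int i = 0; i < KEY_WORDS; i++)
		dst->w[i] = src->w[i] & mask->w[i];
}

/* One mask's lookup: mask the key, hash it, compare in the bucket. */
static struct mini_flow *masked_lookup(const struct mini_key *key,
				       const struct mini_key *mask)
{
	struct mini_key masked;
	struct mini_flow *f;

	apply_mask(&masked, key, mask);
	f = &table[toy_hash(&masked) % BUCKETS];
	if (f->in_use && !memcmp(&f->masked_key, &masked, sizeof(masked)))
		return f;
	return NULL;
}
```

If that model is right, the per-packet cost is dominated by the mask
walk plus the hash over the masked key, which matches what I see in
the disassembly.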
For point 2 (tun created SKB hash):
I don't see a correctness issue here, but I'm wondering if there is a
way to generate a hash for the SKB that is cheaper than leveraging
the default siphash that happens when the code hits skb_get_hash here:
ovs_dp_process_packet ->
flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
&n_mask_hit, &n_cache_hit);
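My understanding is that skb_get_hash only takes the expensive flow
dissect + siphash path when no hash is already cached on the skb. A
hedged, self-contained model of that fast path (toy_* names and the
multiplicative mix are illustrative, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

static int expensive_calls;	/* counts the "dissect + siphash" path */

struct toy_skb {
	uint32_t hash;
	int	 hash_valid;	/* stands in for skb->sw_hash/l4_hash */
	uint32_t flow_bytes;	/* stand-in for the dissected 5-tuple */
};

/* Stand-in for the __skb_get_hash() slow path. */
static uint32_t dissect_and_siphash(const struct toy_skb *skb)
{
	expensive_calls++;
	return skb->flow_bytes * 2654435761u;	/* toy mix, not siphash */
}

static uint32_t toy_skb_get_hash(struct toy_skb *skb)
{
	if (skb->hash_valid)
		return skb->hash;	/* cached: cheap path */
	skb->hash = dissect_and_siphash(skb);
	skb->hash_valid = 1;
	return skb->hash;
}
```

In my trace the tun-produced SKBs arrive with no cached hash, so every
packet pays the slow path exactly once, in this call site.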
In my reproduction there is a single queue, so the hashing that tun
normally does (for rxhash steering) doesn't get hit by default.
I'm curious whether, in the single-queue case, we could use a cheaper
hash generation method of some sort?
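One shape this could take, sketched very loosely: if tun could seed
the skb hash once at SKB creation with something cheaper than siphash
(or from a guest-supplied hash, along the lines of skb_set_hash), the
later skb_get_hash in ovs_dp_process_packet would hit the cached path
and never dissect. All the toy_* names and the cheap mix below are my
own stand-ins, not a proposed kernel API:

```c
#include <assert.h>
#include <stdint.h>

static int siphash_calls;

struct toy_skb2 {
	uint32_t hash;
	int	 hash_valid;
	uint32_t hdr_words[3];	/* stand-in for parsed header words */
};

/* Cheap per-packet mix, standing in for something lighter than
 * siphash (e.g. a jhash over an already-parsed tuple). */
static uint32_t cheap_hash(const uint32_t *w, int n)
{
	uint32_t h = 0x9e3779b9;
	for (int i = 0; i < n; i++)
		h = (h ^ w[i]) * 0x85ebca6b;
	return h ? h : 1;
}

/* Stand-in for skb_set_hash(): tun would seed this at SKB creation. */
static void toy_set_hash(struct toy_skb2 *skb, uint32_t h)
{
	skb->hash = h;
	skb->hash_valid = 1;
}

/* Stand-in for skb_get_hash(): siphash path only when unseeded. */
static uint32_t toy_get_hash(struct toy_skb2 *skb)
{
	if (!skb->hash_valid) {
		siphash_calls++;	/* expensive dissect + siphash */
		toy_set_hash(skb, cheap_hash(skb->hdr_words, 3));
	}
	return skb->hash;
}
```

Whether a weaker hash is acceptable here presumably depends on how
much the OVS flow table cares about hash-flooding resistance, which is
exactly the kind of input I'm hoping for.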
Would appreciate any help I can get here, happy to collaborate!
Cheers,
Jon