Message-ID: <CALCETrVJqj1JJmHJhMoZ3Fuj685Unf=DYsiEY1uEfnJq3H+Tzg@mail.gmail.com>
Date: Wed, 7 Feb 2024 10:59:52 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Network Development <netdev@...r.kernel.org>
Subject: Raw sockets cause unnecessary, expensive skb clones
I was trying out dropwatch on a system that is incidentally running
tailscaled, and I noticed an utterly absurd number of drops. After
digging around the Linux code and the tailscaled code, I figured out
the issue. Tailscaled opens a raw IP socket bound to the UDP protocol
with a filter that should reject essentially all non-tailscale traffic
(and it does this for vaguely silly firewall-bypassing reasons that
may or may not make sense, but that's sort of secondary).
Linux passes incoming IPv4 packets to raw_v4_input, which tries to
ignore things, but not very hard, via raw_v4_match(). raw_v4_match()
considers protocol, source and dest addresses, and interfaces, which
means it matches literally every incoming UDP packet when tailscaled
is running. Okay, fine, the socket filter will pare this down
cheaply, except for:
    struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);
This happens before raw_rcv -> raw_rcv_skb ->
sock_queue_rcv_skb_reason -> sk_filter.
On a quick inspection (and I'm not nearly familiar enough with the
code in question to believe this right away), the only thing that
happens in between skb_clone and sk_filter that actually requires
cloning is ipv4_pktinfo_prepare().
Does sk_filter require ipv4_pktinfo_prepare() first? If not, perhaps
sock_queue_rcv_skb_reason could have a new option that would tell it
to clone the skb itself and ipv4_pktinfo_prepare() could be pushed
down. Or raw_rcv_skb could do the sk_filter itself and arrange to
skip the subsequent sk_filter call?
FWIW, packet_filter() explicitly tries to avoid this problem -- it
even has a nice comment:
     * This function makes lazy skb cloning in hope that most of packets
     * are discarded by BPF.
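To make the cost difference concrete, here is a userspace toy model of the
two orderings (clone then filter, versus filter then clone). It is only a
sketch: toy_pkt, toy_stats, toy_filter, deliver_eager, deliver_lazy and
run_batch are hypothetical names invented for illustration, a struct copy
stands in for skb_clone(), and nothing here is kernel code or a proposed
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an sk_buff: only the field the toy filter inspects. */
struct toy_pkt {
    uint16_t dst_port;
};

/* Counters so the two strategies can be compared. */
struct toy_stats {
    unsigned clones;     /* how many times we "cloned" a packet */
    unsigned delivered;  /* how many packets passed the filter */
};

/* Hypothetical socket filter: accept a single destination port. */
static bool toy_filter(const struct toy_pkt *pkt, uint16_t wanted_port)
{
    assert(pkt != NULL);
    return pkt->dst_port == wanted_port;
}

/* Eager ordering: clone first, run the filter on the clone afterwards. */
static void deliver_eager(const struct toy_pkt *pkt, uint16_t wanted_port,
                          struct toy_stats *st)
{
    struct toy_pkt clone = *pkt;  /* stands in for skb_clone() */
    st->clones++;
    if (toy_filter(&clone, wanted_port))
        st->delivered++;
    /* Otherwise the clone is simply dropped and the work was wasted. */
}

/* Lazy ordering: filter the shared packet, clone only if it is accepted. */
static void deliver_lazy(const struct toy_pkt *pkt, uint16_t wanted_port,
                         struct toy_stats *st)
{
    if (!toy_filter(pkt, wanted_port))
        return;                   /* rejected: no clone was ever made */
    struct toy_pkt clone = *pkt;  /* stands in for skb_clone() */
    (void)clone;
    st->clones++;
    st->delivered++;
}

/* Push n packets (given by destination port) through one strategy. */
static struct toy_stats run_batch(bool lazy, const uint16_t *ports, size_t n,
                                  uint16_t wanted_port)
{
    struct toy_stats st = {0, 0};
    for (size_t i = 0; i < n; i++) {
        struct toy_pkt pkt = { .dst_port = ports[i] };
        if (lazy)
            deliver_lazy(&pkt, wanted_port, &st);
        else
            deliver_eager(&pkt, wanted_port, &st);
    }
    return st;
}
```

In this model, a batch of five packets of which one matches costs five
clones with the eager ordering and one clone with the lazy ordering, while
both deliver the same single packet.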
--
Andy Lutomirski
AMA Capital Management, LLC