Message-ID: <CALCETrVJqj1JJmHJhMoZ3Fuj685Unf=DYsiEY1uEfnJq3H+Tzg@mail.gmail.com>
Date: Wed, 7 Feb 2024 10:59:52 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Network Development <netdev@...r.kernel.org>
Subject: Raw sockets cause unnecessary, expensive skb clones

I was trying out dropwatch on a system that is incidentally running
tailscaled, and I noticed an utterly absurd number of drops.  After
digging around the Linux code and the tailscaled code, I figured out
the issue.  Tailscaled opens a raw IP socket bound to the UDP protocol
with a filter that should reject essentially all non-tailscale traffic
(and it does this for vaguely silly firewall-bypassing reasons that
may or may not make sense, but that's sort of secondary).

Linux passes incoming IPv4 packets to raw_v4_input, which tries to
ignore things, but not very hard, via raw_v4_match().  raw_v4_match()
considers protocol, source and dest addresses, and interfaces, which
means it matches literally every incoming UDP packet when tailscaled
is running.  Okay, fine, the socket filter will pare this down
cheaply, except for:

struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

This happens before raw_rcv -> raw_rcv_skb ->
sock_queue_rcv_skb_reason -> sk_filter.
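
To make the matching behavior above concrete, here is a small userspace toy model. It is not kernel code: the toy_* structs and functions are hypothetical stand-ins, and it assumes that an unset address or interface on the socket acts as a wildcard. It only illustrates why a raw socket bound to nothing but the UDP protocol matches, and therefore clones, every UDP packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT kernel types. A zero field means "wildcard". */
struct toy_raw_sock {
    uint8_t  protocol;   /* IP protocol number, e.g. 17 for UDP */
    uint32_t bound_addr; /* local address, 0 = any */
    uint32_t peer_addr;  /* remote address, 0 = any */
    int      bound_if;   /* interface index, 0 = any */
};

struct toy_pkt {
    uint8_t  protocol;
    uint32_t saddr;
    uint32_t daddr;
    int      ifindex;
    uint16_t dport;      /* never consulted by the match below */
};

/* Match on protocol, addresses and interface only (assumed semantics). */
static bool toy_raw_match(const struct toy_raw_sock *sk,
                          const struct toy_pkt *p)
{
    if (sk->protocol != p->protocol)
        return false;
    if (sk->peer_addr && sk->peer_addr != p->saddr)
        return false;
    if (sk->bound_addr && sk->bound_addr != p->daddr)
        return false;
    if (sk->bound_if && sk->bound_if != p->ifindex)
        return false;
    return true;
}

/* Count the clones made when every matched packet is cloned up front. */
static size_t toy_count_eager_clones(const struct toy_raw_sock *sk,
                                     const struct toy_pkt *pkts, size_t n)
{
    size_t clones = 0;

    assert(sk != NULL && (pkts != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        if (toy_raw_match(sk, &pkts[i]))
            clones++;
    return clones;
}
```

With a socket that sets only protocol = 17, every UDP packet in the input is counted regardless of its port, because the match never looks at the port.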

On a quick inspection (and I'm not nearly familiar enough with the
code in question to believe this right away), the only thing that
happens in between skb_clone and sk_filter that actually requires
cloning is ipv4_pktinfo_prepare().

Does sk_filter require ipv4_pktinfo_prepare() first?  If not, perhaps
sock_queue_rcv_skb_reason could have a new option that would tell it
to clone the skb itself and ipv4_pktinfo_prepare() could be pushed
down.  Or raw_rcv_skb could do the sk_filter itself and arrange to
skip the subsequent sk_filter call?
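
As a rough illustration of what reordering would buy, here is a userspace toy model comparing the two orders. Nothing here is kernel API: the toy_* names, the stats struct and the port-based filter (with an arbitrary port number) are all made up for the sketch, and it assumes the filter can run on the original packet without needing the clone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WANTED_PORT 12345 /* arbitrary, for illustration only */

struct toy_pkt {
    uint16_t dport;
};

struct toy_stats {
    size_t clones;    /* how many copies were made */
    size_t delivered; /* how many packets passed the filter */
};

typedef bool (*toy_filter_fn)(const struct toy_pkt *);

/* Hypothetical socket filter: accept a single destination port. */
static bool toy_filter_port(const struct toy_pkt *p)
{
    return p->dport == TOY_WANTED_PORT;
}

/* Current order as described: clone first, filter afterwards. */
static struct toy_stats toy_clone_then_filter(const struct toy_pkt *pkts,
                                              size_t n, toy_filter_fn f)
{
    struct toy_stats st = { 0, 0 };

    assert(f != NULL);
    for (size_t i = 0; i < n; i++) {
        st.clones++;              /* paid for every matched packet */
        if (f(&pkts[i]))
            st.delivered++;
        /* otherwise the clone is thrown away again */
    }
    return st;
}

/* Proposed order: filter first, clone only what survives. */
static struct toy_stats toy_filter_then_clone(const struct toy_pkt *pkts,
                                              size_t n, toy_filter_fn f)
{
    struct toy_stats st = { 0, 0 };

    assert(f != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!f(&pkts[i]))
            continue;             /* rejected without any clone */
        st.clones++;
        st.delivered++;
    }
    return st;
}
```

Both orders deliver the same packets; only the clone count differs, and it differs most when the filter rejects nearly everything, which is the tailscaled case.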

FWIW, packet_filter() explicitly tries to avoid this problem -- it
even has a nice comment:

 * This function makes lazy skb cloning in hope that most of packets
 * are discarded by BPF.



-- 
Andy Lutomirski
AMA Capital Management, LLC
