[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJ+iWS_d3Vwg6k03mp4v_6OXHB1oS76A+9p1U7hGKdFng@mail.gmail.com>
Date: Fri, 21 Jul 2023 20:14:35 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Maciej Żenczykowski <maze@...gle.com>
Cc: Linux NetDev <netdev@...r.kernel.org>, Jesper Dangaard Brouer <brouer@...hat.com>,
Pengtao He <hepengtao@...omi.com>, Willem Bruijn <willemb@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>, Xiao Ma <xiaom@...gle.com>, Patrick Rohr <prohr@...gle.com>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: Performance question: af_packet with bpf filter vs TX path skb_clone
On Fri, Jul 21, 2023 at 7:55 PM Maciej Żenczykowski <maze@...gle.com> wrote:
>
> I've been asked to review:
> https://android-review.googlesource.com/c/platform/packages/modules/NetworkStack/+/2648779
>
> where it comes to light that in Android due to background debugging of
> connectivity problems
> (of which there are *plenty* due to various types of buggy [primarily]
> wifi networks)
> we have a permanent AF_PACKET, ETH_P_ALL socket with a cBPF filter:
>
> arp or (ip and udp port 68) or (icmp6 and ip6[40] >= 133 and ip6[40] <= 136)
>
> ie. it catches ARP, IPv4 DHCP and IPv6 ND (NS/NA/RS/RA)
>
> If I'm reading the kernel code right this appears to cause skb_clone()
> to be called on *every* outgoing packet,
> even though most packets will not be accepted by the filter.
>
> (In the TX path the filter appears to get called *after* the clone,
> I think that's unlike the RX path where the filter is called first)
>
> Unfortunately, I don't think it's possible to eliminate the
> functionality this socket provides.
> We need to be able to log RX & TX of ARP/DHCP/ND for debugging /
> bugreports / etc.
> and they *really* should be in order wrt. to each other.
> (and yeah, that means last few minutes history when an issue happens,
> so not possible to simply enable it on demand)
>
> We could of course split the socket into 3 separate ones:
> - ETH_P_ARP
> - ETH_P_IP + cbpf udp dport=dhcp
> - ETH_P_IPV6 + cbpf icmpv6 type=NS/NA/RS/RA
>
> But I don't think that will help - I believe we'll still get
> skb_clone() for every outbound ipv4/ipv6 packet.
>
> I have some ideas for what could be done to avoid the clone (with
> existing kernel functionality)... but none of it is pretty...
> Anyone have any smart ideas?
>
> Perhaps a way to move the clone past the af_packet packet_rcv run_filter?
> Unfortunately packet_rcv() does a little bit of 'setup' before it
> calls the filter - so this may be hard.
dev_queue_xmit_nit() also does some 'setup':
net_timestamp_set(skb2); (This one could probably be moved into
af_packet, if packet is not dropped ?)
<sanitize mac, network, transport headers>
>
> Or an 'extra' early pre-filter hook [prot_hook.prefilter()] that has
> very minimal
> functionality... like match 2 bytes at an offset into the packet?
> Maybe even not a hook at all, just adding a
> prot_hook.prefilter{1,2}_u64_{offset,mask,value}
> It doesn't have to be perfect, but if it could discard 99% of the
> packets we don't care about...
> (and leave filtering of the remaining 1% to the existing cbpf program)
> that would already be a huge win?
Maybe if we can detect a cBPF filter does not access mac, network,
transport header,
we could run it earlier, before the clone().
So we could add
prot_hook.filter_can_run_from_dev_queue_xmit_nit_before_the_clone
Or maybe we can remove sanitization, because BPF should not do bad
things if these headers are garbage ?
>
> Thoughts?
>
> Thanks,
> Maciej
Powered by blists - more mailing lists