[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANP3RGcxE4pv1Dqi65O4KMP8B=7rcX9Hyd4RRMLGRtiC9L1E2Q@mail.gmail.com>
Date: Fri, 24 Jan 2025 10:51:25 -0800
From: Maciej Żenczykowski <maze@...gle.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Alexei Starovoitov <ast@...nel.org>,
Linux Network Development Mailing List <netdev@...r.kernel.org>, "David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, BPF Mailing List <bpf@...r.kernel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Willem de Bruijn <willemb@...gle.com>,
Matt Moeller <moeller.matt@...il.com>
Subject: Re: [PATCH bpf] bpf: fix classic bpf reads from negative offset
outside of linear skb portion
On Thu, Jan 23, 2025 at 2:36 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 1/22/25 9:04 PM, Maciej Żenczykowski wrote:
> > We're received reports of cBPF code failing to accept DHCP packets.
> > "BPF filter for DHCP not working (android14-6.1-lts + android-14.0.0_r74)"
>
> I presume this is cBPF on AF_PACKET, right?
>
> > The relevant Android code is at:
> > https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/NetworkStack/jni/network_stack_utils_jni.cpp;l=95;drc=9df50aef1fd163215dcba759045706253a5624f5
> > which uses a lot of macros from:
> > https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Connectivity/bpf/headers/include/bpf/BpfClassic.h;drc=c58cfb7c7da257010346bd2d6dcca1c0acdc8321
> >
> > This is widely used and does work on the vast majority of drivers,
> > but is exposing a core kernel cBPF bug related to driver skb layout.
> >
> > Root cause is iwlwifi driver, specifically on (at least):
> > Dell 7212: Intel Dual Band Wireless AC 8265
> > Dell 7220: Intel Wireless AC 9560
> > Dell 7230: Intel Wi-Fi 6E AX211
> > delivers frames where the UDP destination port is not in the skb linear
> > portion, while the cBPF code is using SKF_NET_OFF relative addressing.
> >
> > simplified from above, effectively:
> > BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, SKF_NET_OFF)
> > BPF_STMT(BPF_LD | BPF_H | BPF_IND, SKF_NET_OFF + 2)
> > BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 68, 1, 0)
> > BPF_STMT(BPF_RET | BPF_K, 0)
> > BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF)
> > fails to match udp dport=68 packets.
> >
> > Specifically the 3rd cBPF instruction fails to match the condition:
> > if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > within bpf_internal_load_pointer_neg_helper() and thus returns NULL,
> > which results in reading -EFAULT.
> >
> > This is because bpf_skb_load_helper_{8,16,32} don't include the
> > "data past headlen do skb_copy_bits()" logic from the non-negative
> > offset branch in the negative offset branch.
> >
> > Note: I don't know sparc assembly, so this doesn't fix sparc...
> > ideally we should just delete bpf_internal_load_pointer_neg_helper()
> > This seems to have always been broken (but not pre-git era, since
> > obviously there was no eBPF helpers back then), but stuff older
> > than 5.4 is no longer LTS supported anyway, so using 5.4 as fixes tag.
> >
> > Cc: Alexei Starovoitov <ast@...nel.org>
> > Cc: Daniel Borkmann <daniel@...earbox.net>
> > Cc: Stanislav Fomichev <sdf@...ichev.me>
> > Cc: Willem de Bruijn <willemb@...gle.com>
> > Reported-by: Matt Moeller <moeller.matt@...il.com>
> > Closes: https://issuetracker.google.com/384636719 [Treble - GKI partner internal]
> > Signed-off-by: Maciej Żenczykowski <maze@...gle.com>
> > Fixes: 219d54332a09 ("Linux 5.4")
>
> Hm, so not strictly broken, just that using relative SKF_NET_OFF offset
> is limited in that it requires the data to be in linear section. Some
> potential workarounds that come to mind:
I'd consider that to be broken... this is *ancient* functionality of
classic bpf, and it far predates both eBPF and drivers creating weird
split packet layouts.
The breakage is very much userspace visible.
> 1) When you say vast majority of drivers, have you checked how much they
> typically pull into linear section and whether it would make sense also
> for iwlwifi to do the same? (It sounds like start of network header is
> already in non-linear part for iwlwifi?)
Well... when I say 'vast majority' what I mean is: this is rolled out
to *all* Android mainline capable devices (and has been for a while),
and this is the first bugreport we've got. I would imagine if IPv4
DHCP discovery failed to work (and this appears to be a 100% failure
on affected drivers) we'd be hearing a *lot* more screaming...
Of course this doesn't mean this is the first time this has failed,
since this is certainly not something easy to debug... But it has to
be mostly working on the ecosystem as a whole.
Of course most deployed devices tend to have old kernels and or old
drivers, so probably most of them simply don't do packet split (and
even if they do, they'd presumably be doing more normal header split,
and be splitting after the udp header)
> 2) Perhaps rework the filter to avoid relying on SKF_NET_OFF.. tradeoff
> are more instructions, e.g.,
>
> # tcpdump -dd udp dst port 68
> { 0x28, 0, 0, 0x0000000c },
> { 0x15, 0, 4, 0x000086dd },
> { 0x30, 0, 0, 0x00000014 },
> { 0x15, 0, 11, 0x00000011 },
> { 0x28, 0, 0, 0x00000038 },
> { 0x15, 8, 9, 0x00000044 },
> { 0x15, 0, 8, 0x00000800 },
> { 0x30, 0, 0, 0x00000017 },
> { 0x15, 0, 6, 0x00000011 },
> { 0x28, 0, 0, 0x00000014 },
> { 0x45, 4, 0, 0x00001fff },
> { 0xb1, 0, 0, 0x0000000e },
> { 0x48, 0, 0, 0x00000010 },
> { 0x15, 0, 1, 0x00000044 },
> { 0x6, 0, 0, 0x00040000 },
> { 0x6, 0, 0, 0x00000000 },
I don't think this works as is.
When I run this with -dd I get the (at first glance) same output as
you, but with a warning, here's the same thing with -d:
$ tcpdump -d udp dst port 68
Warning: assuming Ethernet
(000) ldh [12] <-- assumes ethernet header, loads ethertype
(001) jeq #0x86dd jt 2 jf 6
(002) ldb [20]
(003) jeq #0x11 jt 4 jf 15
(004) ldh [56]
(005) jeq #0x44 jt 14 jf 15
(006) jeq #0x800 jt 7 jf 15
(007) ldb [23]
(008) jeq #0x11 jt 9 jf 15
(009) ldh [20]
(010) jset #0x1fff jt 15 jf 11
(011) ldxb 4*([14]&0xf)
(012) ldh [x + 16]
(013) jeq #0x44 jt 14 jf 15
(014) ret #262144
(015) ret #0
Not assuming ethernet is the entire point of using SKF_NET_OFF.
(and SKF_AD_OFF + SKF_AD_PROTOCOL to check ethertype)
> 3) Use eBPF asm, e.g. you can pull in data into linear section via helper
> bpf_skb_pull_data() if needed, or use bpf_skb_load_bytes() which works
> for linear & non-linear data.
Sure, we could (and possibly should) replace our existing cBPF filters
with eBPF.
I've been considering it, BUT... we shouldn't have to.
[The reason why I've been considering switching all our cBPF to eBPF
is related to the bpf jit hardening thread from a few months back]
Furthermore some devices running this code are quite likely still on
3.18 and 4.4 kernels (I would have to check, but 4.9 is guaranteed)...
which makes any such transition painful.
This is *clearly* a kernel bug/regression vs something like Linux 2.0
(or 1.0), and should thus be fixed in the kernel.
> 4) What about pulling in linear data in AF_PACKET code right before the
> cBPF filter is run (perhaps usage of SKF_NET_OFF could be detected and
> then only done if really needed by the filter)?
I'm not sure how to do this cleanly without a kernel change.
And for a kernel change fixing the bpf helpers simply seems superior.
I guess I could 'hack' this by still doing a relative load, but then
considering -EFAULT to match...
But with the kernel bug present, it's hard to say in how many other
places our cBPF code can run into this same problem (just in other
filters and/or at different offsets).
Btw. we did get a working patch tested yesterday [it basically does an
extra -= skb_headroom(skb) ], I'll try to clean it up (add comments)
and send in a v2.
Powered by blists - more mailing lists