[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5KolQGqCV3PBjWS@mini-arch>
Date: Thu, 23 Jan 2025 12:37:41 -0800
From: Stanislav Fomichev <stfomichev@...il.com>
To: Maciej Żenczykowski <zenczykowski@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Linux Network Development Mailing List <netdev@...r.kernel.org>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
BPF Mailing List <bpf@...r.kernel.org>,
Stanislav Fomichev <sdf@...ichev.me>,
Willem de Bruijn <willemb@...gle.com>,
Matt Moeller <moeller.matt@...il.com>
Subject: Re: [PATCH bpf] bpf: fix classic bpf reads from negative offset
outside of linear skb portion
On 01/22, Maciej Żenczykowski wrote:
> On Wed, Jan 22, 2025 at 12:28 PM Maciej Żenczykowski
> <zenczykowski@...il.com> wrote:
> >
> > On Wed, Jan 22, 2025 at 12:16 PM Maciej Żenczykowski
> > <zenczykowski@...il.com> wrote:
> > >
> > > On Wed, Jan 22, 2025 at 12:04 PM Maciej Żenczykowski <maze@...gle.com> wrote:
> > > >
> > > > We're received reports of cBPF code failing to accept DHCP packets.
> > > > "BPF filter for DHCP not working (android14-6.1-lts + android-14.0.0_r74)"
> > > >
> > > > The relevant Android code is at:
> > > > https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/NetworkStack/jni/network_stack_utils_jni.cpp;l=95;drc=9df50aef1fd163215dcba759045706253a5624f5
> > > > which uses a lot of macros from:
> > > > https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Connectivity/bpf/headers/include/bpf/BpfClassic.h;drc=c58cfb7c7da257010346bd2d6dcca1c0acdc8321
> > > >
> > > > This is widely used and does work on the vast majority of drivers,
> > > > but is exposing a core kernel cBPF bug related to driver skb layout.
> > > >
> > > > Root cause is iwlwifi driver, specifically on (at least):
> > > > Dell 7212: Intel Dual Band Wireless AC 8265
> > > > Dell 7220: Intel Wireless AC 9560
> > > > Dell 7230: Intel Wi-Fi 6E AX211
> > > > delivers frames where the UDP destination port is not in the skb linear
> > > > portion, while the cBPF code is using SKF_NET_OFF relative addressing.
> > > >
> > > > simplified from above, effectively:
> > > > BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, SKF_NET_OFF)
> > > > BPF_STMT(BPF_LD | BPF_H | BPF_IND, SKF_NET_OFF + 2)
> > > > BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 68, 1, 0)
> > > > BPF_STMT(BPF_RET | BPF_K, 0)
> > > > BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF)
> > > > fails to match udp dport=68 packets.
> > > >
> > > > Specifically the 3rd cBPF instruction fails to match the condition:
> > >
> > > 2nd of course
> > >
> > > > if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > > within bpf_internal_load_pointer_neg_helper() and thus returns NULL,
> > > > which results in reading -EFAULT.
> > > >
> > > > This is because bpf_skb_load_helper_{8,16,32} don't include the
> > > > "data past headlen do skb_copy_bits()" logic from the non-negative
> > > > offset branch in the negative offset branch.
> > > >
> > > > Note: I don't know sparc assembly, so this doesn't fix sparc...
> > > > ideally we should just delete bpf_internal_load_pointer_neg_helper()
> > > > This seems to have always been broken (but not pre-git era, since
> > > > obviously there was no eBPF helpers back then), but stuff older
> > > > than 5.4 is no longer LTS supported anyway, so using 5.4 as fixes tag.
> > > >
> > > > Cc: Alexei Starovoitov <ast@...nel.org>
> > > > Cc: Daniel Borkmann <daniel@...earbox.net>
> > > > Cc: Stanislav Fomichev <sdf@...ichev.me>
> > > > Cc: Willem de Bruijn <willemb@...gle.com>
> > > > Reported-by: Matt Moeller <moeller.matt@...il.com>
> > > > Closes: https://issuetracker.google.com/384636719 [Treble - GKI partner internal]
> > > > Signed-off-by: Maciej Żenczykowski <maze@...gle.com>
> > > > Fixes: 219d54332a09 ("Linux 5.4")
> > > > ---
> > > > include/linux/filter.h | 2 ++
> > > > kernel/bpf/core.c | 14 +++++++++
> > > > net/core/filter.c | 69 +++++++++++++++++-------------------------
> > > > 3 files changed, 43 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > index a3ea46281595..c24d8e338ce4 100644
> > > > --- a/include/linux/filter.h
> > > > +++ b/include/linux/filter.h
> > > > @@ -1479,6 +1479,8 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > > > void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > > int k, unsigned int size);
> > > >
> > > > +int bpf_internal_neg_helper(const struct sk_buff *skb, int k);
> > > > +
> > > > static inline int bpf_tell_extensions(void)
> > > > {
> > > > return SKF_AD_MAX;
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index da729cbbaeb9..994988dabb97 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -89,6 +89,20 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
> > > > return NULL;
> > > > }
> > > >
> > > > +int bpf_internal_neg_helper(const struct sk_buff *skb, int k)
> > > > +{
> > > > + if (k >= 0)
> > > > + return k;
> > > > + if (k >= SKF_NET_OFF)
> > > > + return skb->network_header + k - SKF_NET_OFF;
> > > > + if (k >= SKF_LL_OFF) {
> > > > + if (unlikely(!skb_mac_header_was_set(skb)))
> > > > + return -1;
> > > > + return skb->mac_header + k - SKF_LL_OFF;
> > > > + }
> > > > + return -1;
> > > > +}
> > > > +
> > > > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > > > enum page_size_enum {
> > > > __PAGE_SIZE = PAGE_SIZE
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index e56a0be31678..609ef7df71ce 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -221,21 +221,16 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > > > data, int, headlen, int, offset)
> > > > {
> > > > - u8 tmp, *ptr;
> > > > - const int len = sizeof(tmp);
> > > > -
> > > > - if (offset >= 0) {
> > > > - if (headlen - offset >= len)
> > > > - return *(u8 *)(data + offset);
> > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > - return tmp;
> > > > - } else {
> > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > - if (likely(ptr))
> > > > - return *(u8 *)ptr;
> > > > - }
> > > > + u8 tmp;
> > > >
> > > > - return -EFAULT;
> > > > + offset = bpf_internal_neg_helper(skb, offset);
> > > > + if (unlikely(offset < 0))
> > > > + return -EFAULT;
> > > > + if (headlen - offset >= sizeof(u8))
> > > > + return *(u8 *)(data + offset);
> > > > + if (skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > + return -EFAULT;
> > > > + return tmp;
> > > > }
> > > >
> > > > BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
> > > > @@ -248,21 +243,16 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
> > > > BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
> > > > data, int, headlen, int, offset)
> > > > {
> > > > - __be16 tmp, *ptr;
> > > > - const int len = sizeof(tmp);
> > > > + __be16 tmp;
> > > >
> > > > - if (offset >= 0) {
> > > > - if (headlen - offset >= len)
> > > > - return get_unaligned_be16(data + offset);
> > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > - return be16_to_cpu(tmp);
> > > > - } else {
> > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > - if (likely(ptr))
> > > > - return get_unaligned_be16(ptr);
> > > > - }
> > > > -
> > > > - return -EFAULT;
> > > > + offset = bpf_internal_neg_helper(skb, offset);
> > > > + if (unlikely(offset < 0))
> > > > + return -EFAULT;
> > > > + if (headlen - offset >= sizeof(__be16))
> > > > + return get_unaligned_be16(data + offset);
> > > > + if (skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > + return -EFAULT;
> > > > + return be16_to_cpu(tmp);
> > > > }
> > > >
> > > > BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
> > > > @@ -275,21 +265,16 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
> > > > BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
> > > > data, int, headlen, int, offset)
> > > > {
> > > > - __be32 tmp, *ptr;
> > > > - const int len = sizeof(tmp);
> > > > -
> > > > - if (likely(offset >= 0)) {
> > > > - if (headlen - offset >= len)
> > > > - return get_unaligned_be32(data + offset);
> > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > - return be32_to_cpu(tmp);
> > > > - } else {
> > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > - if (likely(ptr))
> > > > - return get_unaligned_be32(ptr);
> > > > - }
> > > > + __be32 tmp;
> > > >
> > > > - return -EFAULT;
> > > > + offset = bpf_internal_neg_helper(skb, offset);
> > > > + if (unlikely(offset < 0))
> > > > + return -EFAULT;
> > > > + if (headlen - offset >= sizeof(__be32))
> > > > + return get_unaligned_be32(data + offset);
> > > > + if (skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > + return -EFAULT;
> > > > + return be32_to_cpu(tmp);
> > > > }
> > > >
> > > > BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,
> > > > --
> > > > 2.48.1.262.g85cc9f2d1e-goog
> > > >
> > >
> > > Note: this is currently only compile and boot tested.
> > > Which doesn't prove all that much ;-)
> >
> > Furthermore even after cherrypicking this (or a similar style fix)
> > into older LTS:
> > - sparc jit is (presumably, maybe only 32-bit) still broken, as the
> > assembly uses the old function
> > - same for mips jit on 5.4/5.10/5.15
> > - same for powerpc jit on 5.4/5.10
> >
> > (but I would guess we don't care)
>
> and Willem points out that:
>
> "skb->network_header is an offset against head, while the
> get_unaligned_be16 and skb_copy_bits take an offset relative to data.
>
> I did check yesterday that skb_copy_bits can take negative offsets, in
> case a protocol header (e.g., SKF_LL_OFF) is smaller than skb->data."
>
> which makes me think this approach is possibly (likely?) incorrect?
> skb offsets always leave me confused...
Same for me WRT skb offsets... However, it should be easy to write a minimal
reproducer to trigger the issues in a local vm: create tap device, attach
egress bpf (classic) classifier, use tap's new zerocopy api (tap_get_user
SOCK_ZEROCOPY) to create fragmented skb. And attach it as a selftest O:-)
Powered by blists - more mailing lists