Message-ID: <CANP3RGfRaYwve_xgxH6Tp2zenzKn2-DjZ9tg023WVzfdJF3p_w@mail.gmail.com>
Date: Wed, 4 Jun 2025 23:21:02 +0200
From: Maciej Żenczykowski <maze@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
pabeni@...hat.com, andrew+netdev@...n.ch, horms@...nel.org,
martin.lau@...ux.dev, daniel@...earbox.net, john.fastabend@...il.com,
eddyz87@...il.com, sdf@...ichev.me, haoluo@...gle.com, willemb@...gle.com,
william.xuanziyang@...wei.com, alan.maguire@...cle.com, bpf@...r.kernel.org
Subject: Re: [PATCH net] net: clear the dst when changing skb protocol
On Wed, Jun 4, 2025 at 11:06 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> A not-so-careful NAT46 BPF program can crash the kernel
> if it indiscriminately flips ingress packets from v4 to v6:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
> ipv6_rcv (net/ipv6/ip6_input.c:306:8)
> process_backlog (net/core/dev.c:6186:4)
> napi_poll (net/core/dev.c:6906:9)
> net_rx_action (net/core/dev.c:7028:13)
> do_softirq (kernel/softirq.c:462:3)
> netif_rx (net/core/dev.c:5326:3)
> dev_loopback_xmit (net/core/dev.c:4015:2)
> ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
> NF_HOOK (./include/linux/netfilter.h:314:9)
> ip_mc_output (net/ipv4/ip_output.c:400:5)
> dst_output (./include/net/dst.h:459:9)
> ip_local_out (net/ipv4/ip_output.c:130:9)
> ip_send_skb (net/ipv4/ip_output.c:1496:8)
> udp_send_skb (net/ipv4/udp.c:1040:8)
> udp_sendmsg (net/ipv4/udp.c:1328:10)
>
> The output interface has a 4->6 program attached at ingress.
> We try to loop the multicast skb back to the sending socket.
> Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> and changes skb->protocol to v6. We enter ip6_rcv_core which
> tries to use skb_dst(). But the dst is still an IPv4 one left
> after IPv4 mcast output.
>
> Clear the dst in all BPF helpers which change the protocol.
> Try to preserve metadata dsts, those won't hurt.
>
> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
Reviewed-by: Maciej Żenczykowski <maze@...gle.com>
> ---
> I wonder if we should not skip ingress (tc_skip_classify?)
> for looped back packets in the first place. But that doesn't
> seem robust enough vs multiple redirections to solve the crash.
>
> Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
> work either, since BPF can change pkt_type arbitrarily.
>
> CC: martin.lau@...ux.dev
> CC: daniel@...earbox.net
> CC: john.fastabend@...il.com
> CC: eddyz87@...il.com
> CC: sdf@...ichev.me
> CC: haoluo@...gle.com
> CC: willemb@...gle.com
> CC: william.xuanziyang@...wei.com
> CC: alan.maguire@...cle.com
> CC: bpf@...r.kernel.org
> CC: edumazet@...gle.com
> CC: maze@...gle.com
> ---
> net/core/filter.c | 19 +++++++++++++------
> tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++++
> 2 files changed, 28 insertions(+), 6 deletions(-)
> create mode 100755 tools/testing/selftests/net/nat6to4.sh
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 327ca73f9cd7..7a72f766aacf 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
> .arg1_type = ARG_PTR_TO_CTX,
> };
>
> +static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
> +{
> +	skb->protocol = htons(proto);
> +	if (skb_valid_dst(skb))
> +		skb_dst_drop(skb);
> +}
> +
> static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
> {
> /* Caller already did skb_cow() with len as headroom,
> @@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
> }
> }
>
> - skb->protocol = htons(ETH_P_IPV6);
> + bpf_skb_change_protocol(skb, ETH_P_IPV6);
> skb_clear_hash(skb);
>
> return 0;
> @@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
> }
> }
>
> - skb->protocol = htons(ETH_P_IP);
> + bpf_skb_change_protocol(skb, ETH_P_IP);
> skb_clear_hash(skb);
>
> return 0;
> @@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
> /* Match skb->protocol to new outer l3 protocol */
> if (skb->protocol == htons(ETH_P_IP) &&
> flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> - skb->protocol = htons(ETH_P_IPV6);
> + bpf_skb_change_protocol(skb, ETH_P_IPV6);
> else if (skb->protocol == htons(ETH_P_IPV6) &&
> flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
> - skb->protocol = htons(ETH_P_IP);
> + bpf_skb_change_protocol(skb, ETH_P_IP);
I wonder whether this should drop the dst even for ipv4->ipv4 or
ipv6->ipv6 encap: we are encapsulating, so presumably the old dst is
irrelevant to the new outer header.
> }
>
> if (skb_is_gso(skb)) {
> @@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> /* Match skb->protocol to new outer l3 protocol */
> if (skb->protocol == htons(ETH_P_IP) &&
> flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> - skb->protocol = htons(ETH_P_IPV6);
> + bpf_skb_change_protocol(skb, ETH_P_IPV6);
> else if (skb->protocol == htons(ETH_P_IPV6) &&
> flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> - skb->protocol = htons(ETH_P_IP);
> + bpf_skb_change_protocol(skb, ETH_P_IP);
Same question for decap.
>
> if (skb_is_gso(skb)) {
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh
> new file mode 100755
> index 000000000000..0ee859b622a4
> --- /dev/null
> +++ b/tools/testing/selftests/net/nat6to4.sh
> @@ -0,0 +1,15 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +NS="ns-peer-$(mktemp -u XXXXXX)"
> +
> +ip netns add "${NS}"
> +ip -netns "${NS}" link set lo up
> +ip -netns "${NS}" route add default via 127.0.0.2 dev lo
> +
> +tc -n "${NS}" qdisc add dev lo ingress
> +tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \
> + bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action
> +
> +ip netns exec "${NS}" \
> + bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1'
> --
> 2.49.0
>
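To make the inline encap/decap question concrete, here is a small
userspace model. It is an illustration only: it is not kernel code and
not part of the patch. Every model_* name and MODEL_* constant is a
hypothetical stand-in (for sk_buff, dst_entry, ETH_P_IP and ETH_P_IPV6),
and the metadata check is my assumption about what skb_valid_dst()
amounts to. It contrasts the encap path as posted, where the dst goes
away only when the address family flips, with the variant that drops it
on every encap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum { MODEL_P_IP = 0x0800, MODEL_P_IPV6 = 0x86DD };

struct model_dst {
	bool is_metadata;	/* models a metadata dst, which is preserved */
};

struct model_skb {
	uint16_t protocol;	/* kept in host byte order for simplicity */
	struct model_dst *dst;	/* NULL means no dst attached */
};

/* Models the new helper: set the protocol, drop a non-metadata dst. */
static void model_change_protocol(struct model_skb *skb, uint16_t proto)
{
	skb->protocol = proto;
	if (skb->dst && !skb->dst->is_metadata)
		skb->dst = NULL;
}

/* Encap as posted: the helper runs only when the family flips. */
static void model_encap_as_posted(struct model_skb *skb, uint16_t outer)
{
	if (skb->protocol != outer)
		model_change_protocol(skb, outer);
}

/* Suggested variant: run the helper on every encap, same family or not. */
static void model_encap_always_drop(struct model_skb *skb, uint16_t outer)
{
	model_change_protocol(skb, outer);
}

static void model_self_check(void)
{
	struct model_dst v4_dst = { .is_metadata = false };
	struct model_dst md_dst = { .is_metadata = true };
	struct model_skb skb;

	/* 4->6 flip: the stale IPv4 dst is dropped in both variants. */
	skb = (struct model_skb){ .protocol = MODEL_P_IP, .dst = &v4_dst };
	model_encap_as_posted(&skb, MODEL_P_IPV6);
	assert(skb.protocol == MODEL_P_IPV6 && skb.dst == NULL);

	/* 4-in-4 as posted: protocol is unchanged, so the old dst survives. */
	skb = (struct model_skb){ .protocol = MODEL_P_IP, .dst = &v4_dst };
	model_encap_as_posted(&skb, MODEL_P_IP);
	assert(skb.dst == &v4_dst);

	/* 4-in-4 with the suggested variant: the old dst is dropped. */
	skb = (struct model_skb){ .protocol = MODEL_P_IP, .dst = &v4_dst };
	model_encap_always_drop(&skb, MODEL_P_IP);
	assert(skb.dst == NULL);

	/* A metadata dst is kept either way. */
	skb = (struct model_skb){ .protocol = MODEL_P_IP, .dst = &md_dst };
	model_encap_always_drop(&skb, MODEL_P_IPV6);
	assert(skb.dst == &md_dst);
}
```

Calling model_self_check() exercises all four cases. The second case is
the one the inline comments ask about: after a same-family encap the skb
still carries the dst that was looked up for the inner packet.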
--
Maciej Żenczykowski, Kernel Networking Developer @ Google