Message-ID: <b3f25e61-7b0c-4576-baae-9b498c3b8748@linux.dev>
Date: Mon, 28 Jul 2025 18:05:26 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Mahe Tardy <mahe.tardy@...il.com>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, coreteam@...filter.org, daniel@...earbox.net,
fw@...len.de, john.fastabend@...il.com, netdev@...r.kernel.org,
netfilter-devel@...r.kernel.org, oe-kbuild-all@...ts.linux.dev,
pablo@...filter.org, lkp@...el.com
Subject: Re: [PATCH bpf-next v3 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc

On 7/28/25 2:43 AM, Mahe Tardy wrote:
> This is needed in the context of Tetragon to provide improved feedback
> (in contrast to just dropping packets) to east-west traffic when blocked
> by policies using cgroup_skb programs.
>
> This reuses concepts from the netfilter reject target code path, with
> the differences that:
> * Packets are cloned since the BPF user can still return SK_PASS from
> the cgroup_skb progs and the current skb needs to stay untouched

This needs more detail. Which field(s) of the skb does the kfunc change: is it
the skb_dst_set() in ip[6]_route_reply_fetch_dst(), and/or something in the
icmp[v6]_send() code path?

> (cgroup_skb hooks only allow read-only skb payload).
> * Since cgroup_skb programs are called late in the stack, checksums do
> not need to be computed or verified, and IPv4 fragmentation does not
> need to be checked (ip_local_deliver should take care of that
> earlier).
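
To make the cloning question above concrete, here is a minimal userspace
sketch of the general clone idea: per-buffer metadata is copied while the
packet data is shared and refcounted. It is not kernel code. All model_*
names are hypothetical, and the single dst field only stands in for the
route attached to an skb. Under that assumption, setting the route on the
clone leaves the original's metadata alone; whether icmp[v6]_send() touches
anything shared is exactly what the commit message should spell out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the shared, refcounted packet data. */
struct model_data {
	int refcnt;
	unsigned char bytes[64];
};

/* Hypothetical stand-in for struct sk_buff: private metadata plus a
 * pointer to data that may be shared with clones. */
struct model_skb {
	const void *dst;         /* per-buffer route, private to each copy */
	struct model_data *data; /* shared between original and clone */
};

static struct model_skb *model_alloc(void)
{
	struct model_skb *skb = calloc(1, sizeof(*skb));

	if (!skb)
		return NULL;
	skb->data = calloc(1, sizeof(*skb->data));
	if (!skb->data) {
		free(skb);
		return NULL;
	}
	skb->data->refcnt = 1;
	return skb;
}

/* Models the clone idea: copy the metadata, share the payload. */
static struct model_skb *model_clone(const struct model_skb *skb)
{
	struct model_skb *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	*n = *skb;
	n->data->refcnt++;
	return n;
}

/* Only touches the metadata of the buffer it is given. */
static void model_set_dst(struct model_skb *skb, const void *dst)
{
	skb->dst = dst;
}

/* Drops one reference on the shared data and frees the metadata. */
static void model_free(struct model_skb *skb)
{
	if (--skb->data->refcnt == 0)
		free(skb->data);
	free(skb);
}
```

In this model, calling model_set_dst() on a clone changes only the clone's
dst pointer, while both buffers keep pointing at the same data block.
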
>
> Signed-off-by: Mahe Tardy <mahe.tardy@...il.com>
> ---
> net/core/filter.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 61 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7a72f766aacf..050872324575 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -85,6 +85,10 @@
> #include <linux/un.h>
> #include <net/xdp_sock_drv.h>
> #include <net/inet_dscp.h>
> +#include <linux/icmp.h>
> +#include <net/icmp.h>
> +#include <net/route.h>
> +#include <net/ip6_route.h>
>
> #include "dev.h"
>
> @@ -12148,6 +12152,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> return 0;
> }
>
> +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> +{
> +	struct sk_buff *skb = (struct sk_buff *)__skb;
> +	struct sk_buff *nskb;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		if (code < 0 || code > NR_ICMP_UNREACH)
> +			return -EINVAL;
> +
> +		nskb = skb_clone(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		if (ip_route_reply_fetch_dst(nskb) < 0) {
> +			kfree_skb(nskb);
> +			return -EHOSTUNREACH;
> +		}
> +
> +		icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> +		kfree_skb(nskb);
> +		break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case htons(ETH_P_IPV6):
> +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> +			return -EINVAL;
> +
> +		nskb = skb_clone(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		if (ip6_route_reply_fetch_dst(nskb) < 0) {

From a very quick look at icmpv6_send(), it does its own route lookup. I
haven't looked at the v4 path yet.

I am likely missing some details. Can you explain why a lookup is needed
before calling icmpv6_send()?

> +			kfree_skb(nskb);
> +			return -EHOSTUNREACH;
> +		}
> +
> +		icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> +		kfree_skb(nskb);
> +		break;
> +#endif
> +	default:
> +		return -EPROTONOSUPPORT;
> +	}
> +
> +	return SK_DROP;
> +}
> +
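
For readers following along, the return-value contract of the quoted kfunc
can be sketched in userspace as below. This is only a model under stated
assumptions: the MODEL_* constants are local mirrors of the kernel values,
model_icmp_unreach_ret() is a hypothetical name, the protocol is passed in
host byte order, CONFIG_IPV6 is assumed enabled, and the clone, route and
send steps are assumed to succeed. Note that SK_DROP is 0, so a successful
call returns 0 and every failure is a negative errno.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirrors of kernel constants so the sketch builds in userspace. */
#define MODEL_ETH_P_IP            0x0800
#define MODEL_ETH_P_IPV6          0x86DD
#define MODEL_NR_ICMP_UNREACH     15 /* highest ICMPv4 unreachable code */
#define MODEL_ICMPV6_REJECT_ROUTE 6  /* highest ICMPv6 code the patch accepts */
#define MODEL_SK_DROP             0

/* Hypothetical helper: mirrors only the protocol dispatch and code range
 * checks of the quoted function, assuming nothing else fails. */
static int model_icmp_unreach_ret(uint16_t proto, int code)
{
	switch (proto) {
	case MODEL_ETH_P_IP:
		if (code < 0 || code > MODEL_NR_ICMP_UNREACH)
			return -EINVAL;
		break;
	case MODEL_ETH_P_IPV6:
		if (code < 0 || code > MODEL_ICMPV6_REJECT_ROUTE)
			return -EINVAL;
		break;
	default:
		return -EPROTONOSUPPORT;
	}

	return MODEL_SK_DROP;
}
```

A caller in a cgroup_skb program could therefore treat any negative value as
"no ICMP error was sent" and decide separately whether to drop or pass.
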