[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aI0MkNvWlE4FXMV8@gmail.com>
Date: Fri, 1 Aug 2025 20:50:56 +0200
From: Mahe Tardy <mahe.tardy@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, coreteam@...filter.org, daniel@...earbox.net,
fw@...len.de, john.fastabend@...il.com, netdev@...r.kernel.org,
netfilter-devel@...r.kernel.org, oe-kbuild-all@...ts.linux.dev,
pablo@...filter.org, lkp@...el.com
Subject: Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
On Tue, Jul 29, 2025 at 06:54:58PM -0700, Martin KaFai Lau wrote:
> On 7/29/25 2:53 AM, Mahe Tardy wrote:
> > > Which other program types do you need this kfunc to send icmp and the future
> > > tcp rst?
> >
> > I don't really know, I mostly need this in cgroup_skb for my use case
> > but I could see other programs type using this either for simplification
> > (for progs that can already rewrite the packet, like tc) or other
> > programs types like cgroup_skb, because they can't touch the packet
> > themselves.
>
> I also don't think the tc needs this kfunc either. The tc should already
> have ways to do this now.
>
> >
> > >
> > > This cover letter mentioned sending icmp unreach is easier than sending tcp
> > > rst. What problems do you see in sending tcp rst?
> > >
> >
> > Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
> > 'reject_tg' function does. In the case of sending ICMP unreach
> > 'nf_send_unreach', the routing step is quite straighforward as they are
> > only inverting the daddr and the saddr (that's what my renamed/moved
> > ip_route_reply_fetch_dst helper does).
> >
> > In the case of sending RST 'nf_send_reset', there are extra steps, first
> > the same routing mechanism is done by just inverting the daddr and the
> > saddr but later 'ip_route_me_harder' is called which is doing a lot
> > more. I'm currently not sure which parts of this must be ported to work
> > in our BPF use case so I wanted to start with unreach.
>
> I don't think we necessarily need to completely borrow from nf, the hooks'
> locations are different and the use case may be different.
>
> A concern that I have is the icmp6_send called by the kfunc. The icmp6_send
> should eventually call to ip6_finish_output which may call the very same
> "cgroup/egress" program again in a recursive way. The same for v4 icmp_send.
>
> The icmp packet is sent from an internal kernel sk. I suspect you will see
> this recursive behavior if the test is done in the default cgroup
> (/sys/fs/cgroup). I think the is_ineligible(skb) should have stopped the
> second icmpv6_send from replying to an icmp error and the cgroup hook cannot
> change the skb. However, I am not sure I want to cross this bridge. Is there
> a way to avoid the recursive bpf prog?
>
Thanks Martin for the review. Indeed the recursive BPF prog call is a
concerning issue. I'll take some time to think about it and hopefully
propose something.
Powered by blists - more mailing lists