Message-ID: <7083544f-5b0c-432e-bec8-509ca733f316@linux.dev>
Date: Tue, 29 Jul 2025 18:54:58 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Mahe Tardy <mahe.tardy@...il.com>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, coreteam@...filter.org, daniel@...earbox.net,
fw@...len.de, john.fastabend@...il.com, netdev@...r.kernel.org,
netfilter-devel@...r.kernel.org, oe-kbuild-all@...ts.linux.dev,
pablo@...filter.org, lkp@...el.com
Subject: Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
On 7/29/25 2:53 AM, Mahe Tardy wrote:
>> Which other program types do you need this kfunc to send icmp and the future
>> tcp rst?
>
> I don't really know, I mostly need this in cgroup_skb for my use case
> but I could see other program types using this, either for simplification
> (for progs that can already rewrite the packet, like tc) or, as with
> cgroup_skb, because they can't touch the packet themselves.

I don't think tc needs this kfunc either. tc should already have ways to do
this today.

>
>>
>> This cover letter mentioned sending icmp unreach is easier than sending tcp
>> rst. What problems do you see in sending tcp rst?
>>
>
> Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
> 'reject_tg' function does. In the case of sending ICMP unreach
> 'nf_send_unreach', the routing step is quite straightforward as they are
> only inverting the daddr and the saddr (that's what my renamed/moved
> ip_route_reply_fetch_dst helper does).
>
> In the case of sending RST 'nf_send_reset', there are extra steps, first
> the same routing mechanism is done by just inverting the daddr and the
> saddr but later 'ip_route_me_harder' is called which is doing a lot
> more. I'm currently not sure which parts of this must be ported to work
> in our BPF use case so I wanted to start with unreach.

I don't think we necessarily need to borrow everything from nf. The hook
locations are different and the use case may be different.

One concern I have is the icmp6_send called by the kfunc. icmp6_send should
eventually reach ip6_finish_output, which may run the very same
"cgroup/egress" program again, recursively. The same goes for v4 icmp_send.
The icmp packet is sent from an internal kernel sk, so I suspect you will see
this recursive behavior if the test is done in the default cgroup
(/sys/fs/cgroup). I think is_ineligible(skb) should stop the second
icmpv6_send from replying to an icmp error, and the cgroup hook cannot change
the skb. However, I am not sure I want to cross this bridge. Is there a way to
avoid running the bpf prog recursively?
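
To make the re-entry concern concrete, here is a minimal userspace model in C.
It is only a sketch, not kernel code: `struct pkt`, `model_send_unreach`,
`model_egress_prog`, `egress_output` and `model_run` are hypothetical
stand-ins, and the model assumes the generated ICMP error traverses the same
egress hook as the original packet (the default-cgroup case suspected above).
The refusal to answer an ICMP error stands in for the is_ineligible(skb)
check. Verdicts and packet contents are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy packet: only the one field the model needs. */
struct pkt {
	bool is_icmp_error;	/* true for a generated ICMP unreachable */
};

static int hook_runs;	/* times the modelled egress program ran */
static int icmp_sent;	/* ICMP errors actually generated */

static void egress_output(const struct pkt *p);

/* Hypothetical stand-in for the kfunc: never answer an ICMP error,
 * otherwise build an ICMP error and send it down the same output path.
 */
static int model_send_unreach(const struct pkt *orig)
{
	struct pkt reply = { .is_icmp_error = true };

	if (orig->is_icmp_error)
		return -1;

	icmp_sent++;
	egress_output(&reply);	/* re-enters the hook */
	return 0;
}

/* Hypothetical stand-in for a cgroup/egress program that rejects
 * every packet it sees.
 */
static void model_egress_prog(const struct pkt *p)
{
	hook_runs++;
	(void)model_send_unreach(p);
}

/* Modelled output path: the hook runs for every packet, including
 * the ICMP error generated from inside the hook.
 */
static void egress_output(const struct pkt *p)
{
	model_egress_prog(p);
}

/* Send one ordinary packet through the modelled egress path. */
static void model_run(void)
{
	struct pkt data = { .is_icmp_error = false };

	hook_runs = 0;
	icmp_sent = 0;
	egress_output(&data);
	printf("hook_runs=%d icmp_sent=%d\n", hook_runs, icmp_sent);
}
```

Calling `model_run()` runs the modelled program twice for a single data
packet: once for the packet itself and once, nested, for the ICMP error it
triggered. The nesting ends only because the second call refuses to answer an
ICMP error, which is the bounded recursion described above, not an absence of
recursion.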