netdev - Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc

Message-ID: <7083544f-5b0c-432e-bec8-509ca733f316@linux.dev>
Date: Tue, 29 Jul 2025 18:54:58 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Mahe Tardy <mahe.tardy@...il.com>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
 bpf@...r.kernel.org, coreteam@...filter.org, daniel@...earbox.net,
 fw@...len.de, john.fastabend@...il.com, netdev@...r.kernel.org,
 netfilter-devel@...r.kernel.org, oe-kbuild-all@...ts.linux.dev,
 pablo@...filter.org, lkp@...el.com
Subject: Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc

On 7/29/25 2:53 AM, Mahe Tardy wrote:
>> Which other program types do you need this kfunc to send icmp and the future
>> tcp rst?
> 
> I don't really know. I mostly need this in cgroup_skb for my use case,
> but I could see other program types using it, either for simplification
> (for progs that can already rewrite the packet, like tc) or, for
> program types like cgroup_skb, because they can't touch the packet
> themselves.

I don't think tc needs this kfunc either. tc should already have ways to
do this now.

> 
>>
>> This cover letter mentioned sending icmp unreach is easier than sending tcp
>> rst. What problems do you see in sending tcp rst?
>>
> 
> Yes, I based these patches on what net/ipv4/netfilter/ipt_REJECT.c's
> 'reject_tg' function does. In the case of sending ICMP unreach
> 'nf_send_unreach', the routing step is quite straightforward as they are
> only inverting the daddr and the saddr (that's what my renamed/moved
> ip_route_reply_fetch_dst helper does).
> 
> In the case of sending RST 'nf_send_reset', there are extra steps. First,
> the same routing mechanism is used, just inverting the daddr and the
> saddr, but later 'ip_route_me_harder' is called, which does a lot
> more. I'm currently not sure which parts of this must be ported to work
> in our BPF use case, so I wanted to start with unreach.

I don't think we necessarily need to borrow completely from nf. The hook
locations are different and the use case may be different.

One concern I have is the icmp6_send called by the kfunc. icmp6_send should
eventually call ip6_finish_output, which may run the very same
"cgroup/egress" program again, recursively. The same goes for v4 icmp_send.

The icmp packet is sent from an internal kernel sk. I suspect you will see
this recursive behavior if the test is done in the default cgroup
(/sys/fs/cgroup). I think is_ineligible(skb) should stop the second
icmpv6_send from replying to an icmp error, and the cgroup hook cannot change
the skb. However, I am not sure I want to cross this bridge. Is there a way to
avoid the recursive bpf prog?
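
To make the recursion concern above concrete, here is a rough userspace C
model of it. It is not kernel code, and every name in it (model_state,
model_send_unreach, model_egress_hook, and so on) is hypothetical. It assumes
the kfunc refuses to reply to an ICMP error, as is_ineligible(skb) is expected
to, and it assumes one possible guard: a depth counter that makes the egress
hook skip the program while a reply is being generated. It is only a sketch of
the control flow under those assumptions, not a proposal for how the kernel
should implement it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum pkt_kind { PKT_DATA, PKT_ICMP_ERROR };

struct model_state {
	bool guard_enabled; /* assumed guard: skip the prog while replying */
	int  reply_depth;   /* >0 while the modelled kfunc is sending a reply */
	int  prog_runs;     /* how many times the egress program ran */
	int  icmp_sent;     /* how many ICMP errors were emitted */
};

static void model_egress_hook(struct model_state *st, enum pkt_kind kind);

/* Stands in for the kfunc: emit an ICMP error in reply to 'orig'. */
static int model_send_unreach(struct model_state *st, enum pkt_kind orig)
{
	/* Modelled after the is_ineligible() idea: never answer an ICMP error. */
	if (orig == PKT_ICMP_ERROR)
		return -1;

	st->icmp_sent++;
	st->reply_depth++;
	/* The reply goes through the output path and reaches the hook again. */
	model_egress_hook(st, PKT_ICMP_ERROR);
	st->reply_depth--;
	assert(st->reply_depth >= 0);
	return 0;
}

/* Stands in for a "cgroup/egress" program that rejects every packet. */
static void model_prog(struct model_state *st, enum pkt_kind kind)
{
	st->prog_runs++;
	(void)model_send_unreach(st, kind);
}

/* Stands in for the cgroup egress hook on the output path. */
static void model_egress_hook(struct model_state *st, enum pkt_kind kind)
{
	/* With the assumed guard, the reply packet bypasses the program. */
	if (st->guard_enabled && st->reply_depth > 0)
		return;
	model_prog(st, kind);
}

/* Send one data packet through the hook and report what happened. */
static struct model_state model_run(bool guard_enabled)
{
	struct model_state st = { .guard_enabled = guard_enabled };

	model_egress_hook(&st, PKT_DATA);
	return st;
}
```

In this model, model_run(false) runs the program twice (once for the data
packet and once for the generated ICMP error) while still sending only one
ICMP error, and model_run(true) runs it once. Whether a guard of this shape is
acceptable in the real output path is exactly the open question.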