Message-ID: <aIiaB2QUxKmhvPlx@gmail.com>
Date: Tue, 29 Jul 2025 11:53:11 +0200
From: Mahe Tardy <mahe.tardy@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, coreteam@...filter.org, daniel@...earbox.net,
fw@...len.de, john.fastabend@...il.com, netdev@...r.kernel.org,
netfilter-devel@...r.kernel.org, oe-kbuild-all@...ts.linux.dev,
pablo@...filter.org, lkp@...el.com
Subject: Re: [PATCH bpf-next v3 0/4] bpf: add icmp_send_unreach kfunc
On Mon, Jul 28, 2025 at 06:21:50PM -0700, Martin KaFai Lau wrote:
> On 7/28/25 2:43 AM, Mahe Tardy wrote:
> > Hello,
> >
> > This is v3 of adding the icmp_send_unreach kfunc, as suggested during
> > LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
> > actively reject east-west traffic, similarly to what is possible to do
> > with netfilter reject target.
> >
> > The first step to implement this is using ICMP control messages, with
> > the ICMP_DEST_UNREACH type and various codes: ICMP_NET_UNREACH,
> > ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc. This is easier to implement
> > than a TCP RST reply and will already hint the client TCP stack to abort
> > the connection and not retry extensively.
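
For reference, an ICMP Destination Unreachable message (RFC 792) is an 8-byte header (type ICMP_DEST_UNREACH, one of the codes above, a checksum, 4 unused bytes) followed by the offending packet's IP header and the first 64 bits of its data. The userspace C sketch below builds one from a raw IPv4 packet. It only illustrates the wire format and is not how the kfunc itself produces the message; the helper names inet_csum and build_dest_unreach are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/ip_icmp.h>

/* Internet checksum (RFC 1071): one's complement of the one's complement
 * sum of the data read as big-endian 16-bit words. Returned in host order. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: write an ICMP Destination Unreachable message
 * answering the IPv4 packet in orig into out. Returns the message length,
 * or 0 if orig is not a plausible IPv4 packet or out is too small. */
static size_t build_dest_unreach(uint8_t *out, size_t out_len, uint8_t code,
				 const uint8_t *orig, size_t orig_len)
{
	struct icmphdr icmph;
	size_t ihl, quote_len;
	uint16_t csum;

	assert(out != NULL && orig != NULL);
	if (orig_len < 20)
		return 0;
	ihl = (size_t)(orig[0] & 0x0f) * 4;	/* IHL counts 32-bit words */
	if (ihl < 20 || ihl > orig_len)
		return 0;

	/* Quote the original IP header plus the first 8 bytes of its data. */
	quote_len = ihl + 8;
	if (quote_len > orig_len)
		quote_len = orig_len;
	if (out_len < sizeof(icmph) + quote_len)
		return 0;

	memset(&icmph, 0, sizeof(icmph));	/* checksum and unused word = 0 */
	icmph.type = ICMP_DEST_UNREACH;
	icmph.code = code;
	memcpy(out, &icmph, sizeof(icmph));
	memcpy(out + sizeof(icmph), orig, quote_len);

	/* Checksum covers the ICMP header and the quote; stored big-endian. */
	csum = htons(inet_csum(out, sizeof(icmph) + quote_len));
	memcpy(out + offsetof(struct icmphdr, checksum), &csum, sizeof(csum));
	return sizeof(icmph) + quote_len;
}
```

A receiver validates the message by recomputing inet_csum over all of it and expecting 0.
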
> >
> > Note that this is different from the sock_destroy kfunc, which calls
> > tcp_abort and thus sends a reset, destroying the underlying socket.
> >
> > Caveats of this kfunc design are that a cgroup_skb program can call this
> > function N times, thus sending N ICMP unreach control messages, and that
> > the program can return from the BPF filter with SK_PASS, leading to a
> > potentially confusing situation where the TCP connection was established
> > while the client received ICMP_DEST_UNREACH messages.
> >
> > Another, more sophisticated design idea would be for the kfunc to set up
> > the kernel to send an ICMP_DEST_UNREACH control message with the
> > appropriate code when the cgroup_skb program terminates with SK_DROP.
> > Creating a new 'SK_REJECT' return code for cgroup_skb programs was
> > generally rejected and would be too limited to support other program
> > types.
> >
> > We should bear in mind that we want to add a TCP reset kfunc next and
> > could also extend this kfunc to other program types if wanted.
>
> Some high level questions.
>
> For which other program types do you need this kfunc to send ICMP, and the
> future kfunc to send a TCP RST?
I don't really know; I mostly need this in cgroup_skb for my use case,
but I could see other program types using it, either for simplification
(for programs that can already rewrite the packet, like tc) or because,
like cgroup_skb, they can't touch the packet themselves.
>
> This cover letter mentioned sending icmp unreach is easier than sending tcp
> rst. What problems do you see in sending tcp rst?
>
Yes, I based these patches on what the 'reject_tg' function in
net/ipv4/netfilter/ipt_REJECT.c does. When sending an ICMP unreach
('nf_send_unreach'), the routing step is quite straightforward, as it only
swaps the daddr and the saddr (that's what my renamed/moved
ip_route_reply_fetch_dst helper does).
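
To make the "swap the addresses" step concrete, here is a minimal userspace sketch, assuming a simplified, hypothetical struct reply_flow in place of the kernel's flow key. It only derives the key for the reply; the route lookup that would consume it is not shown, and this is not the ip_route_reply_fetch_dst code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/ip.h>

/* Hypothetical, simplified lookup key for routing the reply. */
struct reply_flow {
	uint32_t saddr;		/* network byte order */
	uint32_t daddr;		/* network byte order */
};

/* The reply goes back to whoever sent the offending packet, so the
 * packet's destination becomes our source and vice versa. */
static struct reply_flow reply_flow_from_iph(const struct iphdr *iph)
{
	struct reply_flow fl;

	assert(iph != NULL);
	memset(&fl, 0, sizeof(fl));
	fl.saddr = iph->daddr;
	fl.daddr = iph->saddr;
	return fl;
}
```
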

In the case of sending a RST ('nf_send_reset'), there are extra steps:
first, the same routing mechanism is applied by swapping the daddr and the
saddr, but then 'ip_route_me_harder' is called, which does a lot more.
I'm currently not sure which parts of this must be ported to work in our
BPF use case, so I wanted to start with unreach.
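
Besides routing, a reset must also carry sequence numbers derived from the offending segment. The sketch below applies the RFC 793 reset-generation rules, which netfilter's reject code also follows: never answer a RST; if the segment carried an ACK, reuse its ack_seq as the reset's sequence number; otherwise send sequence number 0 with ACK set, acknowledging everything the segment occupied (payload, plus one each for SYN and FIN). fill_rst is a hypothetical name, and the checksum, the IP header and the ip_route_me_harder step are deliberately left out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>

/* Hypothetical helper: fill the TCP header of a RST answering oth.
 * payload_len is the TCP payload length of oth. Returns false when no
 * reset must be sent. Checksum and IP header are not handled here. */
static bool fill_rst(struct tcphdr *reply, const struct tcphdr *oth,
		     uint32_t payload_len)
{
	assert(reply != NULL && oth != NULL);
	if (oth->rst)			/* never answer a RST with a RST */
		return false;

	memset(reply, 0, sizeof(*reply));
	reply->source = oth->dest;	/* swap the ports */
	reply->dest = oth->source;
	reply->doff = sizeof(*reply) / 4;
	reply->rst = 1;
	if (oth->ack) {
		/* Peer said what it expects next: use that as our SEQ. */
		reply->seq = oth->ack_seq;
	} else {
		/* SEQ stays 0; ACK the whole segment. SYN and FIN each
		 * consume one sequence number. */
		reply->ack_seq = htonl(ntohl(oth->seq) + oth->syn + oth->fin +
				       payload_len);
		reply->ack = 1;
	}
	return true;
}
```

For example, a bare SYN with sequence number 1000 is answered with RST|ACK, SEQ 0 and ACK 1001.
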