[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANP3RGeK_NE+U9R59QynCr94B7543VLJnF_Sp3eecKCMCC3XRw@mail.gmail.com>
Date: Fri, 21 Nov 2025 11:05:39 +0900
From: Maciej Żenczykowski <zenczykowski@...il.com>
To: Maciej Żenczykowski <zenczykowski@...il.com>
Cc: Linux Network Development Mailing List <netdev@...r.kernel.org>, "David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Lorenzo Colitti <lorenzo@...gle.com>, Neal Cardwell <ncardwell@...gle.com>, bpf@...r.kernel.org
Subject: Re: [PATCH net] net: fix propagation of EPERM from tcp_connect()
On Fri, Nov 21, 2025 at 10:59 AM Maciej Żenczykowski <maze@...gle.com> wrote:
> bpf CGROUP_INET_EGRESS hook can fail packet transmit resulting
> in -EPERM, however as this is not -ECONNREFUSED it results in tcp
> simply treating it as a lost packet resulting in a need to wait
> for retransmits and timeout before an error is signaled back
> to userspace.
>
> Android implements a lot of security/power savings policy
> in this hook, so these failures are common and more or less
> permanent (at least until something significant happens).
>
> We cannot currently call bpf_set_retval() from that hook point
> and while this could be trivially fixed with a one line deletion,
> it's not clear if that's truly a good idea (would we want to
> be able to set arbitrary error values??).
>
> If the hook *truly* wants to drop the packet without signaling
> an error, it should IMHO return '2' for congestion caused drop
> instead of '0' for drop.
>
> Another possibility would be to teach the hook to treat (a new)
> return value of '4' as meaning 'drop and return ECONNREFUSED',
> but this seems easier... furthermore EPERM seems like a better
> return to userspace for 'policy denied your transmit', while
> ECONNREFUSED seems to suggest the remote server refused it.
>
> Cc: Lorenzo Colitti <lorenzo@...gle.com>
> Cc: Neal Cardwell <ncardwell@...gle.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: bpf@...r.kernel.org
> Signed-off-by: Maciej Żenczykowski <maze@...gle.com>
> ---
> net/ipv4/tcp_output.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 479afb714bdf..3ab21249e196 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -4336,7 +4336,7 @@ int tcp_connect(struct sock *sk)
> /* Send off SYN; include data in Fast Open. */
> err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) :
> tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
> - if (err == -ECONNREFUSED)
> + if (err == -ECONNREFUSED || err == -EPERM)
> return err;
>
> /* We change tp->snd_nxt after the tcp_transmit_skb() call
> --
> 2.52.0.rc2.455.g230fcf2819-goog
Perhaps I should have sent this as an RFC, as I haven't had the
opportunity to test this fix in the full blown environment where we're
running into problems.
I'm hoping to at least get feedback on whether this is acceptable
and/or even the right approach -- or if someone has a better idea or
if there's some fundamental reason we cannot return EPERM here.
Powered by blists - more mailing lists