Message-ID: <CANn89iJ+1Y9a9DmR54QUO4S1NRX_yMQaJwsVqU0dr_0c5J4_ZQ@mail.gmail.com>
Date: Fri, 8 Mar 2024 10:21:33 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
eric.dumazet@...il.com, Martin KaFai Lau <kafai@...com>, Joe Stringer <joe@...d.net.nz>,
Alexei Starovoitov <ast@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Florian Westphal <fw@...len.de>
Subject: Re: [PATCH net-next] udp: no longer touch sk->sk_refcnt in early demux
On Fri, Mar 8, 2024 at 9:37 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Thu, 2024-03-07 at 22:00 +0000, Eric Dumazet wrote:
> > After commits ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
> > and 7ae215d23c12 ("bpf: Don't refcount LISTEN sockets in sk_assign()")
> > UDP early demux no longer needs to grab a refcount on the UDP socket.
> >
> > This saves two atomic operations per incoming packet for connected
> > sockets.
>
> This reminds me of an old series:
>
> https://lore.kernel.org/netdev/cover.1506114055.git.pabeni@redhat.com/
>
> and I'm wondering if we could reconsider such an option.
>
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > Cc: Martin KaFai Lau <kafai@...com>
> > Cc: Joe Stringer <joe@...d.net.nz>
> > Cc: Alexei Starovoitov <ast@...nel.org>
> > Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>
> > Cc: Kuniyuki Iwashima <kuniyu@...zon.com>
> > ---
> > net/ipv4/udp.c | 5 +++--
> > net/ipv6/udp.c | 5 +++--
> > 2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index a8acea17b4e5344d022ae8f8eb674d1a36f8035a..e43ad1d846bdc2ddf5767606b78bbd055f692aa8 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2570,11 +2570,12 @@ int udp_v4_early_demux(struct sk_buff *skb)
> > uh->source, iph->saddr, dif, sdif);
> > }
> >
> > -        if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
> > +        if (!sk)
> >                  return 0;
> >
> >          skb->sk = sk;
> > -        skb->destructor = sock_efree;
> > +        DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> > +        skb->destructor = sock_pfree;
>
> I *think* that the skb may escape the current rcu section if, e.g., it
> matches an nf dup target in the input tables.
You mean the netfilter queueing stuff, perhaps?
This is already safe; it uses refcount_inc_not_zero(&sk->sk_refcnt):
        if (skb_sk_is_prefetched(skb)) {
                struct sock *sk = skb->sk;

                if (!sk_is_refcounted(sk)) {
                        if (!refcount_inc_not_zero(&sk->sk_refcnt))
                                return -ENOTCONN;

                        /* drop refcount on skb_orphan */
                        skb->destructor = sock_edemux;
                }
        }
I would think a duplicated skb cannot carry skb->sk in general, or the dup path must also
attempt a refcount_inc_not_zero(&sk->sk_refcnt) and use a related destructor.
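For illustration only, if a dup path really wanted to keep a prefetched
skb->sk across the rcu section, it would have to do roughly the following
(hypothetical sketch mirroring the snippet above, helper name invented,
not a proposed patch):

        /* Hypothetical helper for a dup/clone path: keep a prefetched
         * skb->sk beyond the RCU section by taking a real reference, and
         * switch to a destructor that drops it on skb_orphan()/kfree_skb().
         * A socket that is already refcounted is assumed to own its
         * reference, as in the nf_queue snippet above.
         */
        static int dup_hold_prefetched_sk(struct sk_buff *clone)
        {
                struct sock *sk = clone->sk;

                if (!sk || !skb_sk_is_prefetched(clone) || sk_is_refcounted(sk))
                        return 0;

                if (!refcount_inc_not_zero(&sk->sk_refcnt)) {
                        /* socket is going away, do not keep a stale pointer */
                        clone->sk = NULL;
                        clone->destructor = NULL;
                        return -ENOTCONN;
                }
                /* reference now held, dropped by sock_efree() on orphan/free */
                clone->destructor = sock_efree;
                return 0;
        }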
>
> Back then I tried to implement some debug infra to track such accesses:
>
> https://lore.kernel.org/lkml/cover.1507294365.git.pabeni@redhat.com/
>
> which was buggy (prone to false negative). I think it can be improved
> to something more reliable, perhaps I should revamp it?
>
> I'm also wondering if the DEBUG_NET_WARN_ON_ONCE is worth it?!? The sk is
> a hashed UDP socket, so it is a full sock and has the SOCK_RCU_FREE bit
> set.
This was mostly to catch any future issues, and is related to my use of sock_pfree().
DEBUG_NET_WARN_ON_ONCE() is a nop unless you compile a debug (DEV) kernel.
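From memory, the macro lives in include/net/net_debug.h and only expands to
a real WARN_ON_ONCE() when CONFIG_DEBUG_NET is set (rough sketch, not
verbatim):

        /* Rough sketch: warn only on CONFIG_DEBUG_NET builds, otherwise
         * just compile-time check the condition.
         */
        #if defined(CONFIG_DEBUG_NET)
        #define DEBUG_NET_WARN_ON_ONCE(cond) ((void)WARN_ON_ONCE(cond))
        #else
        #define DEBUG_NET_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
        #endif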
>
> Perhaps we could use a simple 'noop' destructor as in:
>
> https://lore.kernel.org/netdev/b16163e3a4fa4d772edeabd8743acb4a07206bb9.1506114055.git.pabeni@redhat.com/
>
I think we need sock_pfree() for inet_steal_sock(), but I might be wrong.
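For reference, sock_pfree() is not quite a noop: it drops a reference only
when the prefetched socket actually carries one (rough sketch from memory
of net/core/sock.c, not verbatim):

        /* Rough sketch of the prefetched-socket destructor: only put the
         * socket when a reference is actually held, e.g. when sk_assign()
         * took one because the socket is not RCU-freed.
         */
        void sock_pfree(struct sk_buff *skb)
        {
                if (sk_is_refcounted(skb->sk))
                        sock_gen_put(skb->sk);
        }

A pure noop destructor would only be safe if every path that sets it can
guarantee the socket never carries a reference.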