[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJF4X+zFFFfhHdDWcYxTO0J_TZ-oN=X_8_FuQqxsCWJCQ@mail.gmail.com>
Date: Mon, 8 Jul 2024 11:38:41 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>, Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
syzkaller <syzkaller@...glegroups.com>
Subject: Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
>
> syzkaller triggered the warning [0] in udp_v4_early_demux().
>
> In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> to ensure that the sk is safe to access during the RCU grace period.
>
> Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> into the hash table. Moreover, the SOCK_RCU_FREE check is done too
> early in udp_v4_early_demux(), so there could be a small race window:
>
> CPU1 CPU2
> ---- ----
> udp_v4_early_demux() udp_lib_get_port()
> | |- hlist_add_head_rcu()
> |- sk = __udp4_lib_demux_lookup() |
> |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> `- sock_set_flag(sk, SOCK_RCU_FREE)
>
> In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> is most likely propagated for other CPUs; otherwise, we will see
> another warning of sk refcount underflow, but at least I didn't.
>
> Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> in udp_v4_early_demux().
>
> But it's overkill in the real scenario, so I just put smp_mb() only under
> CONFIG_DEBUG_NET to silence the splat. When we see the refcount underflow
> warning, we can remove the config guard.
>
> Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> make future debugging harder without the hints in udp_v4_early_demux() and
> udp_lib_get_port().
>
> [0]:
>
> Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> Reported-by: syzkaller <syzkaller@...glegroups.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
> net/ipv4/udp.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 189c9113fe9a..1a05cc3d2b4f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> goto fail_unlock;
> }
>
> + sock_set_flag(sk, SOCK_RCU_FREE);
Nice catch.
> +
> + if (IS_ENABLED(CONFIG_DEBUG_NET))
> + /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> + smp_mb();
> +
I do not think this smp_mb() is needed. If this was, many other RCU
operations would need it,
RCU rules mandate that all memory writes must be committed before the
object can be seen
by other cpus in the hash table.
This includes the setting of the SOCK_RCU_FREE flag.
For instance, hlist_add_head_rcu() does a
rcu_assign_pointer(hlist_first_rcu(h), n);
> sk_add_node_rcu(sk, &hslot->head);
> hslot->count++;
> sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> @@ -342,7 +348,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> hslot2->count++;
> spin_unlock(&hslot2->lock);
> }
> - sock_set_flag(sk, SOCK_RCU_FREE);
> +
> error = 0;
> fail_unlock:
> spin_unlock_bh(&hslot->lock);
> --
> 2.30.2
>
Powered by blists - more mailing lists