Message-ID: <666b43a9b2b4a_1995c208f@john.notmuch>
Date: Thu, 13 Jun 2024 12:08:25 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>,
Vincent Whitchurch <vincent.whitchurch@...adoghq.com>
Cc: Jason Xing <kerneljasonxing@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Jakub Sitnicki <jakub@...udflare.com>,
Jason Xing <kernelxing@...cent.com>,
netdev@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: Recursive locking in sockmap
Cong Wang wrote:
> On Fri, Jun 07, 2024 at 02:09:59PM +0200, Vincent Whitchurch wrote:
> > On Thu, Jun 6, 2024 at 2:47 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > On Thu, Jun 6, 2024 at 6:00 PM Vincent Whitchurch
> > > <vincent.whitchurch@...adoghq.com> wrote:
> > > > With a socket in the sockmap, if there's a parser callback installed
> > > > and the verdict callback returns SK_PASS, the kernel deadlocks
> > > > immediately after the verdict callback is run. This started at commit
> > > > 6648e613226e18897231ab5e42ffc29e63fa3365 ("bpf, skmsg: Fix NULL
> > > > pointer dereference in sk_psock_skb_ingress_enqueue").
> > > >
> > > > It can be reproduced by running ./test_sockmap -t ping
> > > > --txmsg_pass_skb. The --txmsg_pass_skb command to test_sockmap is
> > > > available in this series:
> > > > https://lore.kernel.org/netdev/20240606-sockmap-splice-v1-0-4820a2ab14b5@datadoghq.com/.
> > >
> > > I don't have time right now to look into this issue carefully until
> > > this weekend. BTW, did you mean the patch [2/5] in the link that can
> > > solve the problem?
> >
> > No. That patch set addresses a different problem which occurs even if
> > only a verdict callback is used. But patch 4/5 in that patch set adds
> > the --txmsg_pass_skb option to the test_sockmap test program, and that
> > option can be used to reproduce this deadlock too.
>
> I think we can remove that write_lock_bh(&sk->sk_callback_lock). Can you
> test the following patch?
>
> ------------>
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..da64ded97f3a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1116,9 +1116,7 @@ static void sk_psock_strp_data_ready(struct sock *sk)
> if (tls_sw_has_ctx_rx(sk)) {
> psock->saved_data_ready(sk);
> } else {
> - write_lock_bh(&sk->sk_callback_lock);
> strp_data_ready(&psock->strp);
> - write_unlock_bh(&sk->sk_callback_lock);
> }
> }
> rcu_read_unlock();

It's not obvious to me that we can run the strp parser without the
sk_callback_lock here. I believe the patch below is the correct fix. It
fixes the splat when running the test.

bpf: sockmap, fix introduced strparser recursive lock

Originally there was a race where a psock could be removed from the sock map
while it was also receiving an skb and calling sk_psock_data_ready(). It was
possible the removal code would NULL/set the data_ready callback while the
receive path was concurrently calling the hook. The fix was to wrap the
access in sk_callback_lock to ensure the saved_data_ready pointer didn't
change under us. There was some discussion around doing a larger change
to ensure we could use READ_ONCE/WRITE_ONCE over the callback, but that
was for *next kernels, not stable fixes.

Unfortunately, we introduced a regression with the fix because there
is another path into this code (one that didn't have a test case): through
the stream parser. The stream parser runs with sk_callback_lock already
held (taken in sk_psock_strp_data_ready()), which means we get the
following splat and lock up.

============================================
WARNING: possible recursive locking detected
6.10.0-rc2 #59 Not tainted
--------------------------------------------
test_sockmap/342 is trying to acquire lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_skb_ingress_enqueue (./include/linux/skmsg.h:467
net/core/skmsg.c:555)
but task is already holding lock:
ffff888007a87228 (clock-AF_INET){++--}-{2:2}, at:
sk_psock_strp_data_ready (net/core/skmsg.c:1120)

To fix this, ensure we do not grab the lock when we reach this code
through the strparser.

Fixes: 6648e613226e1 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Signed-off-by: John Fastabend <john.fastabend@...il.com>
---
include/linux/skmsg.h | 9 +++++++--
net/core/skmsg.c | 5 ++++-
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c9efda9df285..3659e9b514d0 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -461,13 +461,18 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
 		sk_psock_drop(sk, psock);
 }
 
-static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+static inline void __sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
-	read_lock_bh(&sk->sk_callback_lock);
 	if (psock->saved_data_ready)
 		psock->saved_data_ready(sk);
 	else
 		sk->sk_data_ready(sk);
+}
+
+static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
+{
+	read_lock_bh(&sk->sk_callback_lock);
+	__sk_psock_data_ready(sk, psock);
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..8429daecbbb6 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	msg->skb = skb;
 
 	sk_psock_queue_msg(psock, msg);
-	sk_psock_data_ready(sk, psock);
+	if (skb_bpf_strparser(skb))
+		__sk_psock_data_ready(sk, psock);
+	else
+		sk_psock_data_ready(sk, psock);
 	return copied;
 }
 