[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dfed692354c94efab9c83dee2cd5cf16@huawei.com>
Date: Mon, 4 Oct 2021 04:21:30 +0000
From: "liujian (CE)" <liujian56@...wei.com>
To: John Fastabend <john.fastabend@...il.com>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"jakub@...udflare.com" <jakub@...udflare.com>,
"lmb@...udflare.com" <lmb@...udflare.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"yoshfuji@...ux-ipv6.org" <yoshfuji@...ux-ipv6.org>,
"dsahern@...nel.org" <dsahern@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"ast@...nel.org" <ast@...nel.org>,
"andrii@...nel.org" <andrii@...nel.org>,
"kafai@...com" <kafai@...com>,
"songliubraving@...com" <songliubraving@...com>,
"yhs@...com" <yhs@...com>,
"kpsingh@...nel.org" <kpsingh@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
tcp_bpf_send_verdict function
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@...il.com]
> Sent: Friday, October 1, 2021 6:25 AM
> To: liujian (CE) <liujian56@...wei.com>; john.fastabend@...il.com;
> daniel@...earbox.net; jakub@...udflare.com; lmb@...udflare.com;
> edumazet@...gle.com; davem@...emloft.net; yoshfuji@...ux-ipv6.org;
> dsahern@...nel.org; kuba@...nel.org; ast@...nel.org; andrii@...nel.org;
> kafai@...com; songliubraving@...com; yhs@...com; kpsingh@...nel.org;
> netdev@...r.kernel.org; bpf@...r.kernel.org
> Cc: liujian (CE) <liujian56@...wei.com>
> Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function
>
> Liu Jian wrote:
> > In the following cases:
> > We need to redirect the first msg to sock1 and the second msg to sock2.
> > The sock lock needs to be released at __SK_REDIRECT and to get another
> > sock lock, this will cause the probability that psock->eval is not set
> > to __SK_NONE when the second msg comes.
> >
> > If psock does not set apple bytes, fix this by do the cleanup before
> > releasing the sock lock. And keep the original logic in other cases.
>
> It took me sometime to figure out the above description. Please include a bit
> more details here this needs to be backported so we want to be very clear
> what the error is and how to trigger it.
>
> In this case we should list the flow to show how the interleaving of msgs
> breaks.
>
> "
> With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls
> (or multiple cores) on a single socket 'sk' we could get the following flow.
>
> msgA, sk msgB, sk
> ----------- ---------------
> tcp_bpf_sendmsg()
> lock(sk)
> psock = sk->psock
> tcp_bpf_sendmsg()
> lock(sk) ... blocking tcp_bpf_send_verdict if (psock-
> >eval == NONE)
> psock->eval = sk_psock_msg_verdict
> ..
> < handle SK_REDIRECT case >
> release_sock(sk) < lock dropped so grab here >
> ret = tcp_bpf_sendmsg_redir
> psock = sk->psock
> tcp_bpf_send_verdict
> lock_sock(sk) ... blocking on B
> if (psock->eval == NONE) <- boom.
> psock->eval will have msgA state
>
> The problem here is we dropped the lock on msgA and grabbed it with msgB.
> Now we have old state in psock and importantly psock->eval has not been
> cleared. So msgB will run whatever action was done on A and the verdict
> program may never see it.
> "
>
> Showing the flow makes it painfully obvious why dropping that lock with old
> state is broken.
>
Thanks a lot for such a detailed example.
>
> >
> > Signed-off-by: Liu Jian <liujian56@...wei.com>
>
> We need a fixes tag as well so we can backport this.
I will add it.
>
> > ---
> > net/ipv4/tcp_bpf.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > d3e9386b493e..02442e43ac4d 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> > bool cork = false, enospc = sk_msg_full(msg);
> > struct sock *sk_redir;
> > u32 tosend, delta = 0;
> > + u32 eval = __SK_NONE;
> > int ret;
> >
> > more_data:
> > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> > break;
> > case __SK_REDIRECT:
> > sk_redir = psock->sk_redir;
> > + if (!psock->apply_bytes) {
> > + /* Clean up before releasing the sock lock. */
> > + eval = psock->eval;
> > + psock->eval = __SK_NONE;
> > + psock->sk_redir = NULL;
> > + }
>
> We need to move above chunk below sk_msg_apply_bytes() so we account
> for the bytes and if we zero apply bytes with this send we clear the psock
> state. Otherwise we could have the same issue with stale state on the
> boundary where apply bytes is met.
>
> > sk_msg_apply_bytes(psock, tosend);
>
> <-- put above chunk here.
yes, here looks better.
>
> > if (psock->cork) {
> > cork = true;
>
> Interestingly, I caught the race with cork state, but missed it with the eval
> case. Likely because our program redirected to a single sk.
>
Yes.
> > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> > }
> > sk_msg_return(sk, msg, tosend);
> > release_sock(sk);
> > +
> > ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> > +
> > + if (eval == __SK_REDIRECT)
>
> Is the 'if' needed? we are in this case because eval is SK_REDIRECT.
>
Need it, because If the "apply bytes" is not zero, i did not change the logic.
> > + sock_put(sk_redir);
> > +
> > lock_sock(sk);
> > if (unlikely(ret < 0)) {
> > int free = sk_msg_free_nocharge(sk, msg);
> > --
> > 2.17.1
> >
Powered by blists - more mailing lists