netdev - RE: [PATCH] tcp_bpf: Fix one concurrency problem in the tcp_bpf_send

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dfed692354c94efab9c83dee2cd5cf16@huawei.com>
Date:   Mon, 4 Oct 2021 04:21:30 +0000
From:   "liujian (CE)" <liujian56@...wei.com>
To:     John Fastabend <john.fastabend@...il.com>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "jakub@...udflare.com" <jakub@...udflare.com>,
        "lmb@...udflare.com" <lmb@...udflare.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "yoshfuji@...ux-ipv6.org" <yoshfuji@...ux-ipv6.org>,
        "dsahern@...nel.org" <dsahern@...nel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "andrii@...nel.org" <andrii@...nel.org>,
        "kafai@...com" <kafai@...com>,
        "songliubraving@...com" <songliubraving@...com>,
        "yhs@...com" <yhs@...com>,
        "kpsingh@...nel.org" <kpsingh@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
 tcp_bpf_send_verdict function



> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@...il.com]
> Sent: Friday, October 1, 2021 6:25 AM
> To: liujian (CE) <liujian56@...wei.com>; john.fastabend@...il.com;
> daniel@...earbox.net; jakub@...udflare.com; lmb@...udflare.com;
> edumazet@...gle.com; davem@...emloft.net; yoshfuji@...ux-ipv6.org;
> dsahern@...nel.org; kuba@...nel.org; ast@...nel.org; andrii@...nel.org;
> kafai@...com; songliubraving@...com; yhs@...com; kpsingh@...nel.org;
> netdev@...r.kernel.org; bpf@...r.kernel.org
> Cc: liujian (CE) <liujian56@...wei.com>
> Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function
> 
> Liu Jian wrote:
> > In the following cases:
> > We need to redirect the first msg to sock1 and the second msg to sock2.
> > The sock lock needs to be released at __SK_REDIRECT and to get another
> > sock lock, this will cause the probability that psock->eval is not set
> > to __SK_NONE when the second msg comes.
> >
> > If psock does not set apple bytes, fix this by do the cleanup before
> > releasing the sock lock. And keep the original logic in other cases.
> 
> It took me sometime to figure out the above description. Please include a bit
> more details here this needs to be backported so we want to be very clear
> what the error  is and how to trigger it.
> 
> In this case we should list the flow to show how the interleaving of msgs
> breaks.
> 
> "
> With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls
> (or multiple cores) on a single socket 'sk' we could get the following flow.
> 
>  msgA, sk                               msgB, sk
>  -----------                            ---------------
>  tcp_bpf_sendmsg()
>  lock(sk)
>  psock = sk->psock
>                                         tcp_bpf_sendmsg()
>                                         lock(sk) ... blocking  tcp_bpf_send_verdict  if (psock-
> >eval == NONE)
>    psock->eval = sk_psock_msg_verdict
>  ..
>  < handle SK_REDIRECT case >
>    release_sock(sk)                     < lock dropped so grab here >
>    ret = tcp_bpf_sendmsg_redir
>                                         psock = sk->psock
>                                         tcp_bpf_send_verdict
>  lock_sock(sk) ... blocking on B
>                                         if (psock->eval == NONE) <- boom.
>                                          psock->eval will have msgA state
> 
> The problem here is we dropped the lock on msgA and grabbed it with msgB.
> Now we have old state in psock and importantly psock->eval has not been
> cleared. So msgB will run whatever action was done on A and the verdict
> program may never see it.
> "
> 
> Showing the flow makes it painfully obvious why dropping that lock with old
> state is broken.
> 
Thanks a lot for such a detailed example.
> 
> >
> > Signed-off-by: Liu Jian <liujian56@...wei.com>
> 
> We need a fixes tag as well so we can backport this.
 I will add it.
> 
> > ---
> >  net/ipv4/tcp_bpf.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > d3e9386b493e..02442e43ac4d 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  	bool cork = false, enospc = sk_msg_full(msg);
> >  	struct sock *sk_redir;
> >  	u32 tosend, delta = 0;
> > +	u32 eval = __SK_NONE;
> >  	int ret;
> >
> >  more_data:
> > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		break;
> >  	case __SK_REDIRECT:
> >  		sk_redir = psock->sk_redir;
> > +		if (!psock->apply_bytes) {
> > +			/* Clean up before releasing the sock lock. */
> > +			eval = psock->eval;
> > +			psock->eval = __SK_NONE;
> > +			psock->sk_redir = NULL;
> > +		}
> 
> We need to move above chunk below sk_msg_apply_bytes() so we account
> for the bytes and if we zero apply bytes with this send we clear the psock
> state. Otherwise we could have the same issue with stale state on the
> boundary where apply bytes is met.
> 
> >  		sk_msg_apply_bytes(psock, tosend);
> 
> <-- put above chunk here.
yes, here looks better. 
> 
> >  		if (psock->cork) {
> >  			cork = true;
> 
> Interestingly, I caught the race with cork state, but missed it with the eval
> case. Likely because our program redirected to a single sk.
> 
Yes. 
> > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		}
> >  		sk_msg_return(sk, msg, tosend);
> >  		release_sock(sk);
> > +
> >  		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> > +
> > +		if (eval == __SK_REDIRECT)
> 
> Is the 'if' needed? we are in this case because eval is SK_REDIRECT.
> 
Need it, because If the "apply bytes" is not zero, i did not change the logic.
> > +			sock_put(sk_redir);
> > +
> >  		lock_sock(sk);
> >  		if (unlikely(ret < 0)) {
> >  			int free = sk_msg_free_nocharge(sk, msg);
> > --
> > 2.17.1
> >