Date:   Tue, 07 Jan 2020 07:38:38 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     Lingpeng Chen <forrest0579@...il.com>,
        John Fastabend <john.fastabend@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, Lingpeng Chen <forrest0579@...il.com>
Subject: RE: [PATCH] bpf/sockmap: read psock ingress_msg before
 sk_receive_queue

Lingpeng Chen wrote:
> Right now in tcp_bpf_recvmsg, the socket reads data from sk_receive_queue
> first if it is not empty, and from psock->ingress_msg otherwise. If a FIN
> packet arrives while there is still data in psock->ingress_msg, the data
> in psock->ingress_msg will be purged. This always happens when making a
> request to an HTTP/1.0 server such as Python's SimpleHTTPServer, since
> the server sends a FIN packet right after the data is sent out.
> 
> Signed-off-by: Lingpeng Chen <forrest0579@...il.com>

Hi, good timing! I have a very similar patch on my queue that I was just
about to send out. It also needs a Fixes tag, but see the patch below.

> ---
>  net/ipv4/tcp_bpf.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index e38705165ac9..cd4b699d3d0d 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -123,8 +123,6 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  
>  	if (unlikely(flags & MSG_ERRQUEUE))
>  		return inet_recv_error(sk, msg, len, addr_len);
> -	if (!skb_queue_empty(&sk->sk_receive_queue))
> -		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);

I agree with this part.

>  
>  	psock = sk_psock_get(sk);
>  	if (unlikely(!psock))
> @@ -139,7 +137,7 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  		timeo = sock_rcvtimeo(sk, nonblock);
>  		data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err);
>  		if (data) {
> -			if (skb_queue_empty(&sk->sk_receive_queue))
> +			if (!sk_psock_queue_empty(psock))

+1

>  				goto msg_bytes_ready;
>  			release_sock(sk);
>  			sk_psock_put(sk, psock);

I think it just misses one extra piece. We don't want to grab the lock,
call __tcp_bpf_recvmsg(), call tcp_bpf_wait_data(), etc. when we know the
psock queue is empty. How about this patch? I think it would solve your
case as well. If you think this also works, go ahead and add your
Signed-off-by and send it. Or I'll send it later today with the upcoming
series I have, along with a couple of syzbot fixes.

commit 40d1c0965cda3713f444c7c0b570364220b94a8a
Author: John Fastabend <john.fastabend@...il.com>
Date:   Thu Dec 19 17:18:42 2019 +0000

    bpf: bpf redirect should handle any received data before sk_receive_queue
    
    Arika reported that when SOCK_DONE occurs we handle sk_receive_queue before
    psock->ingress_msg, so we may leave data in the ingress_msg queue, resulting
    in a possible error on the application side.
    
    Fix this by handling the ingress_msg queue first so that data is not left
    in the ingress_msg queue.
    
    Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
    Reported-by: Arika Chen <eaglesora@...il.com>
    Suggested-by: Arika Chen <eaglesora@...il.com>
    Signed-off-by: John Fastabend <john.fastabend@...il.com>

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index e38705165ac9..3b235c2cbc83 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -123,12 +123,14 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
        if (unlikely(flags & MSG_ERRQUEUE))
                return inet_recv_error(sk, msg, len, addr_len);
-       if (!skb_queue_empty(&sk->sk_receive_queue))
-               return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 
        psock = sk_psock_get(sk);
        if (unlikely(!psock))
                return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+
+       if (!skb_queue_empty(&sk->sk_receive_queue) && sk_psock_queue_empty(psock))
+               return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+
        lock_sock(sk);
 msg_bytes_ready:
        copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);
@@ -139,7 +141,7 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
                timeo = sock_rcvtimeo(sk, nonblock);
                data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err);
                if (data) {
-                       if (skb_queue_empty(&sk->sk_receive_queue))
+                       if (!sk_psock_queue_empty(psock))
                                goto msg_bytes_ready;
                        release_sock(sk);
                        sk_psock_put(sk, psock);
