Message-ID: <60ddec01c651b_3fe24208dc@john-XPS-13-9370.notmuch>
Date:   Thu, 01 Jul 2021 09:23:29 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org
Cc:     bpf@...r.kernel.org, Cong Wang <cong.wang@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Lorenz Bauer <lmb@...udflare.com>,
        Jakub Sitnicki <jakub@...udflare.com>
Subject: RE: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to
 ingress_skb

Cong Wang wrote:
> From: Cong Wang <cong.wang@...edance.com>
> 
> Jiang observed frequent OOMs when testing our AF_UNIX/UDP
> proxy. This is because we do not actually limit the socket
> memory before queueing skbs to ingress_skb. We charge the
> skb memory later, when handling the psock backlog, but that
> is not limited either.

Right, it's not limiting, but charging it should push back on
the stack so it stops feeding skbs to us. Maybe this doesn't
happen on the UDP side?
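
To spell out the push back I mean, the pattern is roughly this
(a rough sketch, not the exact call chain, which differs per
protocol):

	/* Charging the skb to the receiving socket... */
	skb_set_owner_r(skb, sk); /* adds skb->truesize to sk->sk_rmem_alloc */

	/* ...makes the usual enqueue-time check reject further skbs,
	 * so the stack stops feeding us:
	 */
	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
		goto drop;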

> 
> This patch adds a check on sk->sk_rcvbuf right before queuing
> to ingress_skb and drops packets if this limit is exceeded.
> This is very similar to the UDP receive path. Ideally we
> should set the skb owner before this check too, but it is
> hard to make TCP happy about sk_forward_alloc.

But it breaks the TCP side, see below.

> 
> Reported-by: Jiang Wang <jiang.wang@...edance.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: Lorenz Bauer <lmb@...udflare.com>
> Cc: Jakub Sitnicki <jakub@...udflare.com>
> Signed-off-by: Cong Wang <cong.wang@...edance.com>
> ---
>  net/core/skmsg.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9b6160a191f8..a5185c781332 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
>  		return -EIO;
>  	}
>  	spin_lock_bh(&psock_other->ingress_lock);
> -	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
> +	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
> +	    atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
>  		spin_unlock_bh(&psock_other->ingress_lock);
>  		skb_bpf_redirect_clear(skb);
>  		sock_drop(from->sk, skb);
> @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>  		}
>  		if (err < 0) {
>  			spin_lock_bh(&psock->ingress_lock);
> -			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
> +			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
> +			    atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
>  				skb_queue_tail(&psock->ingress_skb, skb);

We can't just drop the packet in the memory overrun case here. This
will break TCP, because the data has already been ACKed by the time
it reaches the psock, so the sender will never retransmit it.

That's why, in the current scheme, on redirect we can push back when
we move the skb into the other socket's ingress message queue, or
when we redirect into the other socket via its send path.
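
To make that concrete, the existing charge looks roughly like this
(simplified sketch, the helper name is made up, but
sk_rmem_schedule() and sk_mem_charge() are the real primitives):

	/* psock_ingress_charge() is hypothetical; the point is that the
	 * ingress msg path accounts the memory, so an over-limit socket
	 * fails the charge and the caller can keep the skb queued
	 * instead of dropping it.
	 */
	static int psock_ingress_charge(struct sock *sk, struct sk_buff *skb)
	{
		if (!sk_rmem_schedule(sk, skb, skb->truesize))
			return -ENOMEM;	/* push back */
		sk_mem_charge(sk, skb->truesize);
		return 0;
	}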

At one point I considered charging the data sitting in ingress_skb.
Would that solve the problem here? Though I think it would then cause
the UDP enqueue to start dropping packets from
__udp_enqueue_schedule_skb().
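
For reference, the early test in __udp_enqueue_schedule_skb() is
roughly (trimmed sketch):

	/* If ingress_skb were charged to sk_rmem_alloc, this check would
	 * start dropping once we sit on more than sk_rcvbuf worth of data.
	 */
	int rmem = atomic_read(&sk->sk_rmem_alloc);

	if (rmem > sk->sk_rcvbuf)
		goto drop;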

>  				schedule_work(&psock->work);
>  				err = 0;
> -- 
> 2.27.0
> 
