Message-ID: <CAM_iQpUg6EQq0XdahWVZe9CYbN-iY_gfFq5p=+2URPmH0r6bwQ@mail.gmail.com>
Date: Thu, 1 Jul 2021 11:00:35 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Lorenz Bauer <lmb@...udflare.com>,
Jakub Sitnicki <jakub@...udflare.com>
Subject: Re: [Patch bpf v2] skmsg: check sk_rcvbuf limit before queuing to ingress_skb
On Thu, Jul 1, 2021 at 9:23 AM John Fastabend <john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@...edance.com>
> >
> > Jiang observed OOM frequently when testing our AF_UNIX/UDP
> > proxy. This is due to the fact that we do not actually limit
> > the socket memory before queueing skb to ingress_skb. We
> > charge the skb memory later when handling the psock backlog,
> > but it is not limited either.
>
> Right, it's not limiting, but charging it should push back on
> the stack so it stops feeding skbs to us. Maybe this doesn't
> happen on the UDP side?
The OOM is caused by skbs piling up in ingress_skb, not by
user space consuming skbs slowly.
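A toy userspace model of this, for illustration only. It is not kernel
code: struct toy_sock and the toy_* helpers are hypothetical names that
loosely stand in for sk_rmem_alloc, sk_rcvbuf and the ingress_skb queue.
Bytes are parked at enqueue time and charged only once the backlog runs,
so nothing bounds the queue in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket fields discussed in this thread. */
struct toy_sock {
	size_t rmem_alloc; /* bytes charged to the socket (cf. sk_rmem_alloc) */
	size_t rcvbuf;     /* receive buffer limit (cf. sk_rcvbuf) */
	size_t ingress;    /* bytes parked in the ingress queue, not yet charged */
};

/* Enqueue with no limit check and no charge. */
static void toy_queue_unchecked(struct toy_sock *sk, size_t len)
{
	sk->ingress += len;
}

/* Backlog handling: only here are the parked bytes charged to the socket. */
static void toy_run_backlog(struct toy_sock *sk)
{
	sk->rmem_alloc += sk->ingress;
	sk->ingress = 0;
}

/* Feed n packets of len bytes while the backlog never runs.
 * Returns the number of bytes parked in the ingress queue. */
static size_t toy_flood(struct toy_sock *sk, int n, size_t len)
{
	for (int i = 0; i < n; i++)
		toy_queue_unchecked(sk, len);
	return sk->ingress;
}
```

With rcvbuf = 4096, toy_flood(&sk, 100, 1024) parks 102400 bytes while
rmem_alloc stays at 0 until toy_run_backlog() is called.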
>
> >
> > This patch adds checks for sk->sk_rcvbuf right before queuing
> > to ingress_skb and drops packets if this limit is exceeded. This
> > is very similar to the UDP receive path. Ideally we should set the
> > skb owner before this check too, but it is hard to make TCP
> > happy about sk_forward_alloc.
>
> But it breaks the TCP side, see below.
>
> >
> > Reported-by: Jiang Wang <jiang.wang@...edance.com>
> > Cc: Daniel Borkmann <daniel@...earbox.net>
> > Cc: John Fastabend <john.fastabend@...il.com>
> > Cc: Lorenz Bauer <lmb@...udflare.com>
> > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > ---
> > net/core/skmsg.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 9b6160a191f8..a5185c781332 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -854,7 +854,8 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
> > return -EIO;
> > }
> > spin_lock_bh(&psock_other->ingress_lock);
> > - if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
> > + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED) ||
> > + atomic_read(&sk_other->sk_rmem_alloc) > READ_ONCE(sk_other->sk_rcvbuf)) {
> > spin_unlock_bh(&psock_other->ingress_lock);
> > skb_bpf_redirect_clear(skb);
> > sock_drop(from->sk, skb);
> > @@ -930,7 +931,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
> > }
> > if (err < 0) {
> > spin_lock_bh(&psock->ingress_lock);
> > - if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
> > + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED) &&
> > + atomic_read(&sk_other->sk_rmem_alloc) <= READ_ONCE(sk_other->sk_rcvbuf)) {
> > skb_queue_tail(&psock->ingress_skb, skb);
>
> We can't just drop the packet in the memory overrun case here. This will
> break TCP because the data will be gone and no one will retransmit.
>
> That's why, in the current scheme, on redirect we can push back when we
> move it to the other queue's ingress message queue or redirect into
> the other socket via send.
>
> At one point I considered charging the data sitting in the ingress_skb?
> Would that solve the problem here? I think it would cause the enqueue
> at the UDP to start dropping packets from __udp_enqueue_schedule_skb()?
I tried moving skb_set_owner_r() here, but TCP is clearly unhappy about it,
as I explained in the changelog. Yes, it would probably help if we could move it here.
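A rough userspace sketch of what charging at enqueue time would look
like, under loud assumptions: struct toy_rx and the toy_* helpers are
hypothetical, the charge is a plain addition standing in for
skb_set_owner_r(), and the drop on overrun is only acceptable for
datagram sockets (for TCP the data would be lost with no retransmit,
as discussed above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive-side accounting fields. */
struct toy_rx {
	size_t rmem_alloc; /* bytes charged to the socket (cf. sk_rmem_alloc) */
	size_t rcvbuf;     /* receive buffer limit (cf. sk_rcvbuf) */
};

/* Check the limit, then charge the packet before it is queued.
 * Uses the same "over the limit" comparison as the patch. */
static bool toy_queue_charged(struct toy_rx *sk, size_t len)
{
	if (sk->rmem_alloc > sk->rcvbuf)
		return false;      /* caller drops the packet */
	sk->rmem_alloc += len; /* stand-in for skb_set_owner_r() */
	return true;
}

/* The consumer frees a packet: uncharge its bytes. */
static void toy_consume(struct toy_rx *sk, size_t len)
{
	sk->rmem_alloc -= len;
}

/* Offer n packets of len bytes; return how many were accepted. */
static int toy_offer(struct toy_rx *sk, int n, size_t len)
{
	int accepted = 0;

	for (int i = 0; i < n; i++)
		if (toy_queue_charged(sk, len))
			accepted++;
	return accepted;
}
```

Because every queued byte is charged immediately, the queue is bounded:
with rcvbuf = 4096 and 1024-byte packets, only 5 of 100 offered packets
are accepted (the check runs before the charge, so it can overshoot by
one packet), and more are admitted only after toy_consume() uncharges.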
Thanks.