Message-ID: <60a69f9f1610_4ea08208a3@john-XPS-13-9370.notmuch>
Date: Thu, 20 May 2021 10:42:55 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>,
John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf] udp: fix a memory leak in udp_read_sock()
Cong Wang wrote:
> On Wed, May 19, 2021 at 2:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> >
> > Cong Wang wrote:
> > > On Wed, May 19, 2021 at 12:06 PM John Fastabend
> > > <john.fastabend@...il.com> wrote:
> > > >
> > > > Cong Wang wrote:
> > > > > On Tue, May 18, 2021 at 12:56 PM John Fastabend
> > > > > <john.fastabend@...il.com> wrote:
> > > > > >
> > > > > > Cong Wang wrote:
> > > > > > > On Mon, May 17, 2021 at 10:36 PM John Fastabend
> > > > > > > <john.fastabend@...il.com> wrote:
> > > > > > > >
> > > > > > > > Cong Wang wrote:
> > > > > > > > > From: Cong Wang <cong.wang@...edance.com>
> > > > > > > > >
> > > > > > > > > sk_psock_verdict_recv() clones the skb and uses the clone
> > > > > > > > > afterward, so udp_read_sock() should free the original skb after
> > > > > > > > > done using it.
> > > > > > > >
> > > > > > > > The clone only happens if sk_psock_verdict_recv() returns >0.
> > > > > > >
> > > > > > > Sure, in case of error, no one uses the original skb either,
> > > > > > > so still need to free it.
> > > > > >
> > > > > > > But the data is going to be dropped then. I'm questioning if this
> > > > > > > is the best we can do or not. It's simplest, sure, but could we
> > > > > > > do a bit more work and peek those skbs or requeue them? Otherwise
> > > > > > > if you cross memory limits for a bit you're likely to drop these
> > > > > > > unnecessarily.
> > > > >
> > > > > What are the benefits of not dropping it? When sockmap takes
> > > > > over sk->sk_data_ready() it should have total control over the skb's
> > > > > in the receive queue. Otherwise user-space recvmsg() would race
> > > > > with sockmap when they try to read the first skb at the same time,
> > > > > therefore potentially user-space could get duplicated data (one via
> > > > > recvmsg(), one via sockmap). I don't see any benefits but races here.
> > > >
> > > > > The benefit of _not_ dropping it is that the packet gets to the
> > > > > receiver side. We've spent a bit of effort to get a packet across the
> > > > > network and received on the stack; dropping it at the last point is
> > > > > not so friendly.
> > >
> > > Well, at least udp_recvmsg() could drop packets too in various
> > > scenarios, for example, a copy error. So, I do not think sockmap
> > > is special.
> >
> > OK I am at least convinced now that dropping packets is OK and likely
> > a useful performance/complexity compromise.
> >
> > But at this point we won't have any visibility into these drops, correct?
> > Looks like the pattern in UDP stack to handle this is to increment
> > sk_drops and UDP_MIB_INERRORS. How about we do that here as well?
>
> We are not dropping the packet. The packet is cloned and delivered to
> user-space via sk_psock_verdict_recv(), thus we are simply leaking
> the original skb, regardless of any error. Maybe udp_read_sock()
> should check desc->error, but it has nothing to do with this path, which
> only aims to address a memory leak. A separate patch is needed to check
> desc->error, if really needed.
>
> Thanks.
"We are not dropping the packet": you'll need to be more explicit about
how this path is not dropping the skb. Here is the flow as I read it:

  udp_read_sock()
    skb = skb_recv_udp()
      __skb_recv_udp()
        __skb_try_recv_from_queue()
          __skb_unlink()        // skb is off the queue
    used = recv_actor()
      sk_psock_verdict_recv()
        return 0;
    if (used <= 0) {
      kfree(skb)                // skb is unlink'd and kfree'd
      break;
    return 0

So in the error case the skb is unlinked from the queue and
kfree'd. Where does it still exist, and how do we get it back? It
sure looks dropped to me. Yes, the kfree() is needed to not
leak it, but I'm saying we don't want to drop packets silently.
The convention in UDP space looks to be to increment sk->sk_drops
and the MIB counter (UDP_MIB_INERRORS). When we have to debug this
on deployed systems and packets silently get dropped, it's going to
cause lots of pain, so let's be sure we get the counters correct.
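
To make the accounting concrete, here is a rough userspace model of the
read loop with the drop counters bumped on the error path. This is only
a sketch, not kernel code: mock_sock, mock_skb, mock_read_sock() and the
two actors are made-up names for illustration, and the plain counters
merely stand in for sk->sk_drops and UDP_MIB_INERRORS.

```c
/* Userspace model only. Every type and helper here is hypothetical and
 * stands in for the kernel objects discussed in this thread. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_skb {
	struct mock_skb *next;
	int len;
};

struct mock_sock {
	struct mock_skb *queue_head;	/* stands in for the receive queue */
	unsigned long drops;		/* stands in for sk->sk_drops */
	unsigned long mib_inerrors;	/* stands in for UDP_MIB_INERRORS */
};

typedef int (*recv_actor_t)(struct mock_skb *skb);

/* Append a new skb of the given length to the tail of the queue. */
void mock_enqueue(struct mock_sock *sk, int len)
{
	struct mock_skb **pp = &sk->queue_head;
	struct mock_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	skb->next = NULL;
	while (*pp)
		pp = &(*pp)->next;
	*pp = skb;
}

/* Unlink the first skb; after this the caller owns it. */
struct mock_skb *mock_dequeue(struct mock_sock *sk)
{
	struct mock_skb *skb = sk->queue_head;

	if (skb) {
		sk->queue_head = skb->next;
		skb->next = NULL;
	}
	return skb;
}

/* Model of the read loop: on actor failure the skb is freed (no leak)
 * AND the drop is counted, so it is not silent. */
int mock_read_sock(struct mock_sock *sk, recv_actor_t actor)
{
	int copied = 0;

	for (;;) {
		struct mock_skb *skb = mock_dequeue(sk);
		int used;

		if (!skb)
			break;
		used = actor(skb);
		if (used <= 0) {
			if (!copied)
				copied = used;
			sk->drops++;
			sk->mib_inerrors++;
			free(skb);
			break;
		}
		copied += used;
		free(skb);
	}
	return copied;
}

/* Actor that consumes the whole skb. */
int actor_accept(struct mock_skb *skb)
{
	return skb->len;
}

/* Actor that fails, like the "return 0" case in the trace above. */
int actor_reject(struct mock_skb *skb)
{
	(void)skb;
	return 0;
}
```

With actor_reject the skb still goes away, but both counters move by
one, which is the visibility I'm asking for.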
.John