Message-ID: <CAM_iQpWxJrXhdxyhO6O+h1d9dz=4BBk8i-EYrVG6v8ix_0gCnQ@mail.gmail.com>
Date: Thu, 20 May 2021 13:14:43 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf] udp: fix a memory leak in udp_read_sock()
On Thu, May 20, 2021 at 10:43 AM John Fastabend
<john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Wed, May 19, 2021 at 2:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > On Wed, May 19, 2021 at 12:06 PM John Fastabend
> > > > <john.fastabend@...il.com> wrote:
> > > > >
> > > > > Cong Wang wrote:
> > > > > > On Tue, May 18, 2021 at 12:56 PM John Fastabend
> > > > > > <john.fastabend@...il.com> wrote:
> > > > > > >
> > > > > > > Cong Wang wrote:
> > > > > > > > On Mon, May 17, 2021 at 10:36 PM John Fastabend
> > > > > > > > <john.fastabend@...il.com> wrote:
> > > > > > > > >
> > > > > > > > > Cong Wang wrote:
> > > > > > > > > > From: Cong Wang <cong.wang@...edance.com>
> > > > > > > > > >
> > > > > > > > > > > sk_psock_verdict_recv() clones the skb and uses the clone
> > > > > > > > > > > afterward, so udp_read_sock() should free the original skb once
> > > > > > > > > > > it is done using it.
> > > > > > > > >
> > > > > > > > > The clone only happens if sk_psock_verdict_recv() returns >0.
> > > > > > > >
> > > > > > > > > Sure, in case of error, no one uses the original skb either,
> > > > > > > > > so we still need to free it.
> > > > > > >
> > > > > > > > But the data is going to be dropped then. I'm questioning whether this
> > > > > > > > is the best we can do. It's the simplest, sure, but could we
> > > > > > > > do a bit more work and peek those skbs or requeue them? Otherwise,
> > > > > > > > if you cross memory limits for a bit, you're likely to drop these
> > > > > > > > unnecessarily.
> > > > > >
> > > > > > > What are the benefits of not dropping it? When sockmap takes
> > > > > > > over sk->sk_data_ready(), it should have total control over the skbs
> > > > > > > in the receive queue. Otherwise user-space recvmsg() would race
> > > > > > > with sockmap when they try to read the first skb at the same time,
> > > > > > > so user-space could potentially get duplicated data (one copy via
> > > > > > > recvmsg(), one via sockmap). I don't see any benefits here, only races.
> > > > >
> > > > > > The benefit of _not_ dropping it is that the packet gets to the receiver
> > > > > > side. We've spent a bit of effort to get a packet across the network
> > > > > > and received on the stack; dropping it at the last point is not
> > > > > > so friendly.
> > > >
> > > > Well, at least udp_recvmsg() could drop packets too in various
> > > > scenarios, for example, a copy error. So, I do not think sockmap
> > > > is special.
> > >
> > > > OK, I am at least convinced now that dropping packets is OK and likely
> > > > a useful performance/complexity compromise.
> > > >
> > > > But at this point we won't have any visibility into these drops, correct?
> > > > It looks like the pattern in the UDP stack to handle this is to increment
> > > > sk_drops and UDP_MIB_INERRORS. How about we do that here as well?
> >
> > > We are not dropping the packet; the packet is cloned and delivered to
> > > user-space via sk_psock_verdict_recv(). Thus, we are simply leaking
> > > the original skb, regardless of any error. Maybe udp_read_sock()
> > > should check desc->error, but that has nothing to do with this patch, which
> > > only aims to address a memory leak. A separate patch is needed to check
> > > desc->error, if really needed.
> >
> > Thanks.
>
> "We are not dropping the packet": you'll need to be more explicit about
> how this path is not dropping the skb.

You know it is cloned, don't you? So if we clone an skb, deliver
the clone, and then free the original, what is dropped here? Absolutely
nothing.

By "drop", we clearly mean nothing is delivered. Or do you have
a different definition of "drop"?

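To make the clone-then-free argument concrete, here is a minimal user-space
sketch. It is a toy model, not kernel code: struct toy_skb, struct toy_data,
and all toy_* helpers are hypothetical names invented for illustration, and
they only assume that a clone shares the payload with the original through a
reference count. Under that assumption, freeing the original head leaves the
payload reachable through the clone, so nothing is lost.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only; this is NOT the kernel's struct sk_buff. */
struct toy_data {
    int refcnt;          /* number of heads sharing this payload */
    char bytes[64];      /* NUL-terminated payload */
};

struct toy_skb {
    struct toy_data *data;
};

static int toy_live_payloads; /* payloads allocated and not yet released */

static struct toy_skb *toy_alloc(const char *msg)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    struct toy_data *d = malloc(sizeof(*d));

    if (!skb || !d) {
        free(skb);
        free(d);
        return NULL;
    }
    d->refcnt = 1;
    strncpy(d->bytes, msg, sizeof(d->bytes) - 1);
    d->bytes[sizeof(d->bytes) - 1] = '\0';
    skb->data = d;
    toy_live_payloads++;
    return skb;
}

/* Clone: a new head pointing at the same payload, one more reference. */
static struct toy_skb *toy_clone(const struct toy_skb *skb)
{
    struct toy_skb *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->data = skb->data;
    c->data->refcnt++;
    return c;
}

/* Free one head; the payload is released only with the last reference. */
static void toy_free(struct toy_skb *skb)
{
    if (!skb)
        return;
    assert(skb->data->refcnt > 0);
    if (--skb->data->refcnt == 0) {
        free(skb->data);
        toy_live_payloads--;
    }
    free(skb);
}

/* Returns 1 if the payload is still intact via the clone after the
 * original head has been freed, and nothing is left allocated at the end. */
static int toy_clone_survives_free(void)
{
    struct toy_skb *orig = toy_alloc("datagram");
    struct toy_skb *clone = orig ? toy_clone(orig) : NULL;
    int ok;

    if (!orig || !clone) {
        toy_free(orig);
        return 0;
    }
    toy_free(orig); /* the "other" free: original head goes away */
    ok = clone->data->refcnt == 1 &&
         strcmp(clone->data->bytes, "datagram") == 0 &&
         toy_live_payloads == 1;
    toy_free(clone); /* the consumer of the clone releases it later */
    return ok && toy_live_payloads == 0;
}
```

In this model, skipping toy_free(orig) is what leaves the reference count
stuck above zero, which is the leak being discussed.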
>
> udp_read_sock()
>   skb = skb_recv_udp()
>     __skb_recv_udp()
>       __skb_try_recv_from_queue()
>         __skb_unlink() // skb is off the queue
>   used = recv_actor()
>     sk_psock_verdict_recv()
Why do you intentionally ignore the fact that the skb is cloned
and the clone is delivered?
>       return 0;
>   if (used <= 0) {
>     kfree(skb) // skb is unlink'd and kfree'd
Why do you ignore the other kfree_skb() I added in this patch?
It is clearly on the non-error path. This is why I said the
skb needs to be freed _regardless_ of whether there is an error. You just
keep ignoring it.
>     break;
>   return 0
>
> So if, in the error case, the skb is unlink'd from the queue and
> kfree'd, where does it still exist, and how do we get it back? It
> sure looks dropped to me. Yes, the kfree() is needed to not
> leak it, but I'm saying we don't want to drop packets silently.
See above: you clearly ignored the other kfree_skb(), which is
on the non-error path.
> The convention in UDP space looks to be to increment sk->sk_drops and
> the MIB. When we have to debug this on deployed systems and
> packets silently get dropped, it's going to cause lots of pain, so
> let's be sure we get the counters correct.
Sure, let me quote what I already said:

"A separate patch is needed to check desc->error, if really needed."

This patch, as its subject says, aims to address a memory leak, not
to address error counters. BTW, TCP does not increase error
counters either, which is yet another reason this deserves a separate
patch to address both.
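
For reference, the two concerns can be told apart in a small user-space
sketch. This is a toy model under stated assumptions, not the kernel code and
not the follow-up patch: struct toy_sock, toy_read_sock(), toy_len_actor(),
and the freed/drops fields are hypothetical names, and packets are modeled as
plain ints whose value is what the actor returns. The point is only that the
original is released on every path (the leak fix), while counting an error
as a drop is separate accounting that could be layered on later.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket state; both counters are hypothetical, for illustration. */
struct toy_sock {
    unsigned long freed; /* originals released by the read loop */
    unsigned long drops; /* stand-in for drop accounting */
};

typedef int (*toy_actor_t)(int pkt);

/* Toy actor: reports the packet value as "bytes used"; a value <= 0
 * stands in for an actor error. */
static int toy_len_actor(int pkt)
{
    return pkt;
}

/* Walk the queue, hand each packet to the actor, and stop on error.
 * Returns bytes consumed, or the error if nothing was consumed. */
static int toy_read_sock(struct toy_sock *sk, const int *queue, size_t n,
                         toy_actor_t actor)
{
    int copied = 0;
    size_t i;

    assert(sk && actor);
    for (i = 0; i < n; i++) {
        int used = actor(queue[i]);

        /* Leak fix: the original is released whether or not the
         * actor succeeded. */
        sk->freed++;

        if (used <= 0) {
            /* Separate concern: make the drop visible. */
            sk->drops++;
            if (!copied)
                copied = used;
            break;
        }
        copied += used;
    }
    return copied;
}
```

With a queue of {5, 7, -1, 9}, this loop consumes 12 bytes, releases three
originals, counts one drop, and never touches the last packet.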
Thanks.