[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpWzoP9SOQcMPB--jp6C_xnUVAmVtS4yMCN43kL248y4QA@mail.gmail.com>
Date: Thu, 25 Mar 2021 12:27:17 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...com>, bpf <bpf@...r.kernel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [bpf PATCH 2/2] bpf, sockmap: fix incorrect fwd_alloc accounting
On Wed, Mar 24, 2021 at 7:46 PM John Fastabend <john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Wed, Mar 24, 2021 at 2:00 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Incorrect accounting fwd_alloc can result in a warning when the socket
> > > is torn down,
> > >
>
> [...]
>
> > > To resolve lets only account for sockets on the ingress queue that are
> > > still associated with the current socket. On the redirect case we will
> > > check memory limits per 6fa9201a89898, but will omit fwd_alloc accounting
> > > until skb is actually enqueued. When the skb is sent via skb_send_sock_locked
> > > or received with sk_psock_skb_ingress memory will be claimed on psock_other.
> ^^^^^^^^^^^^^^^^^^^^
> >
> > You mean sk_psock_skb_ingress(), right?
>
> Yes.
skb_send_sock_locked() actually allocates its own skb when sending, hence
it uses a different skb for memory accounting.
>
> [...]
>
> > > @@ -880,12 +876,13 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
> > > kfree_skb(skb);
> > > goto out;
> > > }
> > > - skb_set_owner_r(skb, sk);
> > > prog = READ_ONCE(psock->progs.skb_verdict);
> > > if (likely(prog)) {
> > > + skb->sk = psock->sk;
> >
> > Why is skb_orphan() not needed here?
>
> These come from strparser which do not have skb->sk set.
Hmm, but sk_psock_verdict_recv() passes a clone too, like
strparser, so either we need it for both, or not at all. Clones
do not have skb->sk, so I think you can remove the one in
sk_psock_verdict_recv() too.
>
> >
> > Nit: You can just use 'sk' here, so "skb->sk = sk".
>
> Sure that is a bit nicer, will respin with this.
>
> >
> >
> > > tcp_skb_bpf_redirect_clear(skb);
> > > ret = sk_psock_bpf_run(psock, prog, skb);
> > > ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
> > > + skb->sk = NULL;
> >
> > Why do you want to set it to NULL here?
>
> So we don't cause the stack to throw other errors later if we
> were to call skb_orphan for example. Various places in the skb
> helpers expect both skb->sk and skb->destructor to be set together
> and here we are just using it as a mechanism to feed the sk into
> the BPF program side. The above skb_set_owner_r for example
> would likely BUG().
Sounds reasonable.
Thanks.
Powered by blists - more mailing lists