netdev - Re: [bpf PATCH 2/2] bpf, sockmap: fix incorrect fwd

Message-ID: <CAM_iQpWzoP9SOQcMPB--jp6C_xnUVAmVtS4yMCN43kL248y4QA@mail.gmail.com>
Date:   Thu, 25 Mar 2021 12:27:17 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...com>, bpf <bpf@...r.kernel.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [bpf PATCH 2/2] bpf, sockmap: fix incorrect fwd_alloc accounting

On Wed, Mar 24, 2021 at 7:46 PM John Fastabend <john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Wed, Mar 24, 2021 at 2:00 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Incorrect accounting of fwd_alloc can result in a warning when the socket
> > > is torn down,
> > >
>
> [...]
>
> > > To resolve, let's only account for sockets on the ingress queue that are
> > > still associated with the current socket. On the redirect case we will
> > > check memory limits per 6fa9201a89898, but will omit fwd_alloc accounting
> > > until skb is actually enqueued. When the skb is sent via skb_send_sock_locked
> > > or received with sk_psock_skb_ingress memory will be claimed on psock_other.
>                      ^^^^^^^^^^^^^^^^^^^^
> >
> > You mean sk_psock_skb_ingress(), right?
>
> Yes.

skb_send_sock_locked() actually allocates its own skb when sending, hence
it uses a different skb for memory accounting.
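
The accounting concern in this thread can be sketched with a small userspace model. This is only an illustration, not kernel code: `struct model_sock`, `model_charge()`, `model_uncharge()` and `model_teardown_clean()` are hypothetical names, and the model ignores the page-granular reservation the real forward-alloc logic performs. It assumes that charging an skb lowers the socket's forward allocation by the skb's truesize and that freeing raises it again, so the counter returns to zero at teardown only if each charge is paired with an uncharge on the same socket.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters discussed in the thread. */
struct model_sock {
	int forward_alloc;	/* models sk->sk_forward_alloc */
	int rmem_alloc;		/* models sk->sk_rmem_alloc */
};

/* Models taking receive ownership of an skb: charge its truesize to sk. */
static void model_charge(struct model_sock *sk, int truesize)
{
	sk->rmem_alloc += truesize;
	sk->forward_alloc -= truesize;
}

/* Models the receive destructor: give the truesize back to sk. */
static void model_uncharge(struct model_sock *sk, int truesize)
{
	sk->rmem_alloc -= truesize;
	sk->forward_alloc += truesize;
}

/* Teardown is clean only when every charge was undone on the same socket. */
static bool model_teardown_clean(const struct model_sock *sk)
{
	return sk->forward_alloc == 0 && sk->rmem_alloc == 0;
}
```

In this model, charging one socket and uncharging a different one leaves both counters nonzero, which is the kind of imbalance that would show up as a warning when a socket is destroyed.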

>
> [...]
>
> > > @@ -880,12 +876,13 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
> > >                 kfree_skb(skb);
> > >                 goto out;
> > >         }
> > > -       skb_set_owner_r(skb, sk);
> > >         prog = READ_ONCE(psock->progs.skb_verdict);
> > >         if (likely(prog)) {
> > > +               skb->sk = psock->sk;
> >
> > Why is skb_orphan() not needed here?
>
> These come from strparser which do not have skb->sk set.

Hmm, but sk_psock_verdict_recv() passes a clone too, like
strparser, so either we need it for both, or not at all. Clones
do not have skb->sk, so I think you can remove the one in
sk_psock_verdict_recv() too.


>
> >
> > Nit: You can just use 'sk' here, so "skb->sk = sk".
>
> Sure that is a bit nicer, will respin with this.
>
> >
> >
> > >                 tcp_skb_bpf_redirect_clear(skb);
> > >                 ret = sk_psock_bpf_run(psock, prog, skb);
> > >                 ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
> > > +               skb->sk = NULL;
> >
> > Why do you want to set it to NULL here?
>
> So we don't cause the stack to throw other errors later if we
> were to call skb_orphan for example. Various places in the skb
> helpers expect both skb->sk and skb->destructor to be set together
> and here we are just using it as a mechanism to feed the sk into
> the BPF program side. The above skb_set_owner_r for example
> would likely BUG().

Sounds reasonable.

Thanks.
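
For reference, the reason for clearing skb->sk after the program runs can be modelled in userspace. This is a sketch under stated assumptions, not the kernel implementation: the `model_*` functions and the reduced structs are hypothetical, the verdict program is replaced by a stub that just reads skb->sk, and the kernel's BUG_ON() is modelled as a false return value. It assumes skb_orphan() treats an skb with skb->sk set but no destructor as a bug, which is the invariant John describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures; only the fields discussed. */
struct sock {
	int id;
};

struct sk_buff {
	struct sock *sk;
	void (*destructor)(struct sk_buff *skb);
};

/*
 * Model of the orphan invariant: skb->sk and skb->destructor are expected
 * to be set together. Returns false where the kernel would BUG_ON().
 */
static bool model_skb_orphan(struct sk_buff *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;
		return true;
	}
	return skb->sk == NULL;
}

/* Hypothetical stub for the verdict program: it only needs to read skb->sk. */
static int model_run_verdict(const struct sk_buff *skb)
{
	return skb->sk ? skb->sk->id : -1;
}

/* Lend sk to the skb for the program run only, then clear it again. */
static int model_strp_read(struct sk_buff *skb, struct sock *sk)
{
	int ret;

	skb->sk = sk;		/* no destructor set: a borrowed pointer */
	ret = model_run_verdict(skb);
	skb->sk = NULL;		/* restore the invariant before returning */
	return ret;
}
```

With the pointer cleared, a later orphan call on the skb passes the check; leaving skb->sk set without a destructor would fail it.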