lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ+HfNh3G0cHs2kb+3tBd+L=SEs3L-qiCxUjyG4M60+JLOKJqQ@mail.gmail.com>
Date:   Thu, 26 Jul 2018 14:13:20 +0200
From:   Björn Töpel <bjorn.topel@...il.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Björn Töpel <bjorn.topel@...el.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>, kafai@...com,
        Taehee Yoo <ap420073@...il.com>, ast@...nel.org,
        Netdev <netdev@...r.kernel.org>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>
Subject: Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()

Den mån 23 juli 2018 kl 21:58 skrev Jakub Kicinski
<jakub.kicinski@...ronome.com>:
>
> On Mon, 23 Jul 2018 11:39:36 +0200, Björn Töpel wrote:
> > Den fre 20 juli 2018 kl 22:08 skrev Jakub Kicinski:
> > > On Fri, 20 Jul 2018 10:18:21 -0700, Martin KaFai Lau wrote:
> > > > On Sat, Jul 21, 2018 at 01:04:45AM +0900, Taehee Yoo wrote:
> > > > > rhashtable_lookup() can return NULL. so that NULL pointer
> > > > > check routine should be added.
> > > > >
> > > > > Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY")
> > > > > Signed-off-by: Taehee Yoo <ap420073@...il.com>
> > > > > ---
> > > > >  net/core/xdp.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > > index 9d1f220..1c12bc7 100644
> > > > > --- a/net/core/xdp.c
> > > > > +++ b/net/core/xdp.c
> > > > > @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> > > > >             rcu_read_lock();
> > > > >             /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
> > > > >             xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> > > > > -           xa->zc_alloc->free(xa->zc_alloc, handle);
> > > > > +           if (xa)
> > > > > +                   xa->zc_alloc->free(xa->zc_alloc, handle);
> > > > hmm...It is not clear to me the "!xa" case don't have to be handled?
> > >
> > > Actually I have a more fundamental question about this interface I've
> > > been meaning to ask.
> > >
> > > IIUC free() can happen on any CPU at any time, when whatever device,
> > > socket or CPU this got redirected to completed the TX.  IOW there may
> > > be multiple producers.  Drivers would need to create spin lock a'la the
> > > a9744f7ca200 ("xsk: fix potential race in SKB TX completion code") fix?
> > >
> >
> > Jakub, apologies for the slow response. I'm still in
> > "holiday/hammock&beer mode", but will be back in a week. :-P
>
> Ah, sorry to interrupt! :)
>

Don't make it a habit! ;-P

> > The idea with the xdp_return_* functions are that an xdp_buff and
> > xdp_frame can have custom allocations schemes. The difference beween
> > struct xdp_buff and struct xdp_frame is lifetime. The xdp_buff
> > lifetime is within the napi context, whereas xdp_frame can have a
> > lifetime longer/outside the napi context. E.g. for a XDP_REDIRECT
> > scenario an xdp_buff is converted to a xdp_frame. The conversion is
> > done in include/net/xdp.h:convert_to_xdp_frame.
> >
> > Currently, the zero-copy MEM_TYPE_ZERO_COPY memtype can *only* be used
> > for xdp_buff, meaning that the lifetime is constrained to a napi
> > context. Further, given an xdp_buff with memtype MEM_TYPE_ZERO_COPY,
> > doing XDP_REDIRECT to a target that is *not* an AF_XDP socket would
> > mean converting the xdp_buff to an xdp_frame. The xdp_frame can then
> > be free'd on any CPU.
> >
> > Note that the xsk_rcv* functions is always called from an napi
> > context, and therefore is using the xdp_return_buff calls.
> >
> > To answer your question -- no, this fix is *not* needed, because the
> > xdp_buff napi constrained, and the xdp_buff will only be free'd on one
> > CPU.
>
> Oh, thanks, I missed the check in convert_to_xdp_frame(), so the only
> frames which can come back via the free path are out of the error path
> in __xsk_rcv_zc()?
>
> That path looks a little surprising too, isn't the expectation that if
> xdp_do_redirect() returns an error the driver retains the ownership of
> the buffer?
>
> static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> {
>         int err = xskq_produce_batch_desc(xs->rx, (u64)xdp->handle, len);
>
>         if (err) {
>                 xdp_return_buff(xdp);
>                 xs->rx_dropped++;
>         }
>
>         return err;
> }
>
> This seems to call xdp_return_buff() *and* return an error.
>

Ugh, this is indeed an error. The xdp_return buff should be removed.
Thanks for spotting this!

So, yes, the way to get the buffer back (in ZC) to the driver is via
the error path (recycling) or via the UMEM fill queue.

> > > We need some form of internal kernel circulation which would be MPSC.
> > > I'm currently hacking up the XSK code to tell me whether the frame was
> > > consumed by the correct XSK, and always clone the frame otherwise
> > > (claiming to be the "traditional" MEM_TYPE_PAGE_ORDER0).
> > >
> > > I feel like I'm missing something about the code.  Is redirect of
> > > ZC/UMEM frame outside the xsk not possible and the only returns we will
> > > see are from net/xdp/xsk.c?  That would work, but I don't see such a
> > > check.  Help would be appreciated.
> > >
> >
> > Right now, this is the case (refer to the TODO in
> > convert_to_xdp_frame), i.e. you cannot redirect an ZC/UMEM allocated
> > xdp_buff to a target that is not an xsk. This must, obviously, change
> > so that an xdp_buff (of MEM_TYPE_ZERO_COPY) can be converted to an
> > xdp_frame. The xdp_frame must be able to be free'd from multiple CPUs,
> > so here the a more sophisticated allocation scheme is required.
> >
> > > Also the fact that XSK bufs can't be freed, only completed, adds to the
> > > pain of implementing AF_XDP, we'd certainly need some form of "give
> > > back the frame, but I may need it later" SPSC mechanism, otherwise
> > > driver writers will have tough time.  Unless, again, I'm missing
> > > something about the code :)
> > >
> >
> > Yup, moving the recycling scheme from driver to "generic" is a good
> > idea! I need to finish up those i40e zerocopy patches first though...
>
> Interesting, FWIW I wasn't necessarily thinking about full recycling,
> although that would be the holy grail.  Just a generic way of giving up
> buffers for example when user changes ring sizes or brings the device
> down.
>
> > (...and I'm very excited that you're doing nfp support for AF_XDP!!!)
>
> Thanks, I'm still way out in the weeds but it's interesting work :)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ