netdev - Re: [PATCH v6] vmxnet3: Add XDP support.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALDO+SYz2LYk8_tvwoY75OVT-7B3_HgjF8PTnUN9hdDL+pucwQ@mail.gmail.com>
Date:   Sat, 17 Dec 2022 07:42:06 -0800
From:   William Tu <u9012063@...il.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     netdev@...r.kernel.org, tuc@...are.com, gyang@...are.com,
        doshir@...are.com
Subject: Re: [PATCH v6] vmxnet3: Add XDP support.

On Fri, Dec 16, 2022 at 4:37 PM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> On Fri, Dec 16, 2022 at 2:53 PM William Tu <u9012063@...il.com> wrote:
> >
Hi Alexander,
Thanks for taking a look at this patch!

> > > > Single core performance comparison with skb-mode.
> > > > 64B:      skb-mode -> native-mode (with this patch)
> > > > XDP_DROP: 932Kpps -> 2.0Mpps
> > > > XDP_PASS: 284Kpps -> 314Kpps
> > > > XDP_TX:   591Kpps -> 1.8Mpps
> > > > REDIRECT: 453Kpps -> 501Kpps
> > > >
> > > > 512B:      skb-mode -> native-mode (with this patch)
> > > > XDP_DROP: 890Kpps -> 1.3Mpps
> > > > XDP_PASS: 284Kpps -> 314Kpps
> > > > XDP_TX:   555Kpps -> 1.2Mpps
> > > > REDIRECT: 670Kpps -> 430Kpps
> > > >
> > >
> > > I hadn't noticed it before. Based on this it looks like native mode is
> > > performing worse then skb-mode for redirect w/ 512B packets? Have you
> > > looked into why that might be?
> >
> > yes, I noticed it but don't know why, maybe it's due to extra copy and page
> > allocation like you said below. I will dig deeper.

I've fixed the issue like you mentioned, and now the redirect shows
better performance.
I will update in next version.

> >
> > >
> > > My main concern would be that you are optimizing for recyling in the Tx
> > > and Redirect paths, when you might be better off just releasing the
> > > buffers and batch allocating new pages in your Rx path.
> >
> > right, are you talking about using the page pool allocator, ex: slide 8 below
> > https://legacy.netdevconf.info/0x14/pub/slides/10/add-xdp-on-driver.pdf
> > I tried it before but then I found I have to replace lots of existing vmxnet3
> > code, basically replacing all the rx/tx buffer allocation code with new
> > page pool api, even without XDP.  I'd love to give it a try, do you think it's
> > worth doing it?
>
> It might be. It is hard for me to say without knowing more about the
> driver itself. However if i were doing a driver from scratch that
> supported XDP I would probably go that route. Having to refactor an
> existing driver is admittedly going to be more work.
>
I see, thanks.
>
> >
> > >
> > > > +     __netif_tx_unlock(nq);
> > > > +     return err;
> > > > +}
> > > > +
> > > > +int
> > > > +vmxnet3_xdp_xmit(struct net_device *dev,
> > > > +              int n, struct xdp_frame **frames, u32 flags)
> > > > +{
> > > > +     struct vmxnet3_adapter *adapter;
> > > > +     struct vmxnet3_tx_queue *tq;
> > > > +     struct netdev_queue *nq;
> > > > +     int i, err, cpu;
> > > > +     int nxmit = 0;
> > > > +     int tq_number;
> > > > +
> > > > +     adapter = netdev_priv(dev);
> > > > +
> > > > +     if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
> > > > +             return -ENETDOWN;
> > > > +     if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state)))
> > > > +             return -EINVAL;
> > > > +
> > > > +     tq_number = adapter->num_tx_queues;
> > > > +     cpu = smp_processor_id();
> > > > +     tq = &adapter->tx_queue[cpu % tq_number];
> > > > +     if (tq->stopped)
> > > > +             return -ENETDOWN;
> > > > +
> > > > +     nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
> > > > +
> > > > +     __netif_tx_lock(nq, cpu);
> > > > +     for (i = 0; i < n; i++) {
> > > > +             err = vmxnet3_xdp_xmit_frame(adapter, frames[i], tq);
> > > > +             if (err) {
> > > > +                     tq->stats.xdp_xmit_err++;
> > > > +                     break;
> > > > +             }
> > > > +             nxmit++;
> > > > +     }
> > > > +
> > > > +     tq->stats.xdp_xmit += nxmit;
> > > > +     __netif_tx_unlock(nq);
> > > > +
> > >
> > > Are you doing anything to free the frames after you transmit them? If I
> > > am not mistaken you are just copying them over into skbs aren't you, so
> > > what is freeing the frames after that?
> >
> > The frames will be free at vmxnet3_tq_cleanup() at dev_kfree_skb_any(tbi->skb);
> > Because at the vmxnet3_xdp_xmit_frame the allocated skb is saved at tbi->skb,
> > so it can be freed at tq cleanup.
>
> The frames I am referring to are the xdp_frame, not the skb.
> Specifically your function is copying the data out. So in the redirect
> case I think you might be leaking pages. That is one of the reasons
> why I was thinking it might be better to just push the data all the
> way through.

Got it, you're right, it's leaking memory there.

>
> In the other email you sent me the call xdp_return_frame was used to
> free the frame. That is what would be expected in this function after
> you cleaned the data out and placed it in an skbuff in
> vmxnet3_xdp_xmit_frame.

OK! will do it.

>
> > >
> > > > +     return nxmit;
> > > > +}
> > > > +
> > > > +static int
> > > > +__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len,
> > > > +               int headroom, int frame_sz, bool *need_xdp_flush,
> > > > +               struct sk_buff *skb)
> > > > +{
> > > > +     struct xdp_frame *xdpf;
> > > > +     void *buf_hard_start;
> > > > +     struct xdp_buff xdp;
> > > > +     struct page *page;
> > > > +     void *orig_data;
> > > > +     int err, delta;
> > > > +     int delta_len;
> > > > +     u32 act;
> > > > +
> > > > +     buf_hard_start = data;
> > > > +     xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
> > > > +     xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, true);
> > > > +     orig_data = xdp.data;
> > > > +
> > > > +     act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);
> > > > +     rq->stats.xdp_packets++;
> > > > +
> > > > +     switch (act) {
> > > > +     case XDP_DROP:
> > > > +             rq->stats.xdp_drops++;
> > > > +             break;
> > > > +     case XDP_PASS:
> > > > +             /* bpf prog might change len and data position.
> > > > +              * dataring does not use skb so not support this.
> > > > +              */
> > > > +             delta = xdp.data - orig_data;
> > > > +             delta_len = (xdp.data_end - xdp.data) - data_len;
> > > > +             if (skb) {
> > > > +                     skb_reserve(skb, delta);
> > > > +                     skb_put(skb, delta_len);
> > > > +             }
> > > > +             break;
> > > > +     case XDP_TX:
> > > > +             xdpf = xdp_convert_buff_to_frame(&xdp);
> > > > +             if (!xdpf ||
> > > > +                 vmxnet3_xdp_xmit_back(rq->adapter, xdpf)) {
> > > > +                     rq->stats.xdp_drops++;
> > > > +             } else {
> > > > +                     rq->stats.xdp_tx++;
> > > > +             }
> > > > +             break;
> > > > +     case XDP_ABORTED:
> > > > +             trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog,
> > > > +                                 act);
> > > > +             rq->stats.xdp_aborted++;
> > > > +             break;
> > > > +     case XDP_REDIRECT:
> > > > +             page = alloc_page(GFP_ATOMIC);
> > > > +             if (!page) {
> > > > +                     rq->stats.rx_buf_alloc_failure++;
> > > > +                     return XDP_DROP;
> > > > +             }
> > >
> > > So I think I see the problem I had questions about here. If I am not
> > > mistaken you are copying the buffer to this page, and then copying this
> > > page to an skb right? I think you might be better off just writing off
> > > the Tx/Redirect pages and letting them go through their respective
> > > paths and just allocating new pages instead assuming these pages were
> > > consumed.
> >
> > I'm not sure I understand, can you elaborate?
> >
> > For XDP_TX, I'm doing 1 extra copy, copying to the newly allocated skb
> > in vmxnet3_xdp_xmit_back.
> > For XDP_REDIREC, I allocate a page and copy to the page and call
> > xdp_do_redirect, there is no copying to skb again. If I don't allocate a
> > new page, it always crashes at
> > [62020.425932] BUG: Bad page state in process cpumap/0/map:29  pfn:107548
> > [62020.440905] kernel BUG at include/linux/mm.h:757!
> >  VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>
> What I was referring to was one copy here, and then another copy in
> your vmxnet3_xdp_xmit_frame() function with the page you allocated
> here possibly being leaked.
>
> Though based on the other trace you provided it would look like you
> are redirecting to something else as your code currently doesn't
> reference xdp_return_frame which is what I was referring to earlier in
> terms of missing the logic to free the frame in your transmit path.
yes, I tried a couple ways and now it's working.
basically I have to call get_page to increment the refcnt then call
xdp_return_frame. I will update performance number and send
next version.

Thanks
William