netdev - Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKgT0Uetca8i4o89YHukoKTW_TTdrW8n-gKON7sXUi=U7bqrSw@mail.gmail.com>
Date:   Thu, 22 Aug 2019 10:10:01 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Ilya Maximets <i.maximets@...sung.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, bpf@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Björn Töpel <bjorn.topel@...el.com>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        Eelco Chaudron <echaudro@...hat.com>,
        William Tu <u9012063@...il.com>
Subject: Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp

On Thu, Aug 22, 2019 at 9:58 AM Ilya Maximets <i.maximets@...sung.com> wrote:
>
> On 22.08.2019 19:38, Alexander Duyck wrote:
> > On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@...sung.com> wrote:
> >>
> >> Tx code doesn't clear the descriptors' status after cleaning.
> >> So, if the budget is larger than number of used elems in a ring, some
> >> descriptors will be accounted twice and xsk_umem_complete_tx will move
> >> prod_tail far beyond the prod_head breaking the comletion queue ring.
> >>
> >> Fix that by limiting the number of descriptors to clean by the number
> >> of used descriptors in the tx ring.
> >>
> >> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> >> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
> >> complications implemented in the regular 'ixgbe_clean_tx_irq()'
> >> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
> >> indexes.
> >>
> >> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> >> Signed-off-by: Ilya Maximets <i.maximets@...sung.com>
> >> ---
> >>
> >> Version 2:
> >>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> >>     'ixgbe_xsk_clean_tx_ring()'.
> >>
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
> >>  1 file changed, 13 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> index 6b609553329f..d1297660e14a 100644
> >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> @@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
> >>  bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >>                             struct ixgbe_ring *tx_ring, int napi_budget)
> >>  {
> >> +       u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
> >>         unsigned int total_packets = 0, total_bytes = 0;
> >> -       u32 i = tx_ring->next_to_clean, xsk_frames = 0;
> >>         unsigned int budget = q_vector->tx.work_limit;
> >>         struct xdp_umem *umem = tx_ring->xsk_umem;
> >> -       union ixgbe_adv_tx_desc *tx_desc;
> >> -       struct ixgbe_tx_buffer *tx_bi;
> >> +       u32 xsk_frames = 0;
> >>         bool xmit_done;
> >>
> >> -       tx_bi = &tx_ring->tx_buffer_info[i];
> >> -       tx_desc = IXGBE_TX_DESC(tx_ring, i);
> >> -       i -= tx_ring->count;
> >> +       while (likely(ntc != ntu && budget)) {
> >
> > I would say you can get rid of budget entirely. It was only really
> > needed for the regular Tx case where you can have multiple CPUs
> > feeding a single Tx queue and causing a stall. Since we have a 1:1
> > mapping we should never have more than the Rx budget worth of packets
> > to really process. In addition we can only make one pass through the
> > ring since the ntu value is not updated while running the loop.
>
> OK. Will remove.
>
> >
> >> +               union ixgbe_adv_tx_desc *tx_desc;
> >> +               struct ixgbe_tx_buffer *tx_bi;
> >> +
> >> +               tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
> >>
> >> -       do {
> >>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> >>                         break;
> >>
> >> +               tx_bi = &tx_ring->tx_buffer_info[ntc];
> >
> > Please don't move this logic into the loop. We were intentionally
> > processing this outside of the loop once and then just doing the
> > increments because it is faster that way. It takes several operations
> > to compute tx_bi based on ntc, whereas just incrementing is a single
> > operation.
>
> OK.
>
> >
> >>                 total_bytes += tx_bi->bytecount;
> >>                 total_packets += tx_bi->gso_segs;
> >>
> >> @@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >>
> >>                 tx_bi->xdpf = NULL;
> >>
> >> -               tx_bi++;
> >> -               tx_desc++;
> >> -               i++;
> >> -               if (unlikely(!i)) {
> >> -                       i -= tx_ring->count;
> >
> > So these two lines can probably just be replaced by:
> > if (unlikely(ntc == tx_ring->count)) {
> >         ntc = 0;
>
> Sure.
>
> >
> >> -                       tx_bi = tx_ring->tx_buffer_info;
> >> -                       tx_desc = IXGBE_TX_DESC(tx_ring, 0);
> >> -               }
> >> -
> >> -               /* issue prefetch for next Tx descriptor */
> >> -               prefetch(tx_desc);
> >
> > Did you just drop the prefetch?
>
> I'll keep the prefetch in v3 because, as you fairly mentioned, it's not
> related to this patch. However, I'm not sure if this prefetch makes any
> sense here, because there is only one comparison operation between the
> prefetch and the data usage:
>
>  while (ntc != ntu) {
>      if (!(tx_desc->wb.status ...
>      <...>
>      prefetch(tx_desc);
>  }

I'm not opposed to dropping the prefetch, but if you are going to do
it you should do it in a separate patch.