linux-kernel - Re: [PATCH net v1 1/2] lan743x: improve performance: fix rx_napi

Message-ID: <CANn89iKxjQawkMBCg5Mt=eMgqvD_cpYSs4664GoGZFrMTgWJFw@mail.gmail.com>
Date:   Wed, 9 Dec 2020 00:50:10 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Sven Van Asbroeck <thesven73@...il.com>,
        Bryan Whitehead <bryan.whitehead@...rochip.com>,
        Microchip Linux Driver Support <UNGLinuxDriver@...rochip.com>,
        David S Miller <davem@...emloft.net>,
        Andrew Lunn <andrew@...n.ch>, netdev <netdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v1 1/2] lan743x: improve performance: fix
 rx_napi_poll/interrupt ping-pong

On Wed, Dec 9, 2020 at 12:29 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Tue, 8 Dec 2020 17:23:08 -0500 Sven Van Asbroeck wrote:
> > On Tue, Dec 8, 2020 at 2:50 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > >
> > > >
> > > > +done:
> > > >       /* update RX_TAIL */
> > > >       lan743x_csr_write(adapter, RX_TAIL(rx->channel_number),
> > > >                         rx_tail_flags | rx->last_tail);
> > > > -done:
> > > > +
> > >
> > > I assume this rings the doorbell to let the device know that more
> > > buffers are available? If so, it's a little unusual to do this at the
> > > end of the NAPI poll. The more usual place would be to do this once
> > > every n buffer allocations (in lan743x_rx_init_ring_element()?).
> > > For example, ring the doorbell every time a buffer is put at an index
> > > divisible by 16.
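A minimal user-space sketch of the batching Jakub describes, assuming a 64-entry ring and a batch of 16. `struct sim_ring`, `refill_one()` and `ring_doorbell()` are hypothetical names; the doorbell only stands in for a tail write such as the RX_TAIL `lan743x_csr_write()` in the patch. This is not lan743x code.

```c
#define RING_SIZE 64      /* assumption: 64 descriptors, as in the thread */
#define DOORBELL_BATCH 16 /* Jakub's example: ring on indices divisible by 16 */

/* Hypothetical user-space model of an Rx ring (not lan743x code). */
struct sim_ring {
	unsigned int next_fill; /* index that receives the next fresh buffer */
	int published_tail;     /* last index announced to the "device", -1 = none */
	unsigned int doorbells; /* number of simulated tail (CSR) writes */
};

/* Stand-in for an MMIO tail write such as the RX_TAIL update in the patch. */
static void ring_doorbell(struct sim_ring *r, unsigned int idx)
{
	r->published_tail = (int)idx;
	r->doorbells++;
}

/* Put one fresh buffer on the ring; only batch boundaries cost a doorbell. */
static void refill_one(struct sim_ring *r)
{
	unsigned int idx = r->next_fill;

	/* ... allocate and map a buffer for descriptor idx here ... */
	r->next_fill = (idx + 1) % RING_SIZE;
	if (idx % DOORBELL_BATCH == 0)
		ring_doorbell(r, idx);
}
```

In this model, refilling all 64 slots starting from index 0 costs 4 tail writes instead of 64, but the last 15 buffers (indices 49 to 63) stay invisible to the device until the next batch boundary is reached.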
> >
> > Yes, I believe it tells the device that new buffers have become available.
> >
> > I wonder why it's so unusual to do this at the end of a napi poll? Omitting
> > this could result in sub-optimal use of buffers, right?
> >
> > Example:
> > - tail is at position 0
> > - core calls napi_poll(weight=64)
> > - napi poll consumes 15 buffers and pushes 15 skbs, then ring empty
> > - index not divisible by 16, so tail is not updated
> > - weight not reached, so napi poll re-enables interrupts and bails out
> >
> > Result: now there are 15 buffers which the device could potentially use, but
> > because the tail wasn't updated, it doesn't know about them.
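Sven's scenario can be checked with a small counting helper. This is a hypothetical sketch, not driver code. It assumes that a doorbell for index i publishes every buffer up to and including i, and that the 15 replacement buffers land at indices 1 to 15. `flush_on_exit` models an unconditional tail write when the poll ends, roughly what moving the `done:` label above the RX_TAIL write in the patch ensures.

```c
#include <stdbool.h>

/* Count freshly refilled buffers the device cannot see yet, assuming a
 * doorbell for index i publishes every buffer up to and including i. */
static unsigned int unpublished(unsigned int start, unsigned int n,
				unsigned int batch, unsigned int ring_size,
				bool flush_on_exit)
{
	unsigned int i, stranded = 0;

	for (i = 0; i < n; i++) {
		unsigned int idx = (start + i) % ring_size;

		if (idx % batch == 0)
			stranded = 0; /* doorbell: everything so far is visible */
		else
			stranded++;
	}
	/* An unconditional tail write when the poll ends publishes the rest. */
	return flush_on_exit ? 0 : stranded;
}
```

`unpublished(1, 15, 16, 64, false)` returns 15, matching Sven's example. With a batch of 8 the same refill strands 7 buffers, and with `flush_on_exit` set it strands none.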
>
> Perhaps 16 for a device with 64 descriptors is rather high indeed.
> Let's say 8. If the device misses 8 packet buffers on the ring,
> that should be negligible.
>

mlx4 uses 8 as the threshold (mlx4_en_refill_rx_buffers())
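The threshold idea can be sketched as "refill only once enough slots are missing, then publish the whole batch with one doorbell". The model below is loosely based on that idea and is not a copy of mlx4_en_refill_rx_buffers(). `struct batch_ring` and `refill_if_worthwhile()` are hypothetical, and `prod`/`cons` are assumed to be free-running counters so that unsigned wrap-around stays correct.

```c
#include <stdint.h>

#define REFILL_THRESHOLD 8 /* the value cited for mlx4 */

/* Hypothetical ring model: prod/cons are free-running counters. */
struct batch_ring {
	uint32_t size;      /* number of descriptors in the ring */
	uint32_t prod;      /* buffers handed to the device so far */
	uint32_t cons;      /* buffers the driver has consumed so far */
	uint32_t doorbells; /* simulated producer-index (tail) writes */
};

/* Refill all missing slots if at least REFILL_THRESHOLD are missing.
 * Returns the number of buffers refilled (0 when below the threshold). */
static uint32_t refill_if_worthwhile(struct batch_ring *r)
{
	uint32_t missing = r->size - (r->prod - r->cons);
	uint32_t filled = missing;

	if (missing < REFILL_THRESHOLD)
		return 0; /* not worth a doorbell write yet */
	while (missing--) {
		/* ... allocate a buffer for slot r->prod % r->size ... */
		r->prod++;
	}
	r->doorbells++; /* one tail write covers the whole batch */
	return filled;
}
```

With 5 slots missing nothing happens; once 10 are missing, all 10 are refilled and published with a single doorbell.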

> It depends on the cost of the CSR write. Packet processing usually puts
> a lot of pressure on the memory subsystem of the CPU, hence amortizing
> the write over multiple descriptors helps. The other thing is that you
> could delay the descriptor writes to write full cache lines, but I don't
> think that will help on IMX6.
>
> > It does make sense to update the tail more frequently than only at the end
> > of the napi poll, though?
> >
> > I'm new to napi polling, so I'm quite interested to learn about this.
>
> There is a tracepoint that records how many packets NAPI has polled,
> napi:napi_poll; with it you can easily see what your system is doing.
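To look at poll sizes in practice, one can enable the napi:napi_poll tracepoint (for example by writing 1 to events/napi/napi_poll/enable under tracefs, usually mounted at /sys/kernel/tracing) and read trace_pipe. The hypothetical helper below extracts the work and budget fields from one trace line. It assumes the event text ends in "work %d budget %d"; check the event's format file on your kernel before relying on it.

```c
#include <stdio.h>
#include <string.h>

/* Extract work/budget from one napi:napi_poll line, assuming the event text
 * ends in "... for device %s work %d budget %d". Returns 0 on success. */
static int parse_napi_poll(const char *line, int *work, int *budget)
{
	const char *p = strstr(line, " work ");

	if (!p || sscanf(p, " work %d budget %d", work, budget) != 2)
		return -1;
	return 0;
}

/* A poll that used its whole budget gets polled again; one that returned
 * less is expected to have completed (and typically re-enabled its irq). */
static int poll_exhausted_budget(int work, int budget)
{
	return work >= budget;
}
```

On a busy interface, many samples with work well below budget would be consistent with the poll/interrupt ping-pong discussed in this thread.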
>
> What you want to avoid is the situation where the device has already used
> up all the descriptors by the time the driver finishes Rx processing.
> That would result in drops. So the driver should push the buffers back to
> the device reasonably early.
>
> With a ring of 64 descriptors and a NAPI budget of 64, it's not unlikely
> that the ring will be completely used when processing runs.
>
> > > > +     /* up to half of elements in a full rx ring are
> > > > +      * extension frames. these do not generate skbs.
> > > > +      * to prevent napi/interrupt ping-pong, limit default
> > > > +      * weight to the smallest no. of skbs that can be
> > > > +      * generated by a full rx ring.
> > > > +      */
> > > >       netif_napi_add(adapter->netdev,
> > > >                      &rx->napi, lan743x_rx_napi_poll,
> > > > -                    rx->ring_size - 1);
> > > > +                    (rx->ring_size - 1) / 2);
> > >
> > > This is rather unusual; drivers should generally pass NAPI_POLL_WEIGHT
> > > here.
> >
> > I agree. The problem is that a full ring buffer of 64 buffers will only
> > contain 32 buffers with network data - the others are timestamps.
> >
> > So napi_poll(weight=64) can never reach its full weight. Even with a full
> > buffer, it always assumes that it has to stop polling and re-enable
> > interrupts, which results in a ping-pong.
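The ping-pong can be reproduced with a toy poll loop. The following assumptions are not lan743x code: descriptors strictly alternate between a data descriptor that yields an skb and an extension (timestamp) descriptor that does not, and the hypothetical `sim_poll()` stops when the skb count reaches the weight or the filled part of the ring is exhausted.

```c
#include <stdbool.h>

/* Hypothetical descriptor kinds: only DESC_DATA produces an skb. */
enum desc_kind { DESC_DATA, DESC_EXT };

struct poll_result {
	int skbs;        /* what the driver counts against the weight */
	int descriptors; /* buffers actually consumed */
	bool completed;  /* true: driver completes NAPI and re-enables irqs */
};

static struct poll_result sim_poll(const enum desc_kind *ring, int filled,
				   int weight)
{
	struct poll_result res = { 0, 0, false };

	while (res.skbs < weight && res.descriptors < filled) {
		if (ring[res.descriptors] == DESC_DATA)
			res.skbs++;
		res.descriptors++;
	}
	/* Usual NAPI rule: under budget means "done", so re-arm the irq. */
	res.completed = res.skbs < weight;
	return res;
}
```

With weight 64, a full 64-descriptor ring yields only 32 skbs, so the poll reports itself complete and interrupts are re-armed. With the patch's weight of (64 - 1) / 2 = 31, it stops at the 31st skb after 61 descriptors and stays in polling mode.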
>
> Interesting, I don't think we ever had this problem before. Let me CC
> Eric to see if he has any thoughts on the case. AFAIU you should think
> of the weight as a way of arbitrating between devices (if there is more
> than one).

The driver could be called with an arbitrary budget (of 64) and, if its
ring buffer has been depleted, return @budget instead of the skb count
and not rearm the interrupt:

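/* count: skbs pushed in this poll; rx_ring_fully_processed: the whole
 * ring was consumed (depleted), so more work is likely pending */
if (count < budget && !rx_ring_fully_processed) {
    if (napi_complete_done(napi, count))
        rearm_irqs();
    return count;
}
/* ring depleted: claim the full budget so NAPI polls again */
return budget;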
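Eric's suggestion comes down to what the poll routine reports back to NAPI. Below is a minimal model of just that decision. It assumes that "ring fully processed" means the poll consumed at least `ring_capacity` descriptors and that `napi_complete_done()` succeeds. `poll_return_value()` is a hypothetical helper, not a kernel API.

```c
#include <stdbool.h>

/* Hypothetical helper modelling the return-value rule Eric proposes.
 * skbs: skbs pushed this poll; descs: descriptors consumed this poll. */
static int poll_return_value(int skbs, int descs, int ring_capacity,
			     int budget, bool *rearm_irq)
{
	/* "Ring depleted": the poll consumed a whole ring's worth. */
	bool ring_fully_processed = descs >= ring_capacity;

	*rearm_irq = false;
	if (skbs < budget && !ring_fully_processed) {
		/* Idle enough to stop; assume napi_complete_done() returns true. */
		*rearm_irq = true;
		return skbs;
	}
	/* Report the full budget so NAPI keeps polling with irqs still off. */
	return budget;
}
```

For the full-ring case from the thread (32 skbs, 64 descriptors, budget 64), it returns 64 and leaves interrupts off, so NAPI calls the poll routine again. A lightly loaded poll (5 skbs from 10 descriptors) returns 5 and re-arms.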


>
> NAPI does not do any deferral (in wall clock time terms) of processing,
> so the only difference you may get for lower weight is that softirq
> kthread will get a chance to kick in earlier.
>
> > Would it be better to fix the weight counting? Increase the count
> > for every buffer consumed, instead of for every skb pushed?
>
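Sven's alternative, charging every consumed buffer against the weight rather than every skb, could look like the loop below. It is a sketch under the same alternating data/extension assumption, and `poll_count_descriptors()` is hypothetical. Because it returns descriptors consumed and stops at the weight, its return value never exceeds the weight, which a NAPI poll routine must respect.

```c
/* Hypothetical descriptor kinds: only DESC_DATA produces an skb. */
enum desc_kind { DESC_DATA, DESC_EXT };

/* Variant poll loop: every consumed descriptor counts toward the weight.
 * Returns the work done; *skbs_out receives the number of skbs delivered. */
static int poll_count_descriptors(const enum desc_kind *ring, int filled,
				  int weight, int *skbs_out)
{
	int work = 0, skbs = 0;

	while (work < weight && work < filled) {
		if (ring[work] == DESC_DATA)
			skbs++; /* would be passed up the stack here */
		work++;
	}
	*skbs_out = skbs;
	return work; /* never exceeds weight */
}
```

A full ring of 64 descriptors now returns 64 (the weight) while still delivering 32 skbs, so the poll is no longer mistaken for an idle one.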