Message-ID: <CAK8P3a3DiCfAj6Dk4tygzYpjccrEN60LZ8h_GP6JL2O_cCrivg@mail.gmail.com>
Date: Tue, 10 May 2022 15:18:30 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Rafał Miłecki <zajec5@...il.com>
Cc: Arnd Bergmann <arnd@...db.de>, Andrew Lunn <andrew@...n.ch>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Network Development <netdev@...r.kernel.org>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
Russell King <linux@...linux.org.uk>,
Felix Fietkau <nbd@....name>,
"openwrt-devel@...ts.openwrt.org" <openwrt-devel@...ts.openwrt.org>,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: Optimizing kernel compilation / alignments for network performance
On Tue, May 10, 2022 at 1:23 PM Rafał Miłecki <zajec5@...il.com> wrote:
> On 6.05.2022 10:45, Arnd Bergmann wrote:
> > - The higher-end networking SoCs are usually cache-coherent and
> > can avoid the cache management entirely. There is a slim chance
> > that this chip is designed that way and it just needs to be enabled
> > properly. Most low-end chips don't implement the coherent
> > interconnect though, and I suppose you have checked this already.
>
> To the best of my knowledge, the Northstar platform doesn't support hw coherency.
>
> I just took an extra look at Broadcom's SDK and they seem to have some
> driver for selected chipsets but BCM708 isn't there.
>
> config BCM_GLB_COHERENCY
> bool "Global Hardware Cache Coherency"
> default n
> depends on BCM963158 || BCM96846 || BCM96858 || BCM96856 || BCM963178 || BCM947622 || BCM963146 || BCM94912 || BCM96813 || BCM96756 || BCM96855
Ok
> > - bgmac_dma_rx_update_index() and bgmac_dma_tx_add() appear
> > to have an extraneous dma_wmb(), which should be implied by the
> > non-relaxed writel() in bgmac_write().
>
> I tried dropping wmb() calls.
> With wmb(): 421 Mb/s
> Without: 418 Mb/s
That's probably within the noise here. I suppose doing two wmb()
calls in a row is not that expensive because there is nothing left to
wait for. If the extra wmb() is measurably faster than no wmb(), there
is something else going wrong ;-)
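To make the "implied by the non-relaxed writel()" point concrete, here is a rough userspace model. It assumes, as on 32-bit ARM, that writel() amounts to a write barrier followed by writel_relaxed(). All model_* names, the descriptor layout and the fence counter are made up for illustration and are not driver code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter: how many write barriers the model has issued. */
static unsigned int fences_issued;

/* Stand-in for wmb()/dma_wmb(); a C11 release fence is only an analogy. */
static void model_wmb(void)
{
	atomic_thread_fence(memory_order_release);
	fences_issued++;
}

/* Relaxed MMIO store: no ordering against earlier memory writes. */
static void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Non-relaxed store: barrier first, then the register write. */
static void model_writel(uint32_t val, volatile uint32_t *reg)
{
	model_wmb();
	model_writel_relaxed(val, reg);
}

/* Hypothetical descriptor with four 32-bit words. */
struct model_desc {
	uint32_t ctl0, ctl1, addr_low, addr_high;
};

/*
 * Fill a descriptor, then ring the doorbell register.
 * Returns the number of barriers used for this one update.
 */
unsigned int model_publish(struct model_desc *d, volatile uint32_t *doorbell,
			   uint32_t index, int extra_barrier)
{
	unsigned int before = fences_issued;

	d->addr_low = 0x1000;
	d->ctl1 = 1536;
	if (extra_barrier)
		model_wmb();		/* the explicit barrier in question */
	model_writel(index, doorbell);	/* already orders the stores above */

	return fences_issued - before;
}
```

With extra_barrier set, the model issues two barriers back to back and the second one has no outstanding stores left to order, which fits the small difference measured above.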
> I also tried dropping bgmac_read() from bgmac_chip_intrs_off() which
> seems to be a flushing readback.
>
> With bgmac_read(): 421 Mb/s
> Without: 413 Mb/s
Interesting, so this is statistically significant, right? It could be that
this is changing the interrupt timing just enough that it ends up doing
more work at once some of the time.
> > - accesses to the DMA descriptor don't show up in the profile here,
> > but look like they can get misoptimized by the compiler. I would
> > generally use READ_ONCE() and WRITE_ONCE() for these to
> > ensure that you don't end up with extra or out-of-order accesses.
> > This also makes it clearer to the reader that something special
> > happens here.
>
> Should I use something as below?
>
> FWIW it doesn't seem to change NAT performance.
> Without WRITE_ONCE: 421 Mb/s
> With: 419 Mb/s
This one depends on the compiler. What I would expect here is that
it often makes no difference, but if the compiler does something
odd, then the WRITE_ONCE() would prevent this and make it behave
as before. I would suggest adding this part regardless.
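As a minimal sketch of what the READ_ONCE()/WRITE_ONCE() suggestion looks like for descriptor accesses, assuming a four-word descriptor: the macros below are simplified userspace stand-ins for the kernel ones, and the struct, the sketch_* helpers and the 0x1fff length mask are hypothetical. Endianness conversion is left out on purpose.

```c
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel macros. The real ones do more
 * type checking; these only force a single volatile access.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical descriptor layout, shared with the device through DMA. */
struct sketch_dma_desc {
	uint32_t ctl0;
	uint32_t ctl1;
	uint32_t addr_low;
	uint32_t addr_high;
};

/* Each word is stored exactly once, in the order written here. */
void sketch_desc_fill(struct sketch_dma_desc *desc, uint64_t dma_addr,
		      uint32_t ctl0, uint32_t ctl1)
{
	WRITE_ONCE(desc->addr_low, (uint32_t)dma_addr);
	WRITE_ONCE(desc->addr_high, (uint32_t)(dma_addr >> 32));
	WRITE_ONCE(desc->ctl0, ctl0);
	WRITE_ONCE(desc->ctl1, ctl1);
}

/* Single load of ctl1; the length mask is an assumption for this sketch. */
uint32_t sketch_desc_len(const struct sketch_dma_desc *desc)
{
	return READ_ONCE(desc->ctl1) & 0x1fff;
}
```

The volatile accesses only constrain the compiler (no torn, merged or repeated accesses). They are not a hardware barrier, so any ordering against the device still comes from dma_wmb() or writel().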
The other suggestion I had was this one, which I think you did not test:
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1156,11 +1156,12 @@ static int bgmac_poll(struct napi_struct *napi, int weight)
 	bgmac_dma_tx_free(bgmac, &bgmac->tx_ring[0]);
 	handled += bgmac_dma_rx_read(bgmac, &bgmac->rx_ring[0], weight);
 
-	/* Poll again if more events arrived in the meantime */
-	if (bgmac_read(bgmac, BGMAC_INT_STATUS) & (BGMAC_IS_TX0 | BGMAC_IS_RX))
-		return weight;
-
 	if (handled < weight) {
+		/* Poll again if more events arrived in the meantime */
+		if (bgmac_read(bgmac, BGMAC_INT_STATUS) &
+		    (BGMAC_IS_TX0 | BGMAC_IS_RX))
+			return weight;
+
 		napi_complete_done(napi, handled);
 		bgmac_chip_intrs_on(bgmac);
 	}
Or possibly, remove that extra check entirely and just rely on the irq to do
this after it gets turned on again.
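The effect of moving the status check inside the "handled < weight" branch can be illustrated with a small userspace model. Everything here (struct sketch_dev, the sketch_* functions, the status_reads counter) is hypothetical and only mimics the control flow of the patch, not the driver or the NAPI API.

```c
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct sketch_dev {
	int pending;		/* packets waiting in the ring */
	bool irq_enabled;	/* set once polling completes */
	int status_reads;	/* simulated MMIO reads of the status register */
};

/* Consume up to 'weight' packets and return how many were handled. */
static int sketch_rx(struct sketch_dev *dev, int weight)
{
	int n = dev->pending < weight ? dev->pending : weight;

	dev->pending -= n;
	return n;
}

/* Stand-in for reading the interrupt status register. */
static bool sketch_events_pending(struct sketch_dev *dev)
{
	dev->status_reads++;
	return dev->pending > 0;
}

/* Poll function with the status check only on the completion path. */
int sketch_poll(struct sketch_dev *dev, int weight)
{
	int handled = sketch_rx(dev, weight);

	if (handled < weight) {
		/* Poll again if more events arrived in the meantime */
		if (sketch_events_pending(dev))
			return weight;

		dev->irq_enabled = true;	/* complete and re-enable irqs */
	}

	return handled;
}
```

When the budget is used up, the poll function gets called again anyway, so the register read is skipped in exactly the case where the CPU is busiest. Dropping the check altogether would correspond to removing sketch_events_pending() from the branch.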
Arnd