Message-ID: <20201208225125.GA2602479@lunn.ch>
Date: Tue, 8 Dec 2020 23:51:25 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Sven Van Asbroeck <thesven73@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
Bryan Whitehead <bryan.whitehead@...rochip.com>,
Microchip Linux Driver Support <UNGLinuxDriver@...rochip.com>,
David S Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe
bandwidth requirement

> That's a good question. I used perf to create a flame graph of what
> the CPU was doing when receiving data at high speed. It showed that
> __dma_page_dev_to_cpu took up most of the CPU time, which is triggered
> by dma_unmap_single(9K, DMA_FROM_DEVICE).
>
> So I assumed that it's a PCIe DMA bandwidth issue, but I could be wrong -
> I didn't do any PCIe bandwidth measurements.

Sometimes it is actually the cache operations which take all the
time. The unmap needs to invalidate the cache, so that when the memory
is then accessed, it gets fetched from RAM. On SMP machines, cache
invalidation can be expensive, due to all the cross-CPU operations.
I've actually seen better performance by building a UP kernel on some
low-core-count ARM CPUs.
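
To make that concrete, here is a rough sketch of the kind of RX
completion path we are talking about. This is not the lan743x code;
rx_complete(), rx_buf_dma, RX_BUF_SIZE etc. are made-up names. On a
platform without cache-coherent DMA, the dma_unmap_single() has to
invalidate the CPU cache over the whole mapped length, no matter how
few bytes the NIC actually wrote:

#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define RX_BUF_SIZE (9 * 1024)	/* 9K, to cover jumbo frames */

/* Hypothetical RX completion handler. dma_unmap_single() invalidates
 * the cache for all of RX_BUF_SIZE, even when the descriptor says the
 * frame is only 64 bytes long.
 */
static void rx_complete(struct device *dev, struct napi_struct *napi,
			struct sk_buff *skb, dma_addr_t rx_buf_dma,
			unsigned int frame_len)
{
	dma_unmap_single(dev, rx_buf_dma, RX_BUF_SIZE, DMA_FROM_DEVICE);

	skb_put(skb, frame_len);	/* protocol setup etc. omitted */
	napi_gro_receive(napi, skb);
}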

There are some tricks which can be played. Do you actually need all
9K? Does the descriptor tell you how much of the buffer was actually
used? You can get a nice speed-up if you just unmap 64 bytes for a
TCP ACK, rather than the full 9K.
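
Something like this, untested and with the same made-up names,
assuming the descriptor gives you frame_len and the buffer was mapped
with dma_map_single():

/* Hypothetical variant of the same RX completion path: only the bytes
 * the descriptor reports as used are synced back to the CPU, and the
 * unmap is told to skip its own (full-length) CPU sync.
 */
static void rx_complete_partial_sync(struct device *dev,
				     struct napi_struct *napi,
				     struct sk_buff *skb,
				     dma_addr_t rx_buf_dma,
				     unsigned int frame_len)
{
	/* Invalidate only what the NIC actually wrote, e.g. 64 bytes
	 * for a TCP ACK instead of the full 9K buffer.
	 */
	dma_sync_single_for_cpu(dev, rx_buf_dma, frame_len,
				DMA_FROM_DEVICE);

	/* Tear down the mapping without touching the cache again. */
	dma_unmap_single_attrs(dev, rx_buf_dma, RX_BUF_SIZE,
			       DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);

	skb_put(skb, frame_len);
	napi_gro_receive(napi, skb);
}

Other drivers which recycle their RX buffers play a similar game with
DMA_ATTR_SKIP_CPU_SYNC, only syncing the bytes they are actually going
to touch.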
Andrew