Date:   Tue, 8 Dec 2020 23:51:25 +0100
From:   Andrew Lunn <>
To:     Sven Van Asbroeck <>
Cc:     Jakub Kicinski <>,
        Bryan Whitehead <>,
        Microchip Linux Driver Support <>,
        David S Miller <>,
        netdev <>,
        Linux Kernel Mailing List <>
Subject: Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe
 bandwidth requirement

> That's a good question. I used perf to create a flame graph of what
> the cpu was doing when receiving data at high speed. It showed that
> __dma_page_dev_to_cpu took up most of the cpu time. Which is triggered
> by dma_unmap_single(9K, DMA_FROM_DEVICE).
> So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> I didn't do any PCIe bandwidth measurements.

Sometimes it is actually the cache operations which take all the
time. The unmap needs to invalidate the cache, so that when the memory
is then accessed, it gets fetched from RAM. On SMP machines, cache
invalidation can be expensive, due to all the cross-CPU operations.
I've actually got better performance by building a UP kernel on some
low core count ARM CPUs.

There are some tricks which can be played. Do you actually need all
9K? Does the descriptor actually tell you how much is used? You can
get a nice speed up if you just unmap 64 bytes for a TCP ACK, rather
than the full 9K.
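
As a rough illustration of the "only touch what was received" idea, here
is a minimal userspace sketch of the length calculation. The helper name
rx_sync_len() and its parameters are hypothetical, and the 9216-byte
buffer and 64-byte cache line used in the usage note are assumptions, not
values taken from the lan743x driver. In a real driver the result would
presumably be passed to something like dma_sync_single_for_cpu(), with
the final unmap asked to skip its own CPU sync; whether that fits this
hardware is untested here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes of an RX buffer the CPU needs to
 * invalidate, given the frame length reported by the descriptor.
 *
 * frame_len:  bytes the descriptor says were written (assumed available)
 * buf_len:    size of the mapped RX buffer
 * cache_line: cache line size in bytes, must be a power of two
 */
static size_t rx_sync_len(size_t frame_len, size_t buf_len, size_t cache_line)
{
	size_t len;

	/* The mask arithmetic below only works for a non-zero power of two. */
	assert(cache_line != 0 && (cache_line & (cache_line - 1)) == 0);

	/* Never trust the descriptor beyond the size of the buffer. */
	len = frame_len < buf_len ? frame_len : buf_len;

	/* Round up to whole cache lines, since invalidation works per line. */
	len = (len + cache_line - 1) & ~(cache_line - 1);

	/* Rounding must not push the length past the end of the buffer. */
	return len < buf_len ? len : buf_len;
}
```

With these assumed sizes, rx_sync_len(64, 9216, 64) gives 64 bytes for a
minimum-size ACK instead of the full buffer, while a jumbo frame that
fills the buffer still gets all 9216 bytes.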

