linux-kernel - [PATCH net-next v3 0/4] net: mvneta: improve rx/tx performance


Message-ID: <20170220125344.3555-1-jszhang@marvell.com>
Date:   Mon, 20 Feb 2017 20:53:40 +0800
From:   Jisheng Zhang <jszhang@...vell.com>
To:     <thomas.petazzoni@...e-electrons.com>, <davem@...emloft.net>,
        <arnd@...db.de>, <gregory.clement@...e-electrons.com>,
        <mw@...ihalf.com>
CC:     <linux-arm-kernel@...ts.infradead.org>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, Jisheng Zhang <jszhang@...vell.com>
Subject: [PATCH net-next v3 0/4] net: mvneta: improve rx/tx performance

In hot code paths such as mvneta_rx_swbm(), we access fields of rx_desc
and tx_desc. These DMA descriptors are allocated by dma_alloc_coherent(),
so they are uncacheable if the device isn't cache coherent, and reading
from uncached memory is fairly slow.

patch1 reuses the status value that was already read out, to avoid
getting the status field of rx_desc again.

patch2 avoids getting buf_phys_addr from rx_desc again in
mvneta_rx_hwbm by reusing the phys_addr variable.

patch3 avoids reading from tx_desc as much as possible by storing what
we need in local variables.
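
The read-once idea behind patch1 to patch3 can be sketched in plain C as
below. This is only an illustration, not code from the patches: the
demo_rx_desc struct, the DEMO_* bit masks and both helper functions are
hypothetical, and the volatile qualifier merely stands in for memory
where every access really goes out to (slow) uncached RAM.

```c
#include <stdint.h>

/* Hypothetical, simplified descriptor; not the real struct mvneta_rx_desc. */
struct demo_rx_desc {
	volatile uint32_t status;        /* volatile: every access is a real load */
	volatile uint32_t buf_phys_addr;
};

/* Made-up bit layout, for illustration only. */
#define DEMO_RXD_ERR         (1u << 16)
#define DEMO_RXD_FIRST_LAST  (3u << 26)

/* Before: each test loads desc->status from the descriptor again. */
static int demo_rx_ok_slow(const struct demo_rx_desc *desc)
{
	if (desc->status & DEMO_RXD_ERR)
		return 0;
	if ((desc->status & DEMO_RXD_FIRST_LAST) != DEMO_RXD_FIRST_LAST)
		return 0;
	return 1;
}

/* After: load the field once, then test the local copy. */
static int demo_rx_ok_fast(const struct demo_rx_desc *desc)
{
	uint32_t status = desc->status;  /* the only descriptor read */

	if (status & DEMO_RXD_ERR)
		return 0;
	if ((status & DEMO_RXD_FIRST_LAST) != DEMO_RXD_FIRST_LAST)
		return 0;
	return 1;
}
```

Both helpers return the same result for a given descriptor; the second
one just touches the descriptor memory once instead of twice.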

We get the following performance data on Marvell BG4CT Platforms
(tested with iperf):

before the patch:
sending 1GB in mvneta_tx() (TSO disabled) costs 793553760 ns

after the patch:
sending 1GB in mvneta_tx() (TSO disabled) costs 719953800 ns

We saved 9.2% of the time.

patch4 uses cacheable memory to store the rx buffer DMA address.
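
One way to picture this is a shadow array kept in ordinary memory next
to the descriptor ring. The sketch below is an assumption about the
general technique, not the layout used in the patch: demo_desc,
demo_rxq, DEMO_RING_SIZE and both functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 4  /* hypothetical ring size */

/* Hypothetical descriptor; in a driver this would live in coherent DMA memory. */
struct demo_desc {
	volatile uint32_t buf_phys_addr;
};

struct demo_rxq {
	struct demo_desc *descs;                /* uncached on non-coherent devices */
	uint32_t buf_dma_addr[DEMO_RING_SIZE];  /* ordinary cacheable memory */
};

/* On refill, write the address for the hardware and keep a CPU-side copy. */
static void demo_refill(struct demo_rxq *q, size_t i, uint32_t dma_addr)
{
	q->descs[i].buf_phys_addr = dma_addr;
	q->buf_dma_addr[i] = dma_addr;
}

/* On receive, read the cacheable copy instead of the descriptor. */
static uint32_t demo_rx_buf_addr(const struct demo_rxq *q, size_t i)
{
	return q->buf_dma_addr[i];
}
```

The hardware still finds the buffer address in the descriptor; only the
CPU side stops reading it back from there.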

We get the following performance data on Marvell BG4CT Platforms
(tested with iperf):

before the patch:
receiving 1GB in mvneta_rx_swbm() costs 1492659600 ns

after the patch:
receiving 1GB in mvneta_rx_swbm() costs 1421565640 ns

We saved 4.76% of the time.
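
For reference, the savings can be recomputed from the raw timings as
(before - after) / before. The helper below is a hypothetical
convenience function, not part of the series; with the numbers quoted in
this mail it gives roughly 9.27% for the tx path and 4.76% for the rx
path.

```c
/* Relative time saved, in percent, given two timings in nanoseconds. */
static double demo_saving_percent(double before_ns, double after_ns)
{
	return (before_ns - after_ns) / before_ns * 100.0;
}
```

For example, demo_saving_percent(1492659600.0, 1421565640.0) covers the
mvneta_rx_swbm() measurement.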

Basically, patch1 and patch4 do what Arnd mentioned in [1].

Hi Arnd,

I added a "Suggested-by" tag with your name; I hope you don't mind ;)

Thanks

[1] https://www.spinics.net/lists/netdev/msg405889.html

Since v2:
  - add Gregory's ack to patch1
  - only get rx buffer DMA address from cacheable memory for mvneta_rx_swbm()
  - add patch 2 to read rx_desc->buf_phys_addr once in mvneta_rx_hwbm()
  - add patch 3 to avoid reading from tx_desc as much as possible

Since v1:
  - correct the performance data typo

Jisheng Zhang (4):
  net: mvneta: avoid getting status from rx_desc as much as possible
  net: mvneta: avoid getting buf_phys_addr from rx_desc again
  net: mvneta: avoid reading from tx_desc as much as possible
  net: mvneta: Use cacheable memory to store the rx buffer DMA address

 drivers/net/ethernet/marvell/mvneta.c | 80 +++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 37 deletions(-)

-- 
2.11.0