Message-ID: <MN2PR11MB3662BCAD7CAA6F33D91F5725FA8B9@MN2PR11MB3662.namprd11.prod.outlook.com>
Date: Fri, 12 Feb 2021 20:22:59 +0000
From: <Bryan.Whitehead@...rochip.com>
To: <thesven73@...il.com>, <UNGLinuxDriver@...rochip.com>,
<davem@...emloft.net>, <kuba@...nel.org>
CC: <andrew@...n.ch>, <rtgbnm@...il.com>, <sbauer@...ckbox.su>,
<tharvey@...eworks.com>, <anders@...ningen.priv.no>,
<hdanton@...a.com>, <hch@....de>,
<willemdebruijn.kernel@...il.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs
w/o dma cache snooping
Hi Sven,
> Subject: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs
> w/o dma cache snooping
>
> From: Sven Van Asbroeck <thesven73@...il.com>
>
> The buffers in the lan743x driver's receive ring are always 9K, even when the
> largest packet that can be received (the mtu) is much smaller. This performs
> particularly badly on cpu archs without dma cache snooping (such as ARM):
> each received packet results in a 9K dma_{map|unmap} operation, which is
> very expensive because cpu caches need to be invalidated.
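
For anyone following along who hasn't dealt with non-coherent DMA: the cost sits in the per-descriptor map/unmap, which invalidates the CPU cache over the whole mapped length, not over the bytes the device actually wrote. The pre-patch pattern is roughly the following (an illustrative sketch only; the constant and helper names are made up, this is not the actual lan743x code):

/*
 * Sketch of the pre-patch pattern: a fixed 9 KiB buffer is mapped for every
 * rx descriptor. On an arch without dma cache snooping, the unmap in the
 * completion path invalidates the full 9 KiB even though the device wrote
 * at most ~mtu bytes.
 */
#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define RX_BUF_LEN_9K        (9 * 1024)        /* hypothetical constant */

static struct sk_buff *rx_refill_one(struct device *dev, dma_addr_t *dma)
{
        struct sk_buff *skb = __netdev_alloc_skb(NULL, RX_BUF_LEN_9K, GFP_ATOMIC);

        if (!skb)
                return NULL;

        *dma = dma_map_single(dev, skb->data, RX_BUF_LEN_9K, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, *dma)) {
                dev_kfree_skb_any(skb);
                return NULL;
        }
        return skb;
}

static void rx_complete_one(struct device *dev, struct sk_buff *skb,
                            dma_addr_t dma, unsigned int frame_len)
{
        /* The cache invalidation here scales with the mapped size (9 KiB),
         * not with frame_len (~1.5 KiB at mtu 1500).
         */
        dma_unmap_single(dev, dma, RX_BUF_LEN_9K, DMA_FROM_DEVICE);
        skb_put(skb, frame_len);
}
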
>
> Careful measurement of the driver rx path on armv7 reveals that the cpu
> spends the majority of its time waiting for cache invalidation.
>
> Optimize by keeping the rx ring buffer size as close as possible to the mtu.
> This limits the amount of cache that requires invalidation.
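
So, if I read it correctly, the rx buffer length becomes a function of the current mtu instead of a fixed 9 KiB. Something along these lines (a sketch; the helper name is made up and the exact headroom/rounding the patch uses may differ):

#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <linux/kernel.h>

/* Hypothetical helper: size an rx buffer for the configured mtu. */
static unsigned int rx_buf_len_for_mtu(unsigned int mtu)
{
        /* Ethernet header + VLAN tag + payload + FCS, rounded for alignment. */
        return ALIGN(ETH_HLEN + VLAN_HLEN + mtu + ETH_FCS_LEN, 64);
}
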
>
> This optimization would normally force us to re-allocate all ring buffers when
> the mtu is changed - a disruptive event, because it can only happen when
> the network interface is down.
>
> Remove the need to re-allocate all ring buffers by adding support for multi-
> buffer frames. Now any combination of mtu and ring buffer size will work.
> When the mtu changes from mtu1 to mtu2, consumed buffers of size mtu1
> are lazily replaced by newly allocated buffers of size mtu2.
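
And the lazy replacement, as I understand it: the refill path always allocates at the size implied by the current mtu and records the length each slot was mapped with, so buffers of the old size simply age out of the ring as they are consumed. A sketch only (struct and field names are invented, not the driver's lan743x_rx):

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct rx_slot {
        struct sk_buff *skb;
        dma_addr_t dma;
        unsigned int len;        /* length this slot was mapped with */
};

struct rx_ring {
        struct net_device *netdev;
        struct device *dmadev;
        struct rx_slot *slots;
};

static int rx_refill_slot(struct rx_ring *rx, int index)
{
        /* rx_buf_len_for_mtu() as sketched above */
        unsigned int len = rx_buf_len_for_mtu(rx->netdev->mtu);
        struct sk_buff *skb = __netdev_alloc_skb(rx->netdev, len, GFP_ATOMIC);
        dma_addr_t dma;

        if (!skb)
                return -ENOMEM;

        dma = dma_map_single(rx->dmadev, skb->data, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(rx->dmadev, dma)) {
                dev_kfree_skb_any(skb);
                return -ENOMEM;
        }

        /* The completion path unmaps with slots[index].len, so a buffer
         * allocated before an mtu change is still unmapped correctly.
         */
        rx->slots[index].skb = skb;
        rx->slots[index].dma = dma;
        rx->slots[index].len = len;
        return 0;
}

The multi-buffer part then presumably just means the completion path can chain several such slots into one frame when a received packet does not fit in a single buffer.
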
>
> These optimizations double the rx performance on armv7.
> Third parties report 3x rx speedup on armv8.
>
> Tested with iperf3 on a freescale imx6qp + lan7430, both sides set to mtu
> 1500 bytes, measuring rx performance:
>
> Before:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
> After:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0
>
> Signed-off-by: Sven Van Asbroeck <thesven73@...il.com>
Looks good.
Reviewed-by: Bryan Whitehead <Bryan.Whitehead@...rochip.com>