Date: Fri, 4 Dec 2020 11:14:26 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Sieng Piaw Liew <liew.s.piaw@...il.com>, Florian Fainelli <f.fainelli@...il.com>
Cc: bcm-kernel-feedback-list@...adcom.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] bcm63xx_enet: convert to build_skb

On 12/4/20 6:46 AM, Sieng Piaw Liew wrote:
> We can increase the efficiency of rx path by using buffers to receive
> packets then build SKBs around them just before passing into the network
> stack. In contrast, preallocating SKBs too early reduces CPU cache
> efficiency.
>
> Check if we're in NAPI context when refilling RX. Normally we're almost
> always running in NAPI context. Dispatch to napi_alloc_frag directly
> instead of relying on netdev_alloc_frag which still runs
> local_bh_disable/enable.
>
> Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec
> performance. Included netif_receive_skb_list and NET_IP_ALIGN
> optimizations.
>
> Before:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-10.00  sec  49.9 MBytes  41.9 Mbits/sec  197             sender
> [  4]   0.00-10.00  sec  49.3 MBytes  41.3 Mbits/sec                  receiver
>
> After:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-30.00  sec   171 MBytes  47.8 Mbits/sec  272             sender
> [  4]   0.00-30.00  sec   170 MBytes  47.6 Mbits/sec                  receiver

Please test this again after GRO has been added to this driver.

Problem with build_skb() is that overall skb truesize after GRO
might be increased a lot, since we have sizeof(struct skb_shared_info)
added overhead per MSS, and this can double the truesize depending
on device MTU.

This matters on long RTT flows, because an inflation of skb->truesize
reduces TCP receive window quite a lot.

Ideally if you want best performance, this driver should use
napi_gro_frags(), so that skb->len/skb->truesize is the smallest one.

In order to test your change you need to set up a testbed with 10ms
or 50ms delay between the hosts, unless this driver is only used
by hosts on the same LAN (which I doubt)
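
To make the truesize concern concrete, here is a rough back-of-the-envelope sketch in Python. It is not kernel code. The constants (a 320-byte struct skb_shared_info, 64-byte cache lines, 64 bytes of headroom) and all function names are assumptions chosen for illustration; the real values depend on the architecture and kernel configuration, and will differ on a 32-bit MIPS target such as the BCM6328. The model only assumes that each build_skb()-style buffer merged by GRO keeps its own aligned skb_shared_info area, while a frag-style buffer does not.

```python
# Illustrative model only: all sizes below are assumptions, not values
# read from kernel headers for any particular target.
SHINFO_SIZE = 320   # assumed sizeof(struct skb_shared_info)
CACHE_LINE = 64     # assumed alignment used for buffer sizing
HEADROOM = 64       # assumed headroom reserved in front of the frame


def align(n: int, a: int = CACHE_LINE) -> int:
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def build_skb_segment(frame_len: int) -> int:
    """Per-segment memory when every buffer also carries a shared_info area."""
    return align(HEADROOM + frame_len) + align(SHINFO_SIZE)


def frag_segment(frame_len: int) -> int:
    """Per-segment memory when the buffer holds only headroom plus frame data."""
    return align(HEADROOM + frame_len)


def inflation(frame_len: int) -> float:
    """Ratio of the two per-segment costs for a given frame length."""
    return build_skb_segment(frame_len) / frag_segment(frame_len)


if __name__ == "__main__":
    for frame_len in (256, 576, 1514):
        print(frame_len, build_skb_segment(frame_len),
              frag_segment(frame_len), round(inflation(frame_len), 2))
```

Under these assumed sizes the extra cost is about 20% for a 1514-byte frame and reaches 2x for 256-byte frames, which is the shape of the effect described above: the smaller the per-segment payload, the larger the share of truesize taken by the per-buffer shared_info.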