Date: Sun, 7 Apr 2019 13:53:19 +0200
From: Rafał Miłecki <zajec5@...il.com>
To: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
    Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
    Toke Høiland-Jørgensen <toke@...hat.com>,
    Florian Westphal <fw@...len.de>,
    Eric Dumazet <eric.dumazet@...il.com>
Cc: Stefano Brivio <sbrivio@...hat.com>,
    Sabrina Dubroca <sd@...asysnail.net>,
    David Ahern <dsahern@...il.com>, Felix Fietkau <nbd@....name>,
    Jo-Philipp Wich <jo@...n.io>,
    Koen Vandeputte <koen.vandeputte@...ntric.com>
Subject: Re: NAT performance regression caused by vlan GRO support

On 04.04.2019 14:57, Rafał Miłecki wrote:
> Long story short, starting with the commit 66e5133f19e9 ("vlan: Add GRO support
> for non hardware accelerated vlan") - which first hit kernel 4.2 - NAT
> performance of my router dropped by 30% - 40%.

I'll try to provide a summary of this issue. I'll focus on TCP traffic,
as that's what I happened to test.

Basically all slowdowns are related to csum_partial(). Calculating
checksums has a significant impact on NAT performance on devices with
less powerful CPUs.

********** GRO disabled

Without GRO, csum_partial() is used only when validating TCP packets in
nf_conntrack_tcp_packet() (known as tcp_packet() in kernels older than
5.1).

Simplified forward trace for that case:

nf_conntrack_in
    nf_conntrack_tcp_packet
        tcp_error
            if (state->net->ct.sysctl_checksum)
                nf_checksum
                    nf_ip_checksum
                        __skb_checksum_complete

That validation can be disabled using the nf_conntrack_checksum sysctl,
and doing so bumps NAT speed for me from 666 Mb/s to 940 Mb/s (+41%).

********** GRO enabled

First of all, GRO also includes TCP validation that requires calculating
a checksum.

Simplified forward trace for that case:

vlan_gro_receive
    call_gro_receive
        inet_gro_receive
            indirect_call_gro_receive
                tcp4_gro_receive
                    skb_gro_checksum_validate
                    tcp_gro_receive

*If* we had a way to disable that validation, it *would* bump NAT speed
for me from 577 Mb/s to 825 Mb/s (+43%).

Secondly, using GRO means we need to calculate a checksum before
transmitting packets (this applies to devices without HW checksum
offloading). I think it's related to packet merging in
skb_gro_receive() and then setting CHECKSUM_PARTIAL:

vlan_gro_complete
    inet_gro_complete
        tcp4_gro_complete
            tcp_gro_complete
                skb->ip_summed = CHECKSUM_PARTIAL;

That results in bgmac calculating a checksum from scratch. Take a look
at bgmac_dma_tx_add(), which does:

    if (skb->ip_summed == CHECKSUM_PARTIAL)
        skb_checksum_help(skb);

Performing that whole checksum calculation will always result in GRO
slowing down NAT for me when using the BCM47094 SoC with its
not-so-powerful ARM CPUs.
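
To illustrate why every one of the checksum steps above costs CPU time proportional to packet size, here is a minimal userspace sketch of the Internet checksum (RFC 1071 style) in C. It is not the kernel's csum_partial() or skb_checksum_help(). The function names are hypothetical, the TCP pseudo-header is left out, and the checksum field is assumed to sit at an even offset. It only shows that both validating and filling in a checksum have to walk over every byte of the packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate 16-bit big-endian words of a buffer into a running sum.
 * Every byte is read once, so the cost is linear in len. */
uint64_t csum_sum(const uint8_t *data, size_t len, uint64_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* zero-pad odd tail */
	return sum;
}

/* Fold the carries back into 16 bits and return the one's complement. */
uint16_t csum_fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Validation (conceptually what the RX path does): a buffer carrying a
 * correct checksum folds to zero. */
int csum_is_valid(const uint8_t *pkt, size_t len)
{
	return csum_fold16(csum_sum(pkt, len, 0)) == 0;
}

/* Software fill-in (conceptually what a driver without HW offload has to
 * do before TX). csum_offset is assumed to be even and within the buffer. */
void csum_fill(uint8_t *pkt, size_t len, size_t csum_offset)
{
	uint16_t csum;

	pkt[csum_offset] = 0;
	pkt[csum_offset + 1] = 0;
	csum = csum_fold16(csum_sum(pkt, len, 0));
	pkt[csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

With GRO the merged packet can be far larger than a single MTU-sized frame, so a csum_fill()-style pass at transmit time touches all of that data again on the CPU.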