Date: Fri, 7 Aug 2020 17:21:51 -0600
From: Ryan Cox <ryan_cox@....edu>
To: Scott Dial <scott@...ttdial.com>
Cc: Antoine Tenart <antoine.tenart@...tlin.com>, netdev@...r.kernel.org,
    davem@...emloft.net, sd@...asysnail.net
Subject: Re: Severe performance regression in "net: macsec: preserve ingress frame ordering"

On 8/6/20 9:48 PM, Scott Dial wrote:
> The aes-aesni driver is smart enough to use the FPU if it's not busy and
> fallback to the CPU otherwise. Unfortunately, the ghash-clmulni driver
> does not have that kind of logic in it and only provides an async version,
> so we are forced to use the ghash-generic implementation, which is a pure
> CPU implementation. The ideal would be for aesni_intel to provide a
> synchronous version of gcm(aes) that fell back to the CPU if the FPU is
> busy.

I don't know how the AES-NI support works, but I saw your specific mention
of aesni_intel and figured I should mention that this also affects AMD. I
just got access to AMD nodes (2 x EPYC 7302) with a Mellanox 10 GbE NIC. I
ran the same test there and it showed a similar performance pattern. I doubt
this means much, but I figured I should mention it.

> I don't know if the crypto maintainers would be open to such a change, but
> if the choice was between reverting and patching the crypto code, then I
> would work on patching the crypto code.

I can't opine on anything crypto-related since it is far outside my area of
expertise, though it is helpful to hear what is going on.

> In any case, you didn't report how many packets arrived out of order, which
> was the issue being addressed by my change. It would be helpful to get
> the output of "ip -s macsec show" and specifically the InPktsDelayed
> counter. Did iperf3 report out-of-order packets with the patch reverted?
> Otherwise, if this is the only process running on your test servers,
> then you may not be generating any contention for the FPU, which is the
> source of the out-of-order issue. Maybe you could run prime95 to busy
> the FPU to see the issue that I was seeing.

I ran some tests again on the same servers as before with the Intel NICs. I
tested with prime95 running on 27 of the 28 cores in *each* server
simultaneously (allowing iperf3 to use a core on each) throughout the entire
test. This was using 5.7.11 with ab046a5d4be4c90a3952a0eae75617b49c0cb01b
reverted, so pre-5.7 performance. MACsec interfaces are deleted and recreated
before each test, so counters are always fresh.

== MACSEC WITHOUT ENCRYPTION ==

* Server1:
18: ms1: protect on validate strict sc off sa off encrypt off send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
                         0              0              0        1123            0                0           1             0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
                    3798421                0        30889802591                  0
        0: PN 3799655, state on, key 01000000000000000000000000000000
    stats: OutPktsProtected OutPktsEncrypted
                    3798421                0
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
                 30042694872                 0               0           218  3675170             0          0              0                0              0
        0: PN 3676633, state on, key 01000000000000000000000000000000
    stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
            3675170             0              0                0              0

* Server2:
18: ms1: protect on validate strict sc off sa off encrypt off send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
                         0              0              0        1227            0                0           1             0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
                    3675399                0        30042696158                  0
        0: PN 3676633, state on, key 01000000000000000000000000000000
    stats: OutPktsProtected OutPktsEncrypted
                    3675399                0
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
                 30889801305                 0               0             0  3798410             0          0              0                0              0
        0: PN 3799655, state on, key 01000000000000000000000000000000
    stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
            3798410             0              0                0              0

InPktsDelayed was 218 for Server1 and 0 for Server2.

== MACSEC WITH ENCRYPTION ==

I got the following *with* encryption (macsec interface deleted and recreated
before the test, so counters are fresh):

* Server1:
19: ms1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
                         0              0              0        1397            0                0           0             0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
                          0          5560714                  0        46931594623
        0: PN 5561948, state on, key 01000000000000000000000000000000
    stats: OutPktsProtected OutPktsEncrypted
                          0          5560714
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
                           0       45977049585               0          3771  5417843             0          0              0                0              0
        0: PN 5422860, state on, key 01000000000000000000000000000000
    stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
            5417843             0              0                0              0

* Server2:
19: ms1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
                         0              0              0        1490            0                0           0             0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
                          0          5421626                  0        45977059885
        0: PN 5422860, state on, key 01000000000000000000000000000000
    stats: OutPktsProtected OutPktsEncrypted
                          0          5421626
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
                           0       46931106683               0           109  5560541             0          0              0                0              0
        0: PN 5561948, state on, key 01000000000000000000000000000000
    stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
            5560541             0              0                0              0

InPktsDelayed was 3771 for Server1 and 109 for Server2.

The performance numbers were:
* 9.87 Gb/s without macsec
* 6.00 Gb/s with macsec WITHOUT encryption
* 9.19 Gb/s with macsec WITH encryption

iperf3 retransmits were:
* 27 without macsec
* 1211 with macsec WITHOUT encryption
* 721 with macsec WITH encryption

Thanks for the reply and for the background on this.

Ryan
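
To put the InPktsDelayed counts above in proportion, here is a minimal sketch that expresses each one as a fraction of the frames received on that RXSC. The counter values are copied from the output in this message. The helper name `delayed_fraction` is hypothetical, and the sketch assumes InPktsDelayed and InPktsOK count disjoint sets of frames, so that their sum is the total; that assumption is not stated in the message itself.

```python
# Counters copied from the "ip -s macsec show" output in this message.
# Assumption (not confirmed in the message): InPktsDelayed and InPktsOK
# count disjoint sets of frames, so delayed + ok is the total received.
RESULTS = {
    ("without encryption", "Server1"): {"delayed": 218, "ok": 3675170},
    ("without encryption", "Server2"): {"delayed": 0, "ok": 3798410},
    ("with encryption", "Server1"): {"delayed": 3771, "ok": 5417843},
    ("with encryption", "Server2"): {"delayed": 109, "ok": 5560541},
}


def delayed_fraction(delayed: int, ok: int) -> float:
    """Return delayed / (delayed + ok), or 0.0 when no frames were counted."""
    total = delayed + ok
    return delayed / total if total else 0.0


if __name__ == "__main__":
    for (mode, host), counters in RESULTS.items():
        frac = delayed_fraction(counters["delayed"], counters["ok"])
        print(f"{mode:>20} {host}: {frac:.4%} of frames delayed")
```

Under that assumption, every case in this run stays below one delayed frame per thousand received.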