Message-ID: <a9af0c0a-ec7c-fa01-05ac-147fccb94fbf@scottdial.com>
Date: Wed, 23 Aug 2023 16:22:31 -0400
From: Scott Dial <scott@...ttdial.com>
To: Sabrina Dubroca <sd@...asysnail.net>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jonathan Corbet <corbet@....net>,
linux-doc@...r.kernel.org
Subject: Re: [PATCH net-next] macsec: introduce default_async_crypto sysctl
> 2023-08-18, 18:46:48 -0700, Jakub Kicinski wrote:
>> Can we not fix the ordering problem?
>> Queue the packets locally if they get out of order?
AES-NI's implementation of gcm(aes) requires the FPU, so if it's busy
the decrypt gets stuck on the cryptd queue, but that queue is not
order-preserving. If the macsec driver maintained a queue for the netdev
that was order-preserving, then you could resolve the issue, but it adds
more complexity to the macsec driver, so I assume that's why the
maintainers have always desired to revert my patch instead of ensuring
packet order.
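
As a rough illustration of what such an order-preserving queue would have to do, here is a minimal userspace sketch in Python. It is not kernel code and not a proposed macsec patch; the class and method names are hypothetical. Each packet takes a ticket when it is submitted for decryption, and completions are held back until every earlier ticket has completed:

```python
from collections import deque


class OrderedCompletionQueue:
    """Hypothetical sketch: release completed items in submission order."""

    def __init__(self):
        self._pending = deque()  # tickets, oldest submission first
        self._done = {}          # ticket -> result, finished but still held
        self._next_ticket = 0

    def submit(self):
        """Take a ticket for a packet about to be handed to the crypto layer."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._pending.append(ticket)
        return ticket

    def complete(self, ticket, result):
        """Record a completion and return the results now deliverable in order."""
        self._done[ticket] = result
        released = []
        # Only the head of the pending queue may be delivered.
        while self._pending and self._pending[0] in self._done:
            released.append(self._done.pop(self._pending.popleft()))
        return released


q = OrderedCompletionQueue()
t0, t1, t2 = q.submit(), q.submit(), q.submit()

# Assume packet 0 was deferred to the async path while packets 1 and 2
# finished first.
delivered = []
delivered += q.complete(t1, "pkt1")  # held, pkt0 is still outstanding
delivered += q.complete(t2, "pkt2")  # held
delivered += q.complete(t0, "pkt0")  # releases pkt0, pkt1, pkt2 in order
```

The cost is visible even in the sketch: one deferred packet delays everything behind it, and the held results need storage and locking, which is the added complexity mentioned above.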
With respect to AES-NI's implementation of gcm(aes), it's unfortunate
that there is not a synchronous version that uses the FPU when available
and falls back to gcm_base(ctr(aes-aesni),ghash-generic) when it's not.
In that case, you would get the benefit of the FPU for the majority of
the time, when it's available. When I suggested this to linux-crypto, I was
told that relying on synchronous crypto in the macsec driver was wrong:
On 12 Aug 2020 10:45:00 +0000, Pascal Van Leeuwen wrote:
> Forcing the use of sync algorithms only would be detrimental to platforms
> that do not have CPU accelerated crypto, but do have HW acceleration
> for crypto external to the CPU. I understand it's much easier to implement,
> but that is just being lazy IMHO. For bulk crypto of relatively independent
> blocks (networking packets, disk sectors), ASYNC should always be preferred.
So, I abandoned my suggestion to add a fallback. The complexity of adding
queueing to the macsec driver was beyond the time I had available, and
the regression in performance was not significant for my use case, but I
understand that others may have different requirements. I would
emphasize that benchmarking of network performance should be done by
looking at more than just the interface frame rate. For instance,
out-of-order delivery of packets can trigger TCP backoff. I was never
interested in how many packets the macsec driver could stuff onto the
wire, because the impact of the reordering was my TCP socket stalling
and my UDP streams being garbled.
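
One way to look beyond frame rate is to count reordered arrivals directly. The following is a minimal sketch in Python, assuming the test traffic carries a monotonically increasing sequence number in its payload (an assumption about the test setup; the function name is hypothetical). A packet counts as reordered if it arrives after a packet with a higher sequence number:

```python
def count_reordered(seqs):
    """Count arrivals whose sequence number is below the highest seen so far."""
    highest = -1
    reordered = 0
    for s in seqs:
        if s < highest:
            reordered += 1
        else:
            highest = s
    return reordered


# Six packets were sent and six arrived, so the frame count looks perfect,
# but packet 2 was overtaken by packets 3 and 4.
arrivals = [0, 1, 3, 4, 2, 5]
frames_received = len(arrivals)
reordered = count_reordered(arrivals)
```

Here `frames_received` is 6 and `reordered` is 1: a benchmark that only reports the frame count would not show the problem.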
On 8/22/2023 11:39 AM, Sabrina Dubroca wrote:
> Actually, looking into the crypto API side, I don't see how they can
> get out of order since commit 81760ea6a95a ("crypto: cryptd - Add
> helpers to check whether a tfm is queued"):
>
> [...] ensure that no reordering is introduced because of requests
> queued in cryptd with respect to requests being processed in
> softirq context.
>
> And cryptd_aead_queued() is used by AESNI (via simd_aead_decrypt()) to
> decide whether to process the request synchronously or not.
I have not been following linux-crypto changes, but I would be surprised
if the request is not flagged with CRYPTO_TFM_REQ_MAY_BACKLOG, so it would
be queued. If that's not the case, then the attempt to decrypt would
return -EBUSY, which would translate to a packet error, since
macsec_decrypt MUST handle the skb during the softirq.
> So I really don't get what commit ab046a5d4be4 was trying to fix. I've
> never been able to reproduce that issue, I guess commit 81760ea6a95a
> explains why.
>
> I'd suggest to revert commit ab046a5d4be4, but it feels wrong to
> revert it without really understanding what problem Scott hit and why
> 81760ea6a95a didn't solve it.
I don't think that commit has any relevance to the issue. For instance,
with AES-NI, you need to have competing load on the FPU such that
crypto_simd_usable() returns false. In the past, I replicated this
failure mode using two SuperMicro 5018D-FN4T servers directly connected
to each other, each a Xeon-D 1541 w/ an Intel 10GbE NIC (ixgbe driver).
From there, I would send /dev/urandom as UDP to the other host. I would
get about 1 out of 10k packets queued on cryptd with that setup. My real
world case was transporting MPEG TS video streams, each about 1k pps, so
that is a decode error in the video stream every 10 seconds.
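
The 10-second figure follows from the two approximate numbers above. As a small worked check in Python (the variable names are only for illustration):

```python
# Approximate figures from the test setup described above.
packets_per_deferred = 10_000  # about 1 in 10k packets ended up on cryptd
stream_pps = 1_000             # each MPEG TS stream is about 1k packets/s

# Seconds between deferred (and therefore possibly reordered) packets
# within a single stream.
seconds_between_errors = packets_per_deferred / stream_pps
```

This gives `seconds_between_errors == 10.0`, assuming every deferred packet ends up visibly out of order in the stream.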
--
Scott Dial
scott@...ttdial.com