[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com>
Date: Mon, 18 Apr 2022 17:56:26 +0300
From: Maxim Mikityanskiy <maximmi@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Boris Pismenny <borisp@...dia.com>,
John Fastabend <john.fastabend@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>,
Tariq Toukan <tariqt@...dia.com>,
Aviad Yehezkel <aviadye@...lanox.com>,
Ilya Lesokhin <ilyal@...lanox.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@...dia.com>
>> Reviewed-by: Tariq Toukan <tariqt@...dia.com>
>> ---
>> net/tls/tls_device.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>> copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>> copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>
>> - rc = tls_device_copy_data(page_address(pfrag->page) +
>> - pfrag->offset, copy, msg_iter);
>> - if (rc)
>> - goto handle_error;
>> - tls_append_frag(record, pfrag, copy);
>> + if (copy) {
>> + rc = tls_device_copy_data(page_address(pfrag->page) +
>> + pfrag->offset, copy, msg_iter);
>> + if (rc)
>> + goto handle_error;
>> + tls_append_frag(record, pfrag, copy);
>> + }
>
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
>
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.
Sorry, I didn't get the idea. Could you elaborate?
Reordering the mins:
copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
I assume by skipping the allocation you mean skipping
tls_do_allocation(), right? Do you suggest to skip it if the result of
the first min_t() is 0?
record->len used in the first min_t() comes from ctx->open_record, which
either exists or is allocated by tls_do_allocation(). If we move the
copy == 0 check above the tls_do_allocation() call, first we'll have to
check whether ctx->open_record is NULL, which is currently checked by
tls_do_allocation() itself.
If open_record is not NULL, there isn't much to skip in
tls_do_allocation on copy == 0, the main part is already skipped,
regardless of the value of copy. If open_record is NULL, we can't skip
tls_do_allocation, and copy won't be 0 afterwards.
To compare, before (pseudocode):
tls_do_allocation {
if (!ctx->open_record)
ALLOCATE RECORD
Now ctx->open_record is not NULL
if (!sk_page_frag_refill(sk, pfrag))
return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
copy data and append frag
After:
if (ctx->open_record) {
copy = min(size, max_open_record_len - ctx->open_record->len)
if (copy) {
// You want to put this part of tls_do_allocation under if (copy)?
if (!sk_page_frag_refill(sk, pfrag))
handle errors
copy = min(copy, pfrag->size - pfrag->offset)
if (copy)
copy data and append frag
}
} else {
ALLOCATE RECORD
if (!sk_page_frag_refill(sk, pfrag))
handle errors
// Have to do this after the allocation anyway.
copy = min(size, max_open_record_len - ctx->open_record->len)
copy = min(copy, pfrag->size - pfrag->offset)
if (copy)
copy data and append frag
}
Either I totally don't get what you suggested, or it doesn't make sense
to me, because we have +1 branch in the common path when a record is
open and copy is not 0, no changes when there is no record, and more
repeating code hard to compress.
If I missed your idea, please explain in more details.
> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?
If it's a zero-length send, it means that size is 0 initially, and
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the
record would have been closed at a previous iteration). That doesn't
sound related to swapping the mins and skipping tls_do_allocation on
copy == 0.
Thanks,
Max
>> size -= copy;
>> if (!size) {
>
Powered by blists - more mailing lists