[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <da984a08-1730-1b0c-d845-cf7ec732ba4c@nvidia.com>
Date: Thu, 21 Apr 2022 12:47:18 +0300
From: Maxim Mikityanskiy <maximmi@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Boris Pismenny <borisp@...dia.com>,
John Fastabend <john.fastabend@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>,
Tariq Toukan <tariqt@...dia.com>,
Aviad Yehezkel <aviadye@...lanox.com>,
Ilya Lesokhin <ilyal@...lanox.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
> On 2022-04-14 13:28, Jakub Kicinski wrote:
>> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>>> Calling tls_append_frag when max_open_record_len == record->len might
>>> add an empty fragment to the TLS record if the call happens to be on the
>>> page boundary. Normally tls_append_frag coalesces the zero-sized
>>> fragment to the previous one, but not if it's on page boundary.
>>>
>>> If a resync happens then, the mlx5 driver posts dump WQEs in
>>> tx_post_resync_dump, and the empty fragment may become a data segment
>>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>>> error.
>>>
>>> This commit fixes the described issue by skipping tls_append_frag on
>>> zero size to avoid adding empty fragments. The fix is not in the driver,
>>> because an empty fragment is hardly the desired behavior.
>>>
>>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>>> Signed-off-by: Maxim Mikityanskiy <maximmi@...dia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@...dia.com>
>>> ---
>>> net/tls/tls_device.c | 12 +++++++-----
>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>> index 12f7b56771d9..af875ad4a822 100644
>>> --- a/net/tls/tls_device.c
>>> +++ b/net/tls/tls_device.c
>>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>> copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>> copy = min_t(size_t, copy, (max_open_record_len -
>>> record->len));
>>> - rc = tls_device_copy_data(page_address(pfrag->page) +
>>> - pfrag->offset, copy, msg_iter);
>>> - if (rc)
>>> - goto handle_error;
>>> - tls_append_frag(record, pfrag, copy);
>>> + if (copy) {
>>> + rc = tls_device_copy_data(page_address(pfrag->page) +
>>> + pfrag->offset, copy, msg_iter);
>>> + if (rc)
>>> + goto handle_error;
>>> + tls_append_frag(record, pfrag, copy);
>>> + }
>>
>> I appreciate you're likely trying to keep the fix minimal but Greg
>> always says "fix it right, worry about backports later".
>>
>> I think we should skip more, we can reorder the mins and if
>> min(size, rec space) == 0 then we can skip the allocation as well.
>
> Sorry, I didn't get the idea. Could you elaborate?
>
> Reordering the mins:
>
> copy = min_t(size_t, size, max_open_record_len - record->len);
> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
>
> I assume by skipping the allocation you mean skipping
> tls_do_allocation(), right? Do you suggest to skip it if the result of
> the first min_t() is 0?
>
> record->len used in the first min_t() comes from ctx->open_record, which
> either exists or is allocated by tls_do_allocation(). If we move the
> copy == 0 check above the tls_do_allocation() call, first we'll have to
> check whether ctx->open_record is NULL, which is currently checked by
> tls_do_allocation() itself.
>
> If open_record is not NULL, there isn't much to skip in
> tls_do_allocation on copy == 0, the main part is already skipped,
> regardless of the value of copy. If open_record is NULL, we can't skip
> tls_do_allocation, and copy won't be 0 afterwards.
>
> To compare, before (pseudocode):
>
> tls_do_allocation {
> if (!ctx->open_record)
> ALLOCATE RECORD
> Now ctx->open_record is not NULL
> if (!sk_page_frag_refill(sk, pfrag))
> return -ENOMEM
> }
> handle errors from tls_do_allocation
> copy = min(size, pfrag->size - pfrag->offset)
> copy = min(copy, max_open_record_len - ctx->open_record->len)
> if (copy)
> copy data and append frag
>
> After:
>
> if (ctx->open_record) {
> copy = min(size, max_open_record_len - ctx->open_record->len)
> if (copy) {
> // You want to put this part of tls_do_allocation under if (copy)?
> if (!sk_page_frag_refill(sk, pfrag))
> handle errors
> copy = min(copy, pfrag->size - pfrag->offset)
> if (copy)
> copy data and append frag
> }
> } else {
> ALLOCATE RECORD
> if (!sk_page_frag_refill(sk, pfrag))
> handle errors
> // Have to do this after the allocation anyway.
> copy = min(size, max_open_record_len - ctx->open_record->len)
> copy = min(copy, pfrag->size - pfrag->offset)
> if (copy)
> copy data and append frag
> }
>
> Either I totally don't get what you suggested, or it doesn't make sense
> to me, because we have +1 branch in the common path when a record is
> open and copy is not 0, no changes when there is no record, and more
> repeating code hard to compress.
>
> If I missed your idea, please explain in more details.
Jakub, is your comment still relevant after my response? If not, can the
patch be merged?
>> Maybe some application wants to do zero-length sends to flush the
>> MSG_MORE and would benefit that way?
>
> If it's a zero-length send, it means that size is 0 initially, and
> max_open_record_len - ctx->open_record->len isn't 0 (otherwise the
> record would have been closed at a previous iteration). That doesn't
> sound related to swapping the mins and skipping tls_do_allocation on
> copy == 0.
>
> Thanks,
> Max
>
>>> size -= copy;
>>> if (!size) {
>>
>
Powered by blists - more mailing lists