netdev - Re: [PATCH net] tls: Skip tls_append

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com>
Date:   Mon, 18 Apr 2022 17:56:26 +0300
From:   Maxim Mikityanskiy <maximmi@...dia.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Boris Pismenny <borisp@...dia.com>,
        John Fastabend <john.fastabend@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        "David S. Miller" <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>,
        Tariq Toukan <tariqt@...dia.com>,
        Aviad Yehezkel <aviadye@...lanox.com>,
        Ilya Lesokhin <ilyal@...lanox.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size

On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@...dia.com>
>> Reviewed-by: Tariq Toukan <tariqt@...dia.com>
>> ---
>>   net/tls/tls_device.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>   		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>   		copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>   
>> -		rc = tls_device_copy_data(page_address(pfrag->page) +
>> -					  pfrag->offset, copy, msg_iter);
>> -		if (rc)
>> -			goto handle_error;
>> -		tls_append_frag(record, pfrag, copy);
>> +		if (copy) {
>> +			rc = tls_device_copy_data(page_address(pfrag->page) +
>> +						  pfrag->offset, copy, msg_iter);
>> +			if (rc)
>> +				goto handle_error;
>> +			tls_append_frag(record, pfrag, copy);
>> +		}
> 
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
> 
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.

Sorry, I didn't get the idea. Could you elaborate?

Reordering the mins:

copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

I assume by skipping the allocation you mean skipping 
tls_do_allocation(), right? Do you suggest to skip it if the result of 
the first min_t() is 0?

record->len used in the first min_t() comes from ctx->open_record, which 
either exists or is allocated by tls_do_allocation(). If we move the 
copy == 0 check above the tls_do_allocation() call, first we'll have to 
check whether ctx->open_record is NULL, which is currently checked by 
tls_do_allocation() itself.

If open_record is not NULL, there isn't much to skip in 
tls_do_allocation on copy == 0, the main part is already skipped, 
regardless of the value of copy. If open_record is NULL, we can't skip 
tls_do_allocation, and copy won't be 0 afterwards.

To compare, before (pseudocode):

tls_do_allocation {
     if (!ctx->open_record)
         ALLOCATE RECORD
         Now ctx->open_record is not NULL
     if (!sk_page_frag_refill(sk, pfrag))
         return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
     copy data and append frag

After:

if (ctx->open_record) {
     copy = min(size, max_open_record_len - ctx->open_record->len)
     if (copy) {
         // You want to put this part of tls_do_allocation under if (copy)?
         if (!sk_page_frag_refill(sk, pfrag))
             handle errors
         copy = min(copy, pfrag->size - pfrag->offset)
         if (copy)
             copy data and append frag
     }
} else {
     ALLOCATE RECORD
     if (!sk_page_frag_refill(sk, pfrag))
         handle errors
     // Have to do this after the allocation anyway.
     copy = min(size, max_open_record_len - ctx->open_record->len)
     copy = min(copy, pfrag->size - pfrag->offset)
     if (copy)
         copy data and append frag
}

Either I totally don't get what you suggested, or it doesn't make sense 
to me, because we have +1 branch in the common path when a record is 
open and copy is not 0, no changes when there is no record, and more 
repeating code hard to compress.

If I missed your idea, please explain in more details.

> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?

If it's a zero-length send, it means that size is 0 initially, and 
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
record would have been closed at a previous iteration). That doesn't 
sound related to swapping the mins and skipping tls_do_allocation on 
copy == 0.

Thanks,
Max

>>   		size -= copy;
>>   		if (!size) {
>