lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 21 Apr 2022 12:47:18 +0300 From: Maxim Mikityanskiy <maximmi@...dia.com> To: Jakub Kicinski <kuba@...nel.org> Cc: Boris Pismenny <borisp@...dia.com>, John Fastabend <john.fastabend@...il.com>, Daniel Borkmann <daniel@...earbox.net>, "David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, Tariq Toukan <tariqt@...dia.com>, Aviad Yehezkel <aviadye@...lanox.com>, Ilya Lesokhin <ilyal@...lanox.com>, netdev@...r.kernel.org Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size On 2022-04-18 17:56, Maxim Mikityanskiy wrote: > On 2022-04-14 13:28, Jakub Kicinski wrote: >> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote: >>> Calling tls_append_frag when max_open_record_len == record->len might >>> add an empty fragment to the TLS record if the call happens to be on the >>> page boundary. Normally tls_append_frag coalesces the zero-sized >>> fragment to the previous one, but not if it's on page boundary. >>> >>> If a resync happens then, the mlx5 driver posts dump WQEs in >>> tx_post_resync_dump, and the empty fragment may become a data segment >>> with byte_count == 0, which will confuse the NIC and lead to a CQE >>> error. >>> >>> This commit fixes the described issue by skipping tls_append_frag on >>> zero size to avoid adding empty fragments. The fix is not in the driver, >>> because an empty fragment is hardly the desired behavior. >>> >>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") >>> Signed-off-by: Maxim Mikityanskiy <maximmi@...dia.com> >>> Reviewed-by: Tariq Toukan <tariqt@...dia.com> >>> --- >>> net/tls/tls_device.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c >>> index 12f7b56771d9..af875ad4a822 100644 >>> --- a/net/tls/tls_device.c >>> +++ b/net/tls/tls_device.c >>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk, >>> copy = min_t(size_t, size, (pfrag->size - pfrag->offset)); >>> copy = min_t(size_t, copy, (max_open_record_len - >>> record->len)); >>> - rc = tls_device_copy_data(page_address(pfrag->page) + >>> - pfrag->offset, copy, msg_iter); >>> - if (rc) >>> - goto handle_error; >>> - tls_append_frag(record, pfrag, copy); >>> + if (copy) { >>> + rc = tls_device_copy_data(page_address(pfrag->page) + >>> + pfrag->offset, copy, msg_iter); >>> + if (rc) >>> + goto handle_error; >>> + tls_append_frag(record, pfrag, copy); >>> + } >> >> I appreciate you're likely trying to keep the fix minimal but Greg >> always says "fix it right, worry about backports later". >> >> I think we should skip more, we can reorder the mins and if >> min(size, rec space) == 0 then we can skip the allocation as well. > > Sorry, I didn't get the idea. Could you elaborate? > > Reordering the mins: > > copy = min_t(size_t, size, max_open_record_len - record->len); > copy = min_t(size_t, copy, pfrag->size - pfrag->offset); > > I assume by skipping the allocation you mean skipping > tls_do_allocation(), right? Do you suggest to skip it if the result of > the first min_t() is 0? > > record->len used in the first min_t() comes from ctx->open_record, which > either exists or is allocated by tls_do_allocation(). If we move the > copy == 0 check above the tls_do_allocation() call, first we'll have to > check whether ctx->open_record is NULL, which is currently checked by > tls_do_allocation() itself. > > If open_record is not NULL, there isn't much to skip in > tls_do_allocation on copy == 0, the main part is already skipped, > regardless of the value of copy. If open_record is NULL, we can't skip > tls_do_allocation, and copy won't be 0 afterwards. > > To compare, before (pseudocode): > > tls_do_allocation { > if (!ctx->open_record) > ALLOCATE RECORD > Now ctx->open_record is not NULL > if (!sk_page_frag_refill(sk, pfrag)) > return -ENOMEM > } > handle errors from tls_do_allocation > copy = min(size, pfrag->size - pfrag->offset) > copy = min(copy, max_open_record_len - ctx->open_record->len) > if (copy) > copy data and append frag > > After: > > if (ctx->open_record) { > copy = min(size, max_open_record_len - ctx->open_record->len) > if (copy) { > // You want to put this part of tls_do_allocation under if (copy)? > if (!sk_page_frag_refill(sk, pfrag)) > handle errors > copy = min(copy, pfrag->size - pfrag->offset) > if (copy) > copy data and append frag > } > } else { > ALLOCATE RECORD > if (!sk_page_frag_refill(sk, pfrag)) > handle errors > // Have to do this after the allocation anyway. > copy = min(size, max_open_record_len - ctx->open_record->len) > copy = min(copy, pfrag->size - pfrag->offset) > if (copy) > copy data and append frag > } > > Either I totally don't get what you suggested, or it doesn't make sense > to me, because we have +1 branch in the common path when a record is > open and copy is not 0, no changes when there is no record, and more > repeating code hard to compress. > > If I missed your idea, please explain in more details. Jakub, is your comment still relevant after my response? If not, can the patch be merged? >> Maybe some application wants to do zero-length sends to flush the >> MSG_MORE and would benefit that way? > > If it's a zero-length send, it means that size is 0 initially, and > max_open_record_len - ctx->open_record->len isn't 0 (otherwise the > record would have been closed at a previous iteration). That doesn't > sound related to swapping the mins and skipping tls_do_allocation on > copy == 0. > > Thanks, > Max > >>> size -= copy; >>> if (!size) { >> >
Powered by blists - more mailing lists