netdev - Re: [PATCH net-next] tls: Add opt-in zerocopy mode of sendfile()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220429121117.21bf7490@kernel.org>
Date:   Fri, 29 Apr 2022 12:11:17 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Maxim Mikityanskiy <maximmi@...dia.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Paolo Abeni <pabeni@...hat.com>,
        Boris Pismenny <borisp@...dia.com>,
        Tariq Toukan <tariqt@...dia.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Gal Pressman <gal@...dia.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] tls: Add opt-in zerocopy mode of sendfile()

On Fri, 29 Apr 2022 17:21:59 +0300 Maxim Mikityanskiy wrote:
> On 2022-04-29 01:11, Jakub Kicinski wrote:
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index b12f81a2b44c..715401b20c8b 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -411,10 +411,16 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
> >>   	return 0;
> >>   }
> >>   
> >> +union tls_iter_offset {
> >> +	struct iov_iter *msg_iter;
> >> +	int offset;
> >> +};  
> > 
> > Is this sort of atrocity used elsewhere in the kernel?
> > If you can't refactor the code you can pack args into
> > a structure  
> 
> What's the point of packing arguments into a struct in this particular 
> case? Depending on zc_page, I need either msg_iter or offset, and I'm 
> reusing the same CPU register to pass either of them. The performance 
> isn't affected, and the amount of memory used is the same. A struct 
> won't allow to achieve this, it would force me to drag 8 extra bytes, 
> but we already use all 6 registers used to pass parameters on x86_64.

I know why you're doing this, but you're not writing assembly:

+	rc = tls_push_data(sk, (union tls_iter_offset)&msg_iter, size,

+	return tls_push_data(sk, (union tls_iter_offset)&msg_iter, 0, flags,

even if it's legal C (i.e. not UB) it looks awkward.

> > but I've not seen people cast mutually exclusive
> > arguments to a union type.  
> 
> It's the purpose of a union, to hold one of mutually exclusive values, 
> isn't it?

The union itself is not the problem.

> > Is this "inspired" by some higher
> > level language?  
> 
> It's unfortunately inspired by C and its freedom to allow 
> microoptimizations/hacks. The hack here is that I use a pointer being 
> NULL or not-NULL as an indicator what type the other argument has.
> 
> The closest alternative from high-level languages I can think of is 
> enums with attached data from rust or swift. However, rust isn't smart 
> enough to perform the optimization I described, so no, it's not inspired 
> by it :)
> 
> Options that I see here:
> 
> 1. Union.
> 
> 2. Just pass both parameters and use one of them. Drawbacks: now we have 
> 7 parameters, one will be passed through the stack, and it's datapath code.
> 
> 3. Pass `struct iov_iter *` and cast it to `int offset` when zc_page 
> isn't NULL. As we are compiling with -fno-strict-aliasing, and int 
> shouldn't be bigger than a pointer on all supported architectures, it's 
> going to work, and we still have 6 parameters.
> 
> 4. Combine `int offset` and `int flags` into a single 64-bit parameter.
> 
> Which one do you prefer, or do you have anything better in mind?

If you declare the union on the stack in the callers, and pass by value
- is the compiler not going to be clever enough to still DDRT?

> >>   static int tls_push_data(struct sock *sk,
> >> -			 struct iov_iter *msg_iter,
> >> +			 union tls_iter_offset iter_offset,
> >>   			 size_t size, int flags,
> >> -			 unsigned char record_type)
> >> +			 unsigned char record_type,
> >> +			 struct page *zc_page)
> >>   {
> >>   	struct tls_context *tls_ctx = tls_get_ctx(sk);
> >>   	struct tls_prot_info *prot = &tls_ctx->prot_info;
> >> @@ -480,15 +486,29 @@ static int tls_push_data(struct sock *sk,
> >>   		}
> >>   
> >>   		record = ctx->open_record;
> >> -		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
> >> -		copy = min_t(size_t, copy, (max_open_record_len - record->len));
> >> -
> >> -		if (copy) {
> >> -			rc = tls_device_copy_data(page_address(pfrag->page) +
> >> -						  pfrag->offset, copy, msg_iter);
> >> -			if (rc)
> >> -				goto handle_error;
> >> -			tls_append_frag(record, pfrag, copy);
> >> +
> >> +		if (!zc_page) {
> >> +			copy = min_t(size_t, size, pfrag->size - pfrag->offset);
> >> +			copy = min_t(size_t, copy, max_open_record_len - record->len);  
> > 
> > Nope, refactor this please. 95% sure you don't need 4 indentation
> > levels here. Space left in record can be calculated before the if,
> > then you can do
> > 
> > if (zc_page) {
> > 	..
> > } else if (copy) {
> > 	..
> > }  
> 
> It'll save one indentation level for the zc_page case, but not for the 
> other:
> 
> copy = min_t(size_t, size, max_open_record_len - record->len);
> if (zc_page && copy) {
>      ...
> } else {
>      // I still have to do this in non-zc case:
>      copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

Can pfrag->size - pfrag->offset really be 0 provided
tls_do_allocation() did not return an error?

>      if (copy) {
>          // Same indentation level as in my patch.
>          ...
>      }
> }
> 
> Is it good enough?

> > We should allow setting the option for non-HW. It's an opt-in, the app
> > has opted in, the fact that the kernel will not make use of the liberty
> > to apply the optimization is not important, no?  
> 
> Yes, I agree that if the application opted in, it should work properly 
> regardless of whether the optimization actually did turn on. However, 
> the indication could be useful, for example, for diagnostic purposes, to 
> show the user whether zerocopy mode was enabled, if someone is trying to 
> debug some performance issue. If you insist, though, I can make 
> setsockopt succeed and getsockopt return 1. What do you think?

I'd say "whether the optimization is applicable" rather than "whether
the optimization is turned on". User can check whether the connection
is using SW or HW TLS if they want to make sure it's taken advantage of.

Speaking of which, should we report the state of this knob via socket
diag?