netdev - Re: [PATCH net-next v2 0/5] tls: implement key updates for TLS1.3

Message-ID: <Y/eT/M+b6jUtTdng@hog>
Date:   Thu, 23 Feb 2023 17:27:40 +0100
From:   Sabrina Dubroca <sd@...asysnail.net>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, Vadim Fedorenko <vfedorenko@...ek.ru>,
        Frantisek Krenzelok <fkrenzel@...hat.com>,
        Kuniyuki Iwashima <kuniyu@...zon.com>,
        Apoorv Kothari <apoorvko@...zon.com>,
        Boris Pismenny <borisp@...dia.com>,
        John Fastabend <john.fastabend@...il.com>,
        Shuah Khan <shuah@...nel.org>, linux-kselftest@...r.kernel.org,
        Gal Pressman <gal@...dia.com>,
        Marcel Holtmann <marcel@...tmann.org>
Subject: Re: [PATCH net-next v2 0/5] tls: implement key updates for TLS1.3

2023-02-21, 19:19:44 -0800, Jakub Kicinski wrote:
> Sorry for the delay, long weekend + merge window.

No worries, I wasn't expecting much activity on this from you during
the merge window.

> On Thu, 16 Feb 2023 17:23:19 +0100 Sabrina Dubroca wrote:
> > 2023-02-15, 19:57:48 -0800, Jakub Kicinski wrote:
> > > I think we could try to switch to SW crypto on Tx until all data using
> > > old key is ACK'ed, drivers can look at skb->decrypted to skip touching
> > > the transitional skbs. Then remove old key, install new one, resume
> > > offload.  
> > 
> > "all data using the old key" needs to be one list of record per old
> > key, since we can have multiple rekeys.
> 
> Not fully parsing this bit.

We can have multiple rekeys in the time it takes for the first
KeyUpdate message to be ACK'ed. I'm not sure why I talked about
a "list of records".

But we could have this sequence of records:

  recN(k1,hwenc)
  KeyUpdate(k1,hwenc)
  // switch to k2 and sw crypto

  rec0(k2,swenc)
  rec1(k2,swenc)
  KeyUpdate(k2,swenc)
  rec0(k3,swenc)
  // receive ACK for KU1, don't enable HW offload for k2 or k3 because we've already switched off k2

  rec1(k3,swenc)
  // receive ACK for KU2, now enable HW offload for k3

  rec2(k3,hwenc)

So we'll need to record the most recent TX rekey, and wait until the
corresponding KU record is ACK'ed, before we resume offload using the
most recent key (and skip possible intermediate keys).
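
A minimal sketch of that bookkeeping, as a userspace model rather than
kernel code. The struct and function names (tx_rekey_state,
tx_rekey_start, tx_rekey_acked) are made up for illustration, and the
sequence numbers stand in for the TCP sequence at the end of each
KeyUpdate record. Only the newest KeyUpdate position is kept, so an ACK
for an older one does not resume offload:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket TX rekey state; not an existing kernel struct. */
struct tx_rekey_state {
	bool     sw_phase;    /* true while new records are encrypted in SW */
	uint32_t last_ku_end; /* seq just past the most recent KeyUpdate record */
	unsigned latest_key;  /* id of the most recent TX key (1 = k1, ...) */
	unsigned hw_key;      /* id of the key currently installed in HW */
};

/* Wrap-safe "a is at or after b", the same idea as !before(a, b). */
static bool seq_geq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* A KeyUpdate record ending at ku_end was queued: switch to SW crypto. */
static void tx_rekey_start(struct tx_rekey_state *st, uint32_t ku_end)
{
	assert(st);
	st->sw_phase = true;
	st->last_ku_end = ku_end; /* overwrite: only the newest KU matters */
	st->latest_key++;
}

/* snd_una advanced. Returns true when HW offload can resume. */
static bool tx_rekey_acked(struct tx_rekey_state *st, uint32_t snd_una)
{
	assert(st);
	if (!st->sw_phase || !seq_geq(snd_una, st->last_ku_end))
		return false;
	st->hw_key = st->latest_key; /* intermediate keys are skipped */
	st->sw_phase = false;
	return true;
}
```

With two rekeys queued (KU1 ending at 1000, KU2 ending at 2000), an ACK
up to 1000 leaves the socket in the SW phase, and an ACK up to 2000
moves HW straight from k1 to k3.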

Installing the key in HW and re-enabling the offload will need to
happen via the icsk_clean_acked callback. We'll need a workqueue so
that we don't actually talk to the driver from softirq.

Then, we have to handle a failure to install the key. Since we're not
installing it in HW immediately during setsockopt, notifying userspace
of a rekey failure is more complicated. Maybe we can do a
rekey_prepare during the setsockopt, and then the actual rekey is an
operation that cannot fail?
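
Roughly what I have in mind for the prepare/commit split, as a
standalone model. Everything here is hypothetical (rekey_ctx,
rekey_prepare, rekey_commit, the fixed KEY_LEN), and the real prepare
step would also have to reserve whatever the driver needs, which this
sketch does not model. The point is only that every check that can fail
happens in prepare, where an error can still be returned to userspace:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define KEY_LEN 32 /* illustrative key size, not tied to a cipher */

/* Hypothetical rekey context; names are for illustration only. */
struct rekey_ctx {
	unsigned char active[KEY_LEN]; /* key currently in use */
	unsigned char staged[KEY_LEN]; /* validated key waiting for commit */
	bool staged_valid;
};

/* Fallible step, meant to run from setsockopt: validate and stage. */
static int rekey_prepare(struct rekey_ctx *c, const unsigned char *key,
			 size_t len)
{
	if (!key || len != KEY_LEN)
		return -EINVAL; /* reported to userspace right away */
	memcpy(c->staged, key, KEY_LEN);
	c->staged_valid = true;
	return 0;
}

/* Step that cannot fail: it only swaps in state prepared earlier. */
static void rekey_commit(struct rekey_ctx *c)
{
	assert(c->staged_valid); /* caller must have prepared first */
	memcpy(c->active, c->staged, KEY_LEN);
	c->staged_valid = false;
}
```

The commit step has no return value on purpose: it could then run later,
from the deferred work, without any error path back to userspace.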

> > Could we install the new key in HW a bit earlier? Keep the old key as
> > SW fallback for rtx, but the driver installs the new key when the
> > corresponding KeyUpdate record has gone through and tells the stack to
> > stop doing SW crypto? I'm not sure that'd be a significant improvement
> > in the standard case, though.
> 
> Important consideration is making the non-rekey path as fast as
> possible (given rekeying is extremely rare). Looking at skb->decrypted
> should be very fast but we can possibly fit some other indication of
> "are we rekeying" into another already referenced cache line.
> We definitely don't want to have to look up the record to know what
> state we're in.
> 
> The fallback can't use AES-NI (it's in sirq context) so it's slower 
> than SW encrypt before queuing to TCP. Hence my first thought is using
> SW crypto for new key and let the traffic we already queued with old
> key drain leveraging HW crypto. But as I said the impact on performance
> when not rekeying is more important, and so is driver simplicity.

Right, sorry, I meant the full tls_sw path and not the existing fallback.

Changing the socket ops back and forth between the HW and SW variants
worries me, because we only lock the socket once we have entered
tls_{device,sw}_sendmsg. So I think we have to stay on the _device ops
even during the SW crypto phase of the rekey, and let that call into
the SW variant after locking the socket and making sure we're in a
rekey.
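
A toy model of that dispatch, to show the ordering I mean: the entry
point never changes, and the HW-vs-SW decision is made only after the
lock is taken. All names are hypothetical (sk_sketch,
device_sendmsg_sketch, ...), and the bool "locked" only stands in for
the socket lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket and TLS context; not kernel types. */
struct sk_sketch {
	bool locked;     /* models taking and releasing the socket lock */
	bool in_rekey;   /* set while new records must be encrypted in SW */
	int  hw_records; /* records handed to the device path */
	int  sw_records; /* records encrypted in software */
};

static void sw_encrypt_record(struct sk_sketch *sk)
{
	assert(sk->locked); /* state may only be touched under the lock */
	sk->sw_records++;
}

static void hw_queue_record(struct sk_sketch *sk)
{
	assert(sk->locked);
	sk->hw_records++;
}

/* Single entry point that stays installed for the whole connection. */
static void device_sendmsg_sketch(struct sk_sketch *sk)
{
	sk->locked = true;      /* lock first ... */
	if (sk->in_rekey)       /* ... then look at the rekey state */
		sw_encrypt_record(sk);
	else
		hw_queue_record(sk);
	sk->locked = false;
}
```

Since in_rekey is only read under the lock, a rekey starting or
finishing concurrently can't be observed halfway, which is the problem
with swapping the ops themselves.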

> > > We may need special care to make sure we don't try to encrypt the same
> > > packet with both keys. In case a rtx gets stuck somewhere and comes to
> > > the NIC after it's already acked (happens surprisingly often).  
> > 
> > Don't we have that already? If there's a retransmit while we're
> > setting the TX key in HW, data that was queued on the socket before
> > (and shouldn't be encrypted at all) would also be encrypted
> > otherwise. Or is it different with rekey?
> 
> We have a "start marker" record which is supposed to indicate that
> anything before it has already been encrypted. The driver is programmed
> with the start seq no, when it sees a packet from before this seq no
> it checks if a record exists, finds it's before the start marker and
> sends the data as is.

Yes, I was looking into that earlier this week. I think we could reuse
a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq,
we could have a tls_dev_rekey op passing the new key and new write_seq
to the driver. I think we can also reuse the ->eor trick from
tls_set_device_offload, and we wouldn't have to look at
skb->decrypted. Close and push the current SW record, mark ->eor, pass
write_seq to the driver along with the key. Also pretty close to what
tls_device_resync_tx does.
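
To make the sequence-number boundary concrete, here is a small
userspace sketch of how a device could tell apart data that was already
encrypted in SW from data it must encrypt with the new key. The types
and functions (offload_ctx_sketch, dev_rekey_sketch, classify_tx) are
invented for this example and are not the driver API; seq_before()
mirrors the wrap-safe comparison that before() does for TCP sequence
numbers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tx_action {
	TX_PASS_THROUGH, /* already encrypted before the marker: send as is */
	TX_ENCRYPT,      /* at or after the marker: encrypt with current key */
};

/* Hypothetical device-side offload state. */
struct offload_ctx_sketch {
	uint32_t start_seq; /* first sequence number covered by key_id */
	unsigned key_id;    /* key the device currently encrypts with */
};

/* Wrap-safe "a is before b" on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Model of a rekey op: new key plus the write_seq where it starts. */
static void dev_rekey_sketch(struct offload_ctx_sketch *ctx,
			     unsigned new_key, uint32_t write_seq)
{
	assert(ctx);
	ctx->key_id = new_key;
	ctx->start_seq = write_seq;
}

/* Decide what to do with a segment starting at seq. */
static enum tx_action classify_tx(const struct offload_ctx_sketch *ctx,
				  uint32_t seq)
{
	assert(ctx);
	return seq_before(seq, ctx->start_seq) ? TX_PASS_THROUGH : TX_ENCRYPT;
}
```

A late retransmit of data queued before the rekey marker is classified
as TX_PASS_THROUGH, so it is never encrypted a second time with the new
key, including when write_seq has wrapped around.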

[...]
> > This makes me wonder again if we should have fake offloads on veth
> > (still calling the kernel's crypto library to simulate a device doing
> > the encryption and/or decryption), to make it easy to play with the
> > software bits, without requiring actual hardware that can offload
> > TLS/IPsec/MACsec. But maybe it's too complex to be useful and we'd
> > just waste our time fixing bugs in the fake offload rather than
> > improving the stack.
> 
> It should be quite useful. I also usually just hack up veth, but 
> I reckon adding support to netdevsim would be a better fit.
> We just need a way to tell two netdevsim ports to "connect to each
> other".

Oh, nice idea. I'll add that to my todo list.

-- 
Sabrina