netdev - Re: [PATCH v2 01/25] tcp: authopt: Initial support and key management

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5cfd1a51-1607-702e-5def-f8affce18db6@gmail.com>
Date:   Fri, 5 Nov 2021 20:00:50 +0200
From:   Leonard Crestez <cdleonard@...il.com>
To:     Dmitry Safonov <0x7f454c46@...il.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Kuniyuki Iwashima <kuniyu@...zon.co.jp>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Yuchung Cheng <ycheng@...gle.com>,
        Francesco Ruggeri <fruggeri@...sta.com>,
        Mat Martineau <mathew.j.martineau@...ux.intel.com>,
        Christoph Paasch <cpaasch@...le.com>,
        Ivan Delalande <colona@...sta.com>,
        Priyaranjan Jha <priyarjha@...gle.com>, netdev@...r.kernel.org,
        linux-crypto@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-kernel@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
        Shuah Khan <shuah@...nel.org>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH v2 01/25] tcp: authopt: Initial support and key management

On 11/5/21 4:50 PM, Dmitry Safonov wrote:
> On 11/5/21 07:04, Leonard Crestez wrote:
>> On 11/5/21 3:22 AM, Dmitry Safonov wrote:
> [..]
>>> I remember we discussed it in RFC, that removing a key that's currently
>>> in use may result in random MKT to be used.
>>>
>>> I think, it's possible to make this API a bit more predictable if:
>>> - DEL command fails to remove a key that is current/receive_next;
>>> - opt.flags has CURR/NEXT flag that has corresponding `u8 current_key`
>>> and `u8 receive_next` values. As socket lock is held - that makes
>>> current_key/receive_next change atomic with deletion of an existing key
>>> that might have been in use.
>>>
>>> In result user may remove a key that's not in use or has to set new
>>> current/next. Which avoids the issue with random MKT being used to sign
>>> segments.
>>
>> The MKT used to sign segments is already essentially random unless the
>> user makes a deliberate choice. This is what happens if you add two keys
>> an call connect(). But why is this a problem?
> 
> The issue is predictability and less control for a user on how the key
> is selected.
> 
> Let's say as a user I have two MKTs A and B. I want to use A for 6 weeks
> and then change to B. I want to switch to B as soon as the admin of the
> peer adds the key and the peer sends me (rnext_key = B.id).
> 
> With your semantics currently a random key will be used as long as I
> don't "lock" the id which means that rnext_key won't be respected.
> So there's clearly less predictability for a user to select current key
> in use.

RFC makes two requirements regarding keyid selection:
A) 7.2: TCP SEND [..] MUST be augmented so that the preferred outgoing 
MKT (current_key) can be indicated.
B) 7.5.2.e: Key must be switch to rnextkeyid if that key is available.

These requirements are in conflict so I added a flag 
TCP_AUTHOPT_LOCK_KEYID to determine if keyid is determined by local 
userspace or based on the peer's rnextkeyid.

Without a "locking" bit any key selections made from userspace would get 
flipped by incoming traffic. Indeed together with your suggestion of not 
allowing the current key to be deleted it would be possible for delete 
to fail repeatedly because the peer keeps sending us valid packets!

The expectation is that complex applications will use the "locking" 
functionality and handle the switch to recv_rnextkeyid themselves. 
Alternatively it's also possible for peers to only control rnextkeyid 
and perform key switch that way.

In your scenario the key will switch automatically as soon as the peer 
sends "B.send_id" in the rnextkeyid field.

Please note that key selection is only fully implemented in PATCH 19, 
without it the behavior is indeed more random. That patch was separated 
for ease of review and because detailed behavior is worth a separate 
discussion.

Entirely different key selections mechanisms are possible, for example 
each key could have a "preference" score. The most detailed discussion 
of key rollover I found is from Cisco:

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_pi/configuration/xe-16-12/iri-xe-16-12-book/tcp-ao.html#concept_mtn_h4n_j3b

That document describes key rollover based on lifetime intervals for 
each key. I believe my patches provide sufficient control to implement 
the same but that it is a concern for userspace.

>> Applications which want to deliberately control the send key can do so
>> with TCP_AUTHOPT_FLAG_LOCK_KEYID. If that flag is not set then the key
>> with send_id == recv_rnextkeyid is preffered as suggested by the RFC, or
>> a random one on connect.
>>
>> I think your suggestion would force additional complexity on all
>> applications for no clear gain.
> 
> I disagree. From RFC (3.1):
> 
> "It is presumed that an MKT affecting a particular connection cannot be
> destroyed during an active connection -- or, equivalently, that its
> parameters are copied to an area local to the connection (i.e.,
> instantiated) and so changes would affect only new connections."
> 
> which means that the user shouldn't be able to remove a key in use.
> So, by default you should return an error if the key in use being deleted.

I believe this behavior belongs in application software.

> The only use-case to delete a key that is in use is if it has been
> compromised RFC(6.1):
> 
> "Deciding when to start using a key is a performance issue. Deciding
> when to remove an MKT is a security issue. Invalid MKTs are expected to
> be removed. TCP-AO provides no mechanism to coordinate their removal, as
> we consider this a key management operation."
> 
> I might misread the RFC, but it seems that shouldn't happen in an
> ordinary usage scenario (as long as the user don't --force removal of
> the compromised key in an exceptional case).
> 
> So, if you allow a user to set current_key/rnext_key atomically with
> removal - it seems to fit this --force use-case and let user more
> control over which key is in use.

Control is available: user can "lock" a different key before removing 
the current one.

The kernel just doesn't make this mandatory.

>> Key selection controls are only added much later in the series, this is
>> also part of the effort to split the code into readable patches. See
>> this patch:
>>
>> https://lore.kernel.org/netdev/2dc569c0d60c80c26aafcaa201ba5b5ec53ce6bd.1635784253.git.cdleonard@gmail.com/
> 
> A separate issue with that one (if I'm not misreading) seems to be that
> you're going to send segments with info->send_rnextkeyid if the deleted
> key was TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID one.
> And won't be able to verify the peer inbound segments/replies.

I'm not sure I understand this.

If TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is set then the rnextkeyid byte in 
output packets is controlled by user directly, this behavior is kept 
deliberately simple.

Otherwise we send the recv_id of the current key, this attempts to 
ensure symmetry.

Userspace is expected to use the recv_id of a valid key with 
TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID, otherwise the connection will indeed 
break. The kernel could enforce that this value is valid but it does not 
attempt to prevent userspace bugs.

>> Removing a key while traffic is happening shouldn't cause failures in
>> recv or send code; this takes some effort but is also required to
>> prevent auth failures when a socket is closed and transitions to
>> timewait. I attempted to ensure this by only doing rcu_dereference for
>> tcp_authopt_info and tcp_authopt_key_info once per packet.