[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez0TfTAkaRWFCTb44x=TWP_sDZVx-5U2hvfQSFOhghNrCA@mail.gmail.com>
Date: Tue, 5 Dec 2023 17:40:24 +0100
From: Jann Horn <jannh@...gle.com>
To: Pablo Neira Ayuso <pablo@...filter.org>,
Jozsef Kadlecsik <kadlec@...filter.org>,
Florian Westphal <fw@...len.de>,
netfilter-devel <netfilter-devel@...r.kernel.org>,
coreteam@...filter.org
Cc: Christian Brauner <brauner@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Network Development <netdev@...r.kernel.org>,
kernel list <linux-kernel@...r.kernel.org>
Subject: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new
TYPESAFE_BY_RCU file lifetime?]
Hi!
I think this code is racy, but testing that seems like a pain...
owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
non-NULL, then checks that sk->sk_socket->file is non-NULL, then
accesses the ->f_cred of that file.
I don't see anything that protects this against a concurrent
sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
the context of a TCP retransmit or something like that. So I think we
can theoretically have a NULL deref, though the compiler will probably
optimize it away by merging the two loads of sk->sk_socket:
static bool
owner_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
[...]
const struct file *filp;
struct sock *sk = skb_to_full_sk(skb);
[...]
if (!sk || !sk->sk_socket || !net_eq(net, sock_net(sk)))
return (info->match ^ info->invert) == 0;
else if (info->match & info->invert & XT_OWNER_SOCKET)
[...]
// concurrent sock_orphan() while we're here
// null deref on second access to sk->sk_socket
filp = sk->sk_socket->file;
if (filp == NULL)
[...]
[...]
}
(Sidenote: I guess this also means xt_owner will treat a sock as
having no owner as soon as it is orphaned? Which probably means that
when a TCP socket enters linger state because it is close()d with data
remaining in the output buffer, the remaining packets will be treated
differently by netfilter?)
I also think we have no guarantee here that the socket's ->file won't
go away due to a concurrent __sock_release(), which could cause us to
continue reading file credentials out of a file whose refcount has
already dropped to zero?
That was probably working sorta okay-ish in practice until now because
files had RCU lifetime, "struct cred" also normally has RCU lifetime,
and nf_hook() runs in an RCU read-side critical section; but now that
"struct file" uses SLAB_TYPESAFE_BY_RCU, I think we can theoretically
race such that the "struct file" could have been freed and reallocated
in the meantime, causing us to see an entirely unrelated "struct file"
and look at creds that are unrelated to the context from which the
packet was sent.
But again, I haven't actually tested this yet, I might be getting it wrong.
Powered by blists - more mailing lists