[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <108a1c80eed41516f85ebb264d0f46f95e86f754.camel@redhat.com>
Date: Thu, 01 Dec 2022 19:26:43 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Ondrej Mosnacek <omosnace@...hat.com>,
SElinux list <selinux@...r.kernel.org>,
Linux Security Module list
<linux-security-module@...r.kernel.org>, mptcp@...ts.linux.dev
Cc: network dev <netdev@...r.kernel.org>,
Mat Martineau <mathew.j.martineau@...ux.intel.com>,
Matthieu Baerts <matthieu.baerts@...sares.net>,
Paul Moore <paul@...l-moore.com>
Subject: Re: Broken SELinux/LSM labeling with MPTCP and accept(2)
Hello,
On Thu, 2022-12-01 at 14:42 +0100, Ondrej Mosnacek wrote:
> As discovered by our QE, there is a problem with how the
> (userspace-facing) sockets returned by accept(2) are labeled when
> using MPTCP. Currently they always end up with the label representing
> the kernel (typically system_u:system_r:kernel_t:s0), white they
> should inherit the context from the parent socket (the one that is
> passed to accept(2)).
>
> A minimal reproducer on a Fedora/CentOS/RHEL system:
>
> # Install dependencies
> dnf install -y mptcpd nginx curl
> # Disable rules that silence some SELinux denials
> semodule -DB
> # Set up a dummy file to be served by nginx
> echo test > /usr/share/nginx/html/testfile
> chmod +r /usr/share/nginx/html/testfile
> # Set up nginx to use MPTCP
> sysctl -w net.mptcp.enabled=1
> systemctl stop nginx
> mptcpize enable nginx
> systemctl start nginx
> # This will fail (no reply from server)
> mptcpize run curl -k -o /dev/null http://127.0.0.1/testfile
> # This will show the SELinux denial that caused the failure
> ausearch -i -m avc | grep httpd
>
> It is also possible to trigger the issue by running the
> selinux-testsuite [1] under `mptcpize run` (it will fail on the
> inet_socket test in multiple places).
>
> Based on what I could infer from the net & mptcp code, this is roughly
> how it happens (may be inaccurate or incorrect - the maze of the
> networking stack is not easy to navigate for me):
> 1. When the server starts, the main mptcp socket is created:
> socket(2) -> ... -> socket_create() -> inet_create() ->
> mptcp_init_sock() -> __mptcp_socket_create()
> 2. __mptcp_socket_create() calls mptcp_subflow_create_socket(), which
> creates another "kern" socket, which represents the initial(?)
> subflow.
> 3. This subflow socket goes through security_socket_post_create() ->
> selinux_socket_post_create(), which gives it a kernel label based on
> kern == 1, which indicates that it's a kernel-internal socket.
> 4. The main socket goes through its own selinux_socket_post_create(),
> which gives it the label based on the current task.
> 5. Later, when the client connection is accepted via accept(2) on the
> main socket, an underlying accept operation is performed on the
> subflow socket, which is then returned directly as the result of the
> accept(2) syscall.
> 6. Since this socket is cloned from the subflow socket, it inherits
> the kernel label from the original subflow socket (via
> selinux_inet_conn_request() and selinux_inet_csk_clone()).
> selinux_sock_graft() then also copies the label onto the inode
> representing the socket.
> 7. When nginx later calls writev(2) on the new socket,
> selinux_file_permission() uses the inode label as the target in a
> tcp_socket::write permission check. This is denied, as in the Fedora
> policy httpd_t isn't allowed to write to kernel_t TCP sockets.
>
> Side note: There is currently an odd conditional in sock_has_perm() in
> security/selinux/hooks.c that skips SELinux permission checking for
> sockets that have the kernel label, so native socket operations (such
> as recv(2), send(2), recvmsg(2), ...) will not uncover this problem,
> only generic file operations such as read(2), write(2), writev(2),
> etc. I believe that check shouldn't be there, but that's for another
> discussion...
That comes from:
commit e2943dca2d5b67e9578111986495483fe720d58b
Author: James Morris <jmorris@...hat.com>
Date: Sat May 8 01:00:33 2004 -0700
[NET]: Add sock_create_kern()
"""
This addresses a class of potential issues in SELinux where, for example,
a TCP NFS session times out, then the kernel re-establishes an RPC
connection upon further user activity. We do not want such kernel
created sockets to be labeled with user security contexts.
"""
> So now the big question is: How to fix this? I can think of several
> possible solutions, but neither of them seems to be the obvious
> correct one:
> 1. Wrap the socket cloned from the subflow socket in another socket
> (similar to how the main socket + subflow(s) are handled), which would
> be cloned from the non-kern outer socket that has the right label.
> This could have the disadvantage of adding unnecessary overhead, but
> would probably be simpler to do.
I would avoid that option: we already have a suboptimal amount of
indirection.
> 2. Somehow ensure that the cloned socket gets the label from the main
> socket instead of the subflow socket. This would probably require
> adding a new LSM hook and I'm not sure at all what would be the best
> way to implement this.
> 3. Somehow communicate the subflow socket <-> main socket relationship
> to the LSM layer so that it can switch to use the label of the main
> socket when handling an operation on a subflow socket (thus copying
> the label correctly on accept(2)). Not a great solution, as it
> requires each LSM that labels sockets to duplicate the indirection
> logic.
> 4. Do not create the subflow sockets as "kern". (Not sure if that
> would be desirable.)
No, we need subflow being kernel sockets. Lockdep will bail otherwise,
and the would be the tip of the iceberg.
> 5. Stop labeling kern sockets with the kernel's label on the SELinux
> side and just label them based on the current task as usual. (This
> would probably cause other issues, but maybe not...)
>
> Any ideas, suggestions, or patches welcome!
I think something alike the following could work - not even tested,
with comments on bold assumptions.
I'll try to do some testing later.
---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 99f5e51d5ca4..6cad50c6fd24 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -102,6 +102,20 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
if (err)
return err;
+ /* The first subflow can later be indirectly exposed to security
+ * relevant syscall alike accept() and bind(), and at this point
+ * carries a 'kern' related security context.
+ * Reset the security context to the relevant user-space one.
+ * Note that the following assumes security_socket_post_create()
+ * being idempotent
+ */
+ err = security_socket_post_create(ssock, sk->sk_family, SOCK_STREAM,
+ IPPROTO_TCP, 0);
+ if (err) {
+ sock_release(ssock);
+ return err;
+ }
+
msk->first = ssock->sk;
msk->subflow = ssock;
subflow = mptcp_subflow_ctx(ssock->sk);
Powered by blists - more mailing lists