netdev - Re: Broken SELinux/LSM labeling with MPTCP and accept(2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <108a1c80eed41516f85ebb264d0f46f95e86f754.camel@redhat.com>
Date:   Thu, 01 Dec 2022 19:26:43 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Ondrej Mosnacek <omosnace@...hat.com>,
        SElinux list <selinux@...r.kernel.org>,
        Linux Security Module list 
        <linux-security-module@...r.kernel.org>, mptcp@...ts.linux.dev
Cc:     network dev <netdev@...r.kernel.org>,
        Mat Martineau <mathew.j.martineau@...ux.intel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>,
        Paul Moore <paul@...l-moore.com>
Subject: Re: Broken SELinux/LSM labeling with MPTCP and accept(2)

Hello,

On Thu, 2022-12-01 at 14:42 +0100, Ondrej Mosnacek wrote:
> As discovered by our QE, there is a problem with how the
> (userspace-facing) sockets returned by accept(2) are labeled when
> using MPTCP. Currently they always end up with the label representing
> the kernel (typically system_u:system_r:kernel_t:s0), white they
> should inherit the context from the parent socket (the one that is
> passed to accept(2)).
> 
> A minimal reproducer on a Fedora/CentOS/RHEL system:
> 
> # Install dependencies
> dnf install -y mptcpd nginx curl
> # Disable rules that silence some SELinux denials
> semodule -DB
> # Set up a dummy file to be served by nginx
> echo test > /usr/share/nginx/html/testfile
> chmod +r /usr/share/nginx/html/testfile
> # Set up nginx to use MPTCP
> sysctl -w net.mptcp.enabled=1
> systemctl stop nginx
> mptcpize enable nginx
> systemctl start nginx
> # This will fail (no reply from server)
> mptcpize run curl -k -o /dev/null http://127.0.0.1/testfile
> # This will show the SELinux denial that caused the failure
> ausearch -i -m avc | grep httpd
> 
> It is also possible to trigger the issue by running the
> selinux-testsuite [1] under `mptcpize run` (it will fail on the
> inet_socket test in multiple places).
> 
> Based on what I could infer from the net & mptcp code, this is roughly
> how it happens (may be inaccurate or incorrect - the maze of the
> networking stack is not easy to navigate for me):
> 1. When the server starts, the main mptcp socket is created:
>    socket(2) -> ... -> socket_create() -> inet_create() ->
> mptcp_init_sock() -> __mptcp_socket_create()
> 2. __mptcp_socket_create() calls mptcp_subflow_create_socket(), which
> creates another "kern" socket, which represents the initial(?)
> subflow.
> 3. This subflow socket goes through security_socket_post_create() ->
> selinux_socket_post_create(), which gives it a kernel label based on
> kern == 1, which indicates that it's a kernel-internal socket.
> 4. The main socket goes through its own selinux_socket_post_create(),
> which gives it the label based on the current task.
> 5. Later, when the client connection is accepted via accept(2) on the
> main socket, an underlying accept operation is performed on the
> subflow socket, which is then returned directly as the result of the
> accept(2) syscall.
> 6. Since this socket is cloned from the subflow socket, it inherits
> the kernel label from the original subflow socket (via
> selinux_inet_conn_request() and selinux_inet_csk_clone()).
> selinux_sock_graft() then also copies the label onto the inode
> representing the socket.
> 7. When nginx later calls writev(2) on the new socket,
> selinux_file_permission() uses the inode label as the target in a
> tcp_socket::write permission check. This is denied, as in the Fedora
> policy httpd_t isn't allowed to write to kernel_t TCP sockets.
> 
> Side note: There is currently an odd conditional in sock_has_perm() in
> security/selinux/hooks.c that skips SELinux permission checking for
> sockets that have the kernel label, so native socket operations (such
> as recv(2), send(2), recvmsg(2), ...) will not uncover this problem,
> only generic file operations such as read(2), write(2), writev(2),
> etc. I believe that check shouldn't be there, but that's for another
> discussion...

That comes from:

commit e2943dca2d5b67e9578111986495483fe720d58b
Author: James Morris <jmorris@...hat.com>
Date:   Sat May 8 01:00:33 2004 -0700

    [NET]: Add sock_create_kern()

"""
    This addresses a class of potential issues in SELinux where, for example,
    a TCP NFS session times out, then the kernel re-establishes an RPC
    connection upon further user activity.  We do not want such kernel
    created sockets to be labeled with user security contexts.
"""

> So now the big question is: How to fix this? I can think of several
> possible solutions, but neither of them seems to be the obvious
> correct one:
> 1. Wrap the socket cloned from the subflow socket in another socket
> (similar to how the main socket + subflow(s) are handled), which would
> be cloned from the non-kern outer socket that has the right label.
> This could have the disadvantage of adding unnecessary overhead, but
> would probably be simpler to do.

I would avoid that option: we already have a suboptimal amount of
indirection.

> 2. Somehow ensure that the cloned socket gets the label from the main
> socket instead of the subflow socket. This would probably require
> adding a new LSM hook and I'm not sure at all what would be the best
> way to implement this.
> 3. Somehow communicate the subflow socket <-> main socket relationship
> to the LSM layer so that it can switch to use the label of the main
> socket when handling an operation on a subflow socket (thus copying
> the label correctly on accept(2)). Not a great solution, as it
> requires each LSM that labels sockets to duplicate the indirection
> logic.
> 4. Do not create the subflow sockets as "kern". (Not sure if that
> would be desirable.)

No, we need subflow being kernel sockets. Lockdep will bail otherwise,
and the would be the tip of the iceberg.

> 5. Stop labeling kern sockets with the kernel's label on the SELinux
> side and just label them based on the current task as usual. (This
> would probably cause other issues, but maybe not...)
> 
> Any ideas, suggestions, or patches welcome!

I think something alike the following could work - not even tested,
with comments on bold assumptions.

I'll try to do some testing later.

---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 99f5e51d5ca4..6cad50c6fd24 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -102,6 +102,20 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
 	if (err)
 		return err;
 
+	/* The first subflow can later be indirectly exposed to security
+	 * relevant syscall alike accept() and bind(), and at this point
+	 * carries a 'kern' related security context.
+	 * Reset the security context to the relevant user-space one.
+	 * Note that the following assumes security_socket_post_create()
+	 * being idempotent
+	 */
+	err = security_socket_post_create(ssock, sk->sk_family, SOCK_STREAM,
+					  IPPROTO_TCP, 0);
+	if (err) {
+		sock_release(ssock);
+		return err;
+	}
+
 	msk->first = ssock->sk;
 	msk->subflow = ssock;
 	subflow = mptcp_subflow_ctx(ssock->sk);