linux-kernel - Re: [PATCH] netns: dangling pointer on netns bind mount point.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200407030504.GX23230@ZenIV.linux.org.uk>
Date:   Tue, 7 Apr 2020 04:05:04 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Levi <ppbuk5246@...il.com>
Cc:     davem@...emloft.net, kuba@...nel.org, gnault@...hat.com,
        nicolas.dichtel@...nd.com, edumazet@...gle.com,
        lirongqing@...du.com, tglx@...utronix.de, johannes.berg@...el.com,
        dhowells@...hat.com, daniel@...earbox.net,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH] netns: dangling pointer on netns bind mount point.

On Tue, Apr 07, 2020 at 11:35:46AM +0900, Levi wrote:
> When we try to bind mount on network namespace (ex) /proc/{pid}/ns/net,
> inode's private data can have dangling pointer to net_namespace that was
> already freed in below case.
> 
>     1. Forking the process.
>     2. [PARENT] Waiting the Child till the end.
>     3. [CHILD] call unshare for creating new network namespace
>     4. [CHILD] Bind mount with /proc/self/ns/net to some mount point.
>     5. [CHILD] Exit child.
>     6. [PARENT] Try to setns with binded mount point
> 
> In step 5, net_namespace made by child process'll be freed,
> But in bind mount point, it still held the pointer to net_namespace made
> by child process.
> In this situation, when parent try to call "setns" systemcall with the
> bind mount point, parent process try to access to freed memory, That
> makes memory corruption.
> 
> This patch fix the above scenario by increaseing reference count.

This can't be the right fix.

> +#ifdef CONFIG_NET_NS
> +	if (!(flag & CL_COPY_MNT_NS_FILE) && is_net_ns_file(root)) {
> +		ns = get_proc_ns(d_inode(root));
> +		if (ns == NULL || ns->ops->type != CLONE_NEWNET) {
> +			err = -EINVAL;
> +
> +			goto out_free;
> +		}
> +
> +		net = to_net_ns(ns);
> +		net = get_net(net);

No.  This is completely wrong.  You have each struct mount pointing to
that sucker to grab an extra reference on an object; you do *NOT* drop
said reference when struct mount is destroyed.  You are papering over
a dangling pointer of some sort by introducing a trivially exploitable
leak that happens to hit your scenario.

Hell, do (your step 4 + umount your mountpoint) in a loop, then watch what
happens to refcounts with that patch.

This is bollocks; the reference is *NOT* in struct mount.  At all.
It's not even in struct dentry.  What it's attached to is struct
inode and it should be pinned as long as that inode is alive -
it's dropped in nsfs_evict().  And if _that_ gets called while
dentry is still pinned (as ->mnt_root of something), you have
much worse problems.

Could you post a reproducer, preferably one that would trigger an oops
on mainline?