linux-kernel - Re: [PATCH] exit: clear TIF_MEMDIE after exit_task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160607125014.GL12305@dhcp22.suse.cz>
Date:	Tue, 7 Jun 2016 14:50:14 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	Vladimir Davydov <vdavydov@...tuozzo.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work

On Mon 14-03-16 17:39:43, Michal Hocko wrote:
> On Tue 01-03-16 19:20:24, Michael S. Tsirkin wrote:
> > On Tue, Mar 01, 2016 at 06:17:58PM +0100, Michal Hocko wrote:
> [...]
> > > Sorry, I could have been more verbose... The code would have to make sure
> > > that the mm is still alive before calling g-u-p by
> > > atomic_inc_not_zero(&mm->mm_users) and fail if the user count dropped to
> > > 0 in the mean time. See how fs/proc/task_mmu.c does that (proc_mem_open
> > > + m_start + m_stop.
> > > 
> > > The biggest advanatage would be that the mm address space pin would be
> > > only for the particular operation. Not sure whether that is possible in
> > > the driver though. Anyway pinning the mm for a potentially unbounded
> > > amount of time doesn't sound too nice.
> > 
> > Hmm that would be another atomic on data path ...
> > I'd have to explore that.
> 
> Did you have any chance to look into this?

So this is my take to get rid of mm_users pinning for an unbounded
amount of time. This is even not compile tested. I am not sure how to
handle when the mm goes away while there are still work items pending.
It seems this is not handled current anyway and only shouts with a
warning so this shouldn't cause a new regression AFAICS. I am not
familiar with the vnet code at all so I might be missing many things,
though. Does the below sound even remotely reasonable to you Michael?
---
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 669fef1e2bb6..47a3e2c832ea 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -343,7 +343,12 @@ static int vhost_worker(void *data)
 
 		if (work) {
 			__set_current_state(TASK_RUNNING);
+			if (!mmget_not_zero(dev->mm)) {
+				pr_warn("vhost: device owner mm got released unexpectedly\n");
+				break;
+			}
 			work->fn(work);
+			mmput(dev->mm);
 			if (need_resched())
 				schedule();
 		} else
@@ -481,7 +486,16 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	}
 
 	/* No owner, become one */
-	dev->mm = get_task_mm(current);
+	task_lock(current);
+	if (current->mm) {
+		dev->mm = current->mm;
+		atomic_inc(&curent->mm->mm_count);
+	}
+	task_unlock(current);
+	if (!dev->mm) {
+		err = -EINVAL;
+		goto err_mm;
+	}
 	worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
 	if (IS_ERR(worker)) {
 		err = PTR_ERR(worker);
@@ -505,7 +519,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	dev->worker = NULL;
 err_worker:
 	if (dev->mm)
-		mmput(dev->mm);
+		mmdrop(dev->mm);
 	dev->mm = NULL;
 err_mm:
 	return err;
@@ -583,7 +597,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
 		dev->worker = NULL;
 	}
 	if (dev->mm)
-		mmput(dev->mm);
+		mmdrop(dev->mm);
 	dev->mm = NULL;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
-- 
Michal Hocko
SUSE Labs