Message-ID: <20160301155212.GJ9461@dhcp22.suse.cz>
Date:	Tue, 1 Mar 2016 16:52:12 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Vladimir Davydov <vdavydov@...tuozzo.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work

[CCing vhost-net maintainer]

On Mon 29-02-16 20:02:09, Vladimir Davydov wrote:
> An mm_struct may be pinned by a file. An example is a vhost-net device
> created by qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
> vhost_dev_set_owner).

The more I think about this, the more I wonder whether it is
actually OK and correct. Why does the driver have to pin the address
space? Nothing really prevents the address space from being torn down
in parallel anyway, so the code cannot expect all the vmas to stay.
Would it be enough to pin the mm_struct only?

I am not sure I understand the code properly, but what prevents a
situation where a VHOST_SET_OWNER caller dies without calling
VHOST_RESET_OWNER, so that the mm would be pinned indefinitely?

[Keeping the rest of the email for reference]

> If such a process gets OOM-killed, the reference to
> its mm_struct will only be released from exit_task_work -> ____fput ->
> __fput -> vhost_net_release -> vhost_dev_cleanup, which is called after
> exit_mm, where TIF_MEMDIE is cleared. As a result, we can start
> selecting the next victim before giving the last one a chance to free
> its memory. In practice, this leads to killing several VMs along with
> the fattest one.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@...tuozzo.com>
> ---
>  kernel/exit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index fd90195667e1..cc50e12165f7 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -434,8 +434,6 @@ static void exit_mm(struct task_struct *tsk)
>  	task_unlock(tsk);
>  	mm_update_next_owner(mm);
>  	mmput(mm);
> -	if (test_thread_flag(TIF_MEMDIE))
> -		exit_oom_victim(tsk);
>  }
>  
>  static struct task_struct *find_alive_thread(struct task_struct *p)
> @@ -746,6 +744,8 @@ void do_exit(long code)
>  		disassociate_ctty(1);
>  	exit_task_namespaces(tsk);
>  	exit_task_work(tsk);
> +	if (test_thread_flag(TIF_MEMDIE))
> +		exit_oom_victim(tsk);
>  	exit_thread();
>  
>  	/*
> -- 
> 2.1.4
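
For reference, the ordering issue described in the quoted changelog can
be sketched in userspace as follows. This is a simplified model under
stated assumptions, not the kernel's exit path: the toy_* names are
hypothetical, mm_users starts at 2 (one reference for the task, one for
the file's pin), and "memory freed" simply means the last address space
reference was dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an OOM victim whose mm is also pinned by an open file. */
struct toy_task {
	int mm_users;            /* task's reference + the file's pin */
	bool memdie;             /* models TIF_MEMDIE */
	bool memory_freed;       /* set when the last mm user goes away */
	bool freed_when_cleared; /* was memory freed when memdie was cleared? */
};

static void toy_mmput(struct toy_task *t)
{
	assert(t->mm_users > 0);
	if (--t->mm_users == 0)
		t->memory_freed = true;
}

/* Clearing the flag is what allows the next victim to be selected. */
static void toy_exit_oom_victim(struct toy_task *t)
{
	if (t->memdie) {
		t->memdie = false;
		t->freed_when_cleared = t->memory_freed;
	}
}

static void toy_exit_mm(struct toy_task *t, bool patched)
{
	toy_mmput(t);                   /* drops only the task's reference */
	if (!patched)
		toy_exit_oom_victim(t); /* old placement, inside exit_mm */
}

/* Models the final fput that releases the file's pin on the mm. */
static void toy_exit_task_work(struct toy_task *t)
{
	toy_mmput(t);
}

/* Returns true if the memory was already freed when the flag was cleared. */
static bool toy_do_exit(bool patched)
{
	struct toy_task t = { .mm_users = 2, .memdie = true };

	toy_exit_mm(&t, patched);
	toy_exit_task_work(&t);
	if (patched)
		toy_exit_oom_victim(&t); /* new placement, after task work */
	return t.freed_when_cleared;
}
```

With the old placement the flag is cleared while the file still pins
the mm, so toy_do_exit(false) reports that nothing had been freed yet.
With the patched ordering, toy_do_exit(true) reports that the memory
was released first.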

-- 
Michal Hocko
SUSE Labs