Message-ID: <20160301155212.GJ9461@dhcp22.suse.cz>
Date:	Tue, 1 Mar 2016 16:52:12 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Vladimir Davydov <vdavydov@...tuozzo.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work

[CCing vhost-net maintainer]

On Mon 29-02-16 20:02:09, Vladimir Davydov wrote:
> An mm_struct may be pinned by a file. An example is vhost-net device
> created by a qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
> vhost_dev_set_owner).

The more I think about this, the more I wonder whether it is actually
correct. Why does the driver have to pin the address space? Nothing
prevents the address space from being torn down in parallel anyway, so
the code cannot expect all the vmas to stay. Would it be enough to pin
only the mm_struct?

I am not sure I understand the code properly, but what prevents a
situation where a VHOST_SET_OWNER caller dies without calling
VHOST_RESET_OWNER, leaving the mm pinned indefinitely?
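
To make the distinction between pinning the address space and pinning
only the mm_struct concrete, here is a toy userspace model in C. It is
not kernel code. The two counters are named after mm_users (keeps the
address space alive) and mm_count (keeps the structure itself alive);
every toy_* name is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mm_struct: two counters and two liveness flags. */
struct toy_mm {
	int mm_users;             /* references that keep the address space */
	int mm_count;             /* references that keep the struct itself */
	bool address_space_alive;
	bool struct_alive;
};

/* The owning task holds one mm_users reference; all users together
 * hold a single mm_count reference. */
static struct toy_mm toy_mm_create(void)
{
	struct toy_mm mm = { 1, 1, true, true };
	return mm;
}

/* Pin the address space. */
static void toy_mmget(struct toy_mm *mm) { mm->mm_users++; }

/* Pin only the structure. */
static void toy_mmgrab(struct toy_mm *mm) { mm->mm_count++; }

static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0)
		mm->struct_alive = false;
}

static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->mm_users > 0);
	if (--mm->mm_users == 0) {
		mm->address_space_alive = false;  /* memory is released here */
		toy_mmdrop(mm);                   /* drop the users' shared ref */
	}
}

/* A "driver" takes a reference, then the owning task exits. */
static struct toy_mm toy_owner_exits(bool pin_address_space)
{
	struct toy_mm mm = toy_mm_create();

	if (pin_address_space)
		toy_mmget(&mm);
	else
		toy_mmgrab(&mm);

	toy_mmput(&mm);  /* the exiting owner drops its own reference */
	return mm;
}
```

In this model, toy_owner_exits(true) leaves the address space alive
after the owner has exited, while toy_owner_exits(false) lets the memory
go and keeps only the structure around.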

[Keeping the rest of the email for reference]

> If such process gets OOM-killed, the reference to
> its mm_struct will only be released from exit_task_work -> ____fput ->
> __fput -> vhost_net_release -> vhost_dev_cleanup, which is called after
> exit_mmap, where TIF_MEMDIE is cleared. As a result, we can start
> selecting the next victim before giving the last one a chance to free
> its memory. In practice, this leads to killing several VMs along with
> the fattest one.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@...tuozzo.com>
> ---
>  kernel/exit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index fd90195667e1..cc50e12165f7 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -434,8 +434,6 @@ static void exit_mm(struct task_struct *tsk)
>  	task_unlock(tsk);
>  	mm_update_next_owner(mm);
>  	mmput(mm);
> -	if (test_thread_flag(TIF_MEMDIE))
> -		exit_oom_victim(tsk);
>  }
>  
>  static struct task_struct *find_alive_thread(struct task_struct *p)
> @@ -746,6 +744,8 @@ void do_exit(long code)
>  		disassociate_ctty(1);
>  	exit_task_namespaces(tsk);
>  	exit_task_work(tsk);
> +	if (test_thread_flag(TIF_MEMDIE))
> +		exit_oom_victim(tsk);
>  	exit_thread();
>  
>  	/*
> -- 
> 2.1.4
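
The ordering issue described in the quoted changelog can be sketched
with a small userspace model in C. It is a simplification under stated
assumptions, not the kernel's exit path: a single counter stands in for
the references pinning the address space, a boolean stands in for
TIF_MEMDIE, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
	int mm_users;        /* references pinning the address space */
	bool memdie;         /* models TIF_MEMDIE */
	bool memory_freed;   /* set once the last reference is dropped */
	bool file_pins_mm;   /* e.g. an open file holding an mm reference */
	bool cleared_early;  /* flag cleared before memory was freed */
};

static void toy_mmput(struct toy_task *t)
{
	assert(t->mm_users > 0);
	if (--t->mm_users == 0)
		t->memory_freed = true;
}

/* Clearing the flag is what allows the next victim to be selected. */
static void toy_exit_oom_victim(struct toy_task *t)
{
	if (!t->memory_freed)
		t->cleared_early = true;
	t->memdie = false;
}

static void toy_exit_mm(struct toy_task *t, bool clear_here)
{
	toy_mmput(t);
	if (clear_here && t->memdie)
		toy_exit_oom_victim(t);   /* ordering before the patch */
}

/* Deferred file release: the file drops its mm reference here. */
static void toy_exit_task_work(struct toy_task *t)
{
	if (t->file_pins_mm) {
		t->file_pins_mm = false;
		toy_mmput(t);
	}
}

/* Returns true if the flag was cleared while memory was still pinned. */
static bool toy_do_exit(bool patched, bool file_pins_mm)
{
	struct toy_task t = {
		.mm_users = file_pins_mm ? 2 : 1,
		.memdie = true,
		.file_pins_mm = file_pins_mm,
	};

	toy_exit_mm(&t, !patched);
	toy_exit_task_work(&t);
	if (patched && t.memdie)
		toy_exit_oom_victim(&t);  /* ordering after the patch */
	return t.cleared_early;
}
```

In this model, only the unpatched ordering combined with a file pinning
the mm clears the flag while the memory is still held.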

-- 
Michal Hocko
SUSE Labs
