Date:   Fri, 22 Feb 2019 16:01:25 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, hughd@...gle.com, vbabka@...e.cz,
        joel@...lfernandes.org, jglisse@...hat.com,
        yang.shi@...ux.alibaba.com, mgorman@...hsingularity.net
Subject: Re: [RFC PATCH] mm,mremap: Bail out earlier in mremap_to under map
 pressure

On Thu, Feb 21, 2019 at 09:54:06AM +0100, Oscar Salvador wrote:
> When the mremap() syscall is used together with the MREMAP_FIXED flag,
> mremap() calls mremap_to(), which does the following:
> 
> 1) Unmaps the destination region we are going to move the map to
> 2) If the new region is going to be smaller, unmaps the last part
>    of the old region
> 
> Then, we will eventually call move_vma() to do the actual move.
> 
> move_vma() checks whether we are at least 4 maps below max_map_count
> before going further; otherwise it bails out with -ENOMEM.
> The problem is that we might have already unmapped the vma's in steps
> 1) and 2), so it is not possible for userspace to figure out the state
> of the vma's after it gets -ENOMEM, and it gets tricky for userspace
> to clean up properly on the error path.
> 
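For concreteness, here is a minimal, hypothetical userspace sketch of the
scenario described above (not part of the patch; the sizes and the 4 KiB page
size are made up for illustration, and error handling is elided):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
	size_t old_len = 16 * 4096;	/* assumes 4 KiB pages */
	size_t new_len = 8 * 4096;	/* smaller than old_len, so step 2) applies */

	/* Source mapping we will ask the kernel to move. */
	void *old = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* Placeholder mapping occupying the fixed destination address. */
	void *dst = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/*
	 * MREMAP_FIXED forces the move to 'dst'.  Without an early check,
	 * an -ENOMEM from move_vma() can arrive after the kernel has already
	 * unmapped 'dst' (step 1) and the tail of 'old' (step 2), so on
	 * failure the caller cannot tell which of its mappings still exist.
	 */
	if (mremap(old, old_len, new_len,
		   MREMAP_MAYMOVE | MREMAP_FIXED, dst) == MAP_FAILED)
		fprintf(stderr, "mremap failed: errno=%d\n", errno);

	return 0;
}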
> While it is true that -ENOMEM can be returned for other reasons as well
> (e.g. see may_expand_vm() or move_page_tables()), I think we can avoid
> this particular scenario if we check early in mremap_to() whether the
> operation has a good chance of succeeding map-wise.
> 
> Should that not be the case, we can bail out before we even try to unmap
> anything, so we make sure the vma's are left untouched in case we are
> likely to run short of maps.
> 
> The rule of thumb now is to rely on the worst case we can have.
> That is when both vma's (old region and new region) are going to be split
> in 3, which adds two more maps to the ones we already hold (one per vma).
> If the current map count plus those 2 still leaves us at least 4 maps
> below the threshold, we are going to pass the check in move_vma().
> 
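To make the accounting concrete (purely illustrative numbers, assuming the
default vm.max_map_count of 65530), the proposed check amounts to:

	map_count + 2 >= 65530 - 3
	map_count     >= 65525

i.e. mremap_to() would now return -ENOMEM once the process already holds
65525 maps, before either region has been unmapped.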
> Of course, this is not free, as it might generate false positives in cases
> where we really are tight map-wise but the unmap operations would release
> several vma's, leaving us in a good state.
> 
> Because of that I am sending this as an RFC.
> Another approach was also investigated [1], but it may be more hassle
> than it is worth.

I believe we don't need the check in move_vma() with this patch. Or do we?
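
(For reference, the move_vma() check in question is the early map-count guard
in mm/mremap.c; quoting it roughly, it looks like:

	/*
	 * We'd prefer to avoid failure later on in do_munmap:
	 * which may split one vma into three before unmapping.
	 */
	if (mm->map_count >= sysctl_max_map_count - 4)
		return -ENOMEM;

so the question is whether that guard still buys anything on this path once
mremap_to() has done the stricter pre-check.)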

> 
> [1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@d104.suse.de/
> 
> Signed-off-by: Oscar Salvador <osalvador@...e.de>
> ---
>  mm/mremap.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 3320616ed93f..e3edef6b7a12 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -516,6 +516,23 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
>  	if (addr + old_len > new_addr && new_addr + new_len > addr)
>  		goto out;
>  
> +	/*
> +	 * move_vma() needs us to stay 4 maps below the threshold, otherwise
> +	 * it will bail out at the very beginning.
> +	 * That is a problem if we have already unmapped the regions here
> +	 * (new_addr, and old_addr), because userspace will not know the
> +	 * state of the vma's after it gets -ENOMEM.
> +	 * So, to avoid such a scenario, we can pre-compute whether the whole
> +	 * operation has a good chance of succeeding map-wise.
> +	 * The worst case is when both vma's (new_addr and old_addr) get
> +	 * split in 3 before unmapping them.
> +	 * That means 2 more maps (1 for each) on top of the ones we already hold.
> +	 * Check whether the current map count plus 2 still leaves us 4 maps
> +	 * below the threshold, otherwise return -ENOMEM here to be safe.
> +	 */
> +	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)

Nit: redundant parentheses around 'mm->map_count + 2'.

> +		return -ENOMEM;
> +
>  	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
>  	if (ret)
>  		goto out;
> -- 
> 2.13.7
> 

-- 
 Kirill A. Shutemov
