lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Y1MxEk5ewlJeaW28@monkey>
Date:   Fri, 21 Oct 2022 16:53:54 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Rik van Riel <riel@...riel.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kernel-team@...a.com, stable@...nel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH v2] mm,madvise,hugetlb: fix unexpected data loss with
 MADV_DONTNEED on hugetlbfs

On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (eg. jemalloc) for further management.
> 
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
> 
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory get zeroed out, instead.
> 
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
> 
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding down to huge
> page granularity.
> 
> That way programs will only get as much memory zeroed out as
> they requested.
> 
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: stable@...nel.org
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")

I do hate changing behavior, but in hindsight this is the right behavior.
Especially, since it can cause unexpected data loss.

Reviewed-by: Mike Kravetz <mike.kravetz@...cle.com>
-- 
Mike Kravetz

> ---
> v2: split out the most urgent fix for stable backports
> 
>  mm/madvise.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 2baa93ca2310..c7105ec6d08c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
>  	if (start & ~huge_page_mask(hstate_vma(vma)))
>  		return false;
>  
> -	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> +	/*
> +	 * Madvise callers expect the length to be rounded up to PAGE_SIZE
> +	 * boundaries, and may be unaware that this VMA uses huge pages.
> +	 * Avoid unexpected data loss by rounding down the number of
> +	 * huge pages freed.
> +	 */
> +	*end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
> +
>  	return true;
>  }
>  
> @@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
>  		return -EINVAL;
>  
> +	if (start == end)
> +		return 0;
> +
>  	if (!userfaultfd_remove(vma, start, end)) {
>  		*prev = NULL; /* mmap_lock has been dropped, prev is stale */
>  
> -- 
> 2.37.2
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ