Message-ID: <Y1MxEk5ewlJeaW28@monkey>
Date: Fri, 21 Oct 2022 16:53:54 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Rik van Riel <riel@...riel.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kernel-team@...a.com, stable@...nel.org,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH v2] mm,madvise,hugetlb: fix unexpected data loss with
MADV_DONTNEED on hugetlbfs
On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (eg. jemalloc) for further management.
>
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
>
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory gets zeroed out instead.
>
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
>
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding down to huge
> page granularity.
>
> That way programs will only get as much memory zeroed out as
> they requested.
>
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: stable@...nel.org
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
I do hate changing behavior, but in hindsight this is the right behavior,
especially since the current rounding can cause unexpected data loss.
Reviewed-by: Mike Kravetz <mike.kravetz@...cle.com>
--
Mike Kravetz
> ---
> v2: split out the most urgent fix for stable backports
>
> mm/madvise.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 2baa93ca2310..c7105ec6d08c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
> if (start & ~huge_page_mask(hstate_vma(vma)))
> return false;
>
> - *end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> + /*
> + * Madvise callers expect the length to be rounded up to PAGE_SIZE
> + * boundaries, and may be unaware that this VMA uses huge pages.
> + * Avoid unexpected data loss by rounding down the number of
> + * huge pages freed.
> + */
> + *end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
> +
> return true;
> }
>
> @@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
> return -EINVAL;
>
> + if (start == end)
> + return 0;
> +
> if (!userfaultfd_remove(vma, start, end)) {
> *prev = NULL; /* mmap_lock has been dropped, prev is stale */
>
> --
> 2.37.2
>
>