linux-kernel - Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7ec9cbbe-742a-4629-8764-90213e85ac58@nutanix.com>
Date:   Mon, 27 Jun 2022 17:38:50 +0530
From:   "manish.mishra" <manish.mishra@...anix.com>
To:     James Houghton <jthoughton@...gle.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Peter Xu <peterx@...hat.com>
Cc:     David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Jue Wang <juew@...gle.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates


On 24/06/22 11:06 pm, James Houghton wrote:
> When using HugeTLB high-granularity mapping, we need to go through the
> supported hugepage sizes in decreasing order so that we pick the largest
> size that works. Consider the case where we're faulting in a 1G hugepage
> for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> a PUD. By going through the sizes in decreasing order, we will find that
> PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
>
> Signed-off-by: James Houghton <jthoughton@...gle.com>
> ---
>   mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
>   1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..5df838d86f32 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
>   #include <linux/migrate.h>
>   #include <linux/nospec.h>
>   #include <linux/delayacct.h>
> +#include <linux/sort.h>
>   
>   #include <asm/page.h>
>   #include <asm/pgalloc.h>
> @@ -48,6 +49,10 @@
>   
>   int hugetlb_max_hstate __read_mostly;
>   unsigned int default_hstate_idx;
> +/*
> + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> + * to smallest.
> + */
>   struct hstate hstates[HUGE_MAX_HSTATE];
>   
>   #ifdef CONFIG_CMA
> @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>   	kfree(node_alloc_noretry);
>   }
>   
> +static int compare_hstates_decreasing(const void *a, const void *b)
> +{
> +	const int shift_a = huge_page_shift((const struct hstate *)a);
> +	const int shift_b = huge_page_shift((const struct hstate *)b);
> +
> +	if (shift_a < shift_b)
> +		return 1;
> +	if (shift_a > shift_b)
> +		return -1;
> +	return 0;
> +}
> +
> +static void sort_hstates(void)
> +{
> +	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> +
> +	/* Sort from largest to smallest. */
> +	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> +	     compare_hstates_decreasing, NULL);
> +
> +	/*
> +	 * We may have changed the location of the default hstate, so we need to
> +	 * update it.
> +	 */
> +	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> +}
> +
>   static void __init hugetlb_init_hstates(void)
>   {
>   	struct hstate *h, *h2;
>   
> -	for_each_hstate(h) {
> -		if (minimum_order > huge_page_order(h))
> -			minimum_order = huge_page_order(h);
> +	sort_hstates();
>   
> +	/* The last hstate is now the smallest. */
> +	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> +
> +	for_each_hstate(h) {
>   		/* oversize hugepages were init'ed in early boot */
>   		if (!hstate_is_gigantic(h))
>   			hugetlb_hstate_alloc_pages(h);

As now hstates are ordered can code which does calculation of demot_order

can too be optimised, i mean it can be value of order of hstate at next index?