Message-ID: <15428BF6-A7DD-44ED-B225-AECD7866394C@nvidia.com>
Date: Thu, 01 May 2025 14:49:21 -0400
From: Zi Yan <ziy@...dia.com>
To: Juan Yescas <jyescas@...gle.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, tjmercier@...gle.com, isaacmanjarres@...gle.com,
 surenb@...gle.com, kaleshsingh@...gle.com, Vlastimil Babka <vbabka@...e.cz>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 David Hildenbrand <david@...hat.com>, Mike Rapoport <rppt@...nel.org>,
 Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block
 order

On 1 May 2025, at 1:25, Juan Yescas wrote:

> Problem: On large page size configurations (16KiB, 64KiB), the CMA
> alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> and this causes the CMA reservations to be larger than necessary.
> This means that the system will have fewer available MIGRATE_UNMOVABLE
> and MIGRATE_RECLAIMABLE page blocks, since MIGRATE_CMA can't fall back
> to them.
>
> The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> MAX_PAGE_ORDER, which depends on ARCH_FORCE_MAX_ORDER. The value of
> ARCH_FORCE_MAX_ORDER increases on 16KiB and 64KiB kernels.
>
> For example, the CMA alignment requirement when:
>
> - CONFIG_ARCH_FORCE_MAX_ORDER default value is used
> - CONFIG_TRANSPARENT_HUGEPAGE is set:
>
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> -----------------------------------------------------------------------
>    4KiB   |      10        |      10         |  4KiB * (2 ^ 10)  =  4MiB
>   16KiB   |      11        |      11         | 16KiB * (2 ^ 11) =  32MiB
>   64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
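For reference, the values in this table follow from CMA requiring its
regions to be aligned to one pageblock: CMA_MIN_ALIGNMENT_BYTES is
defined as PAGE_SIZE * 2^pageblock_order (include/linux/cma.h). Below
is a minimal userspace sketch that recomputes the rows above; the
helper name is illustrative only, not kernel code:

  #include <stdio.h>

  /* One pageblock worth of bytes, i.e. what CMA_MIN_ALIGNMENT_BYTES
   * expands to: PAGE_SIZE * (1 << pageblock_order).
   */
  static unsigned long cma_min_alignment_bytes(unsigned long page_size,
                                               unsigned int order)
  {
          return page_size * (1UL << order);
  }

  int main(void)
  {
          /* 4KiB/order 10, 16KiB/order 11, 64KiB/order 13 */
          printf("%lu MiB\n", cma_min_alignment_bytes(4096, 10) >> 20);  /*   4 */
          printf("%lu MiB\n", cma_min_alignment_bytes(16384, 11) >> 20); /*  32 */
          printf("%lu MiB\n", cma_min_alignment_bytes(65536, 13) >> 20); /* 512 */
          return 0;
  }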
>
> There are some extreme cases for the CMA alignment requirement when:
>
> - CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
> - CONFIG_TRANSPARENT_HUGEPAGE is NOT set
> - CONFIG_HUGETLB_PAGE is NOT set:
>
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order |  CMA_MIN_ALIGNMENT_BYTES
> ------------------------------------------------------------------------
>    4KiB   |      15        |      15         |  4KiB * (2 ^ 15) = 128MiB
>   16KiB   |      13        |      13         | 16KiB * (2 ^ 13) = 128MiB
>   64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
>
> This affects the CMA reservations for drivers. If a driver needs 4MiB
> of CMA memory on a 4KiB kernel (first snippet below), then on a 16KiB
> kernel the minimum reservation has to be 32MiB due to the alignment
> requirements (second snippet):
>
> reserved-memory {
>     ...
>     cma_test_reserve: cma_test_reserve {
>         compatible = "shared-dma-pool";
>         size = <0x0 0x400000>; /* 4 MiB */
>         ...
>     };
> };
>
> reserved-memory {
>     ...
>     cma_test_reserve: cma_test_reserve {
>         compatible = "shared-dma-pool";
>         size = <0x0 0x2000000>; /* 32 MiB */
>         ...
>     };
> };
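The reason the 16KiB-kernel reservation has to grow: as far as I
recall, the shared-dma-pool setup code rejects a CMA reserved-memory
region whose base or size is not a multiple of CMA_MIN_ALIGNMENT_BYTES,
so the device tree itself has to carry the rounded-up size. A
standalone model of that constraint (not the actual kernel check):

  #include <stdio.h>

  /* Returns 1 if a CMA region with this base/size would satisfy the
   * pageblock alignment requirement, 0 otherwise.
   */
  static int cma_region_aligned(unsigned long long base,
                                unsigned long long size,
                                unsigned long long cma_min_align)
  {
          return (base % cma_min_align == 0) && (size % cma_min_align == 0);
  }

  int main(void)
  {
          /* 16KiB pages, pageblock order 11 -> 32 MiB alignment */
          unsigned long long align_16k = 16384ULL << 11;

          printf("%d\n", cma_region_aligned(0, 0x400000, align_16k));  /* 4 MiB: 0 */
          printf("%d\n", cma_region_aligned(0, 0x2000000, align_16k)); /* 32 MiB: 1 */
          return 0;
  }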
>
> Solution: Add a new config ARCH_FORCE_PAGE_BLOCK_ORDER that
> allows setting the page block order. The maximum page block
> order is given by ARCH_FORCE_MAX_ORDER.
>
> By default, ARCH_FORCE_PAGE_BLOCK_ORDER has the same value
> as ARCH_FORCE_MAX_ORDER. This makes sure that current kernel
> configurations won't be affected by this change. It is an
> opt-in change.
>
> This patch allows large page size kernels (16KiB, 64KiB) to have
> the same CMA alignment requirements as 4KiB kernels by setting a
> lower pageblock_order.
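As a quick sanity check on the numbers this implies (illustrative only,
assuming arm64 16KiB pages where PAGE_SHIFT is 14 and HPAGE_PMD_ORDER
is 11, and CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER set to 7):

  #include <stdio.h>

  #define PAGE_SHIFT          14
  #define HPAGE_PMD_ORDER     11  /* PMD_SHIFT (25) - PAGE_SHIFT (14) */
  #define PAGE_BLOCK_ORDER    7   /* CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER=7 */

  /* Mirrors the patched pageblock_order computation for the THP case:
   * min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER).
   */
  #define MIN_T(t, a, b)      ((t)(a) < (t)(b) ? (t)(a) : (t)(b))
  #define pageblock_order     MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)

  int main(void)
  {
          unsigned long cma_min_align = (1UL << PAGE_SHIFT) << pageblock_order;

          /* Prints: order 7, CMA min alignment 2 MiB */
          printf("order %u, CMA min alignment %lu MiB\n",
                 pageblock_order, cma_min_align >> 20);
          return 0;
  }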
>
> Tests:
>
> - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
> on 4k and 16k kernels.
>
> - Verified that Transparent Huge Pages work when pageblock_order
> is 1, 7, 10 on 4k and 16k kernels.
>
> - Verified that dma-buf heaps allocations work when pageblock_order
> is 1, 7, 10 on 4k and 16k kernels.
>
> Benchmarks:
>
> The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
> pageblock_order 7 was chosen because it makes the minimum CMA
> alignment requirement the same as on 4KiB kernels (2MiB).
>
> - Performed 100K dma-buf heap (/dev/dma_heap/system) allocations of
> SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Used simpleperf
> (https://developer.android.com/ndk/guides/simpleperf) to measure
> the number of instructions and page-faults on 16KiB kernels.
> The benchmark was executed 10 times. The averages are below:
>
>            # instructions         |     #page-faults
>     order 10     |  order 7       | order 10 | order 7
> --------------------------------------------------------
>  13,891,765,770	 | 11,425,777,314 |    220   |   217
>  14,456,293,487	 | 12,660,819,302 |    224   |   219
>  13,924,261,018	 | 13,243,970,736 |    217   |   221
>  13,910,886,504	 | 13,845,519,630 |    217   |   221
>  14,388,071,190	 | 13,498,583,098 |    223   |   224
>  13,656,442,167	 | 12,915,831,681 |    216   |   218
>  13,300,268,343	 | 12,930,484,776 |    222   |   218
>  13,625,470,223	 | 14,234,092,777 |    219   |   218
>  13,508,964,965	 | 13,432,689,094 |    225   |   219
>  13,368,950,667	 | 13,683,587,37  |    219   |   225
> -------------------------------------------------------------------
>  13,803,137,433  | 13,131,974,268 |    220   |   220    Averages
>
> There were 4.86% fewer instructions when the order was 7, in comparison
> with order 10:
>
>      13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
>
> The number of page faults with order 7 and order 10 was the same.
>
> These results didn't show any significant regression when the
> pageblock_order is set to 7 on 16KiB kernels.
>
> - Ran Speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
>  on the 16KiB kernels with pageblock_order 7 and 10.
>
> order 10 | order 7  | order 7 - order 10 | (order 7 - order 10) %
> -------------------------------------------------------------------
>   15.8	 |  16.4    |         0.6        |     3.80%
>   16.4	 |  16.2    |        -0.2        |    -1.22%
>   16.6	 |  16.3    |        -0.3        |    -1.81%
>   16.8	 |  16.3    |        -0.5        |    -2.98%
>   16.6	 |  16.8    |         0.2        |     1.20%
> -------------------------------------------------------------------
>   16.44   |  16.4    |       -0.04        |    -0.24%   Averages
>
> The results didn't show any significant regression when the
> pageblock_order is set to 7 on 16KiB kernels.
>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Liam R. Howlett <Liam.Howlett@...cle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Mike Rapoport <rppt@...nel.org>
> Cc: Zi Yan <ziy@...dia.com>
> Cc: Suren Baghdasaryan <surenb@...gle.com>
> Cc: Minchan Kim <minchan@...nel.org>
> Signed-off-by: Juan Yescas <jyescas@...gle.com>
> ---
>  arch/arm64/Kconfig              | 14 ++++++++++++++
>  include/linux/pageblock-flags.h | 12 +++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a182295e6f08..d784049e1e01 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1658,6 +1658,20 @@ config ARCH_FORCE_MAX_ORDER
>
>  	  Don't change if unsure.
>
> +config ARCH_FORCE_PAGE_BLOCK_ORDER
> +	int "Page Block Order"
> +	range 1 ARCH_FORCE_MAX_ORDER
> +	default ARCH_FORCE_MAX_ORDER
> +	help
> +	  The page block order refers to the power of two number of pages that
> +	  are physically contiguous and can have a migrate type associated with them.
> +	  The maximum size of the page block order is limited by ARCH_FORCE_MAX_ORDER.

Since memory compaction operates at pageblock granularity and the
pageblock size usually matches the THP size, a smaller pageblock size
significantly degrades the kernel's anti-fragmentation mechanism for
THP. Can you add something like the text below to the help section?

"Reducing pageblock order can negatively impact THP generation successful rate.
If your workloads uses THP heavily, please use this option with caution."

Otherwise, Acked-by: Zi Yan <ziy@...dia.com>

I am also OK if you move this to mm/Kconfig.
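To put the THP concern in numbers (back of the envelope, using the
arm64 16KiB-page values from this thread, not measurements): a
PMD-sized THP is order 11 there, so with pageblock_order reduced to 7 a
single THP spans 2^(11 - 7) = 16 pageblocks, and all of them have to
stay free of unmovable pages for that THP allocation to succeed:

  #include <stdio.h>

  int main(void)
  {
          unsigned int hpage_pmd_order = 11;  /* 16KiB: PMD_SHIFT(25) - PAGE_SHIFT(14) */
          unsigned int pageblock_order = 7;   /* proposed ARCH_FORCE_PAGE_BLOCK_ORDER */

          /* Prints: pageblocks per PMD THP: 16 */
          printf("pageblocks per PMD THP: %u\n",
                 1U << (hpage_pmd_order - pageblock_order));
          return 0;
  }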

> +
> +	  This option allows overriding the default setting when the page
> +	  block order needs to be smaller than ARCH_FORCE_MAX_ORDER.
> +
> +	  Don't change if unsure.
> +
>  config UNMAP_KERNEL_AT_EL0
>  	bool "Unmap kernel when running in userspace (KPTI)" if EXPERT
>  	default y
> diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> index fc6b9c87cb0a..ab3de96bb50c 100644
> --- a/include/linux/pageblock-flags.h
> +++ b/include/linux/pageblock-flags.h
> @@ -28,6 +28,12 @@ enum pageblock_bits {
>  	NR_PAGEBLOCK_BITS
>  };
>
> +#if defined(CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER)
> +#define PAGE_BLOCK_ORDER CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER
> +#else
> +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> +#endif /* CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER */
> +
>  #if defined(CONFIG_HUGETLB_PAGE)
>
>  #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -41,18 +47,18 @@ extern unsigned int pageblock_order;
>   * Huge pages are a constant size, but don't exceed the maximum allocation
>   * granularity.
>   */
> -#define pageblock_order		MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order		MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
>
>  #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
>  #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
>
> -#define pageblock_order		MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order		MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
>
>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>
>  /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
> -#define pageblock_order		MAX_PAGE_ORDER
> +#define pageblock_order		PAGE_BLOCK_ORDER
>
>  #endif /* CONFIG_HUGETLB_PAGE */
>
> -- 
> 2.49.0.906.g1f30a19c02-goog


--
Best Regards,
Yan, Zi
