Message-ID: <CAC_TJvf36Qr3r_LJ0Knf7WtozUZ_YVxLxF7bEAPC+87J-QEd6Q@mail.gmail.com>
Date: Thu, 1 May 2025 11:21:51 -0700
From: Kalesh Singh <kaleshsingh@...gle.com>
To: Juan Yescas <jyescas@...gle.com>
Cc: Zi Yan <ziy@...dia.com>, Catalin Marinas <catalin.marinas@....com>, 
	Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, tjmercier@...gle.com, isaacmanjarres@...gle.com, 
	surenb@...gle.com, Vlastimil Babka <vbabka@...e.cz>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	David Hildenbrand <david@...hat.com>, Mike Rapoport <rppt@...nel.org>, Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order

On Thu, May 1, 2025 at 10:11 AM Juan Yescas <jyescas@...gle.com> wrote:
>
> On Thu, May 1, 2025 at 7:24 AM Zi Yan <ziy@...dia.com> wrote:
> >
> > On 1 May 2025, at 1:25, Juan Yescas wrote:
> >
> > > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > > and this causes the CMA reservations to be larger than necessary.
> > > This means that the system will have fewer available MIGRATE_UNMOVABLE and
> > > MIGRATE_RECLAIMABLE page blocks, since unmovable and reclaimable
> > > allocations can't fall back to MIGRATE_CMA page blocks.
> > >
> > > CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > > MAX_PAGE_ORDER, which depends on ARCH_FORCE_MAX_ORDER. The default value
> > > of ARCH_FORCE_MAX_ORDER is larger on 16k and 64k kernels.
> > >
> > > For example, the CMA alignment requirement when:
> > >
> > > - CONFIG_ARCH_FORCE_MAX_ORDER default value is used
> > > - CONFIG_TRANSPARENT_HUGEPAGE is set:
> > >
> > > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > > -----------------------------------------------------------------------
> > >    4KiB   |      10        |      10         |  4KiB * (2 ^ 10) =   4MiB
> > >   16KiB   |      11        |      11         | 16KiB * (2 ^ 11) =  32MiB
> > >   64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
> > >
> > > There are some extreme cases for the CMA alignment requirement when:
> > >
> > > - CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
> > > - CONFIG_TRANSPARENT_HUGEPAGE is NOT set
> > > - CONFIG_HUGETLB_PAGE is NOT set:
> > >
> > > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > > -----------------------------------------------------------------------
> > >    4KiB   |      15        |      15         |  4KiB * (2 ^ 15) = 128MiB
> > >   16KiB   |      13        |      13         | 16KiB * (2 ^ 13) = 128MiB
> > >   64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
> > >
> > > This affects the drivers' CMA reservations. If a driver in a 4KiB
> > > kernel needs 4MiB of CMA memory, the minimal reservation in a 16KiB
> > > kernel has to be 32MiB due to the alignment requirements:
> > >
> > > reserved-memory {
> > >     ...
> > >     cma_test_reserve: cma_test_reserve {
> > >         compatible = "shared-dma-pool";
> > >         size = <0x0 0x400000>; /* 4 MiB */
> > >         ...
> > >     };
> > > };
> > >
> > > reserved-memory {
> > >     ...
> > >     cma_test_reserve: cma_test_reserve {
> > >         compatible = "shared-dma-pool";
> > >         size = <0x0 0x2000000>; /* 32 MiB */
> > >         ...
> > >     };
> > > };
> > >
> > > Solution: Add a new config ARCH_FORCE_PAGE_BLOCK_ORDER that
> > > allows setting the page block order. The maximum page block
> > > order will be given by ARCH_FORCE_MAX_ORDER.
> >
> > Why not use a boot time parameter to change page block order?
>
> That is a good option. The main tradeoff is:
>
> - The bootloader would have to be updated on the devices to pass the right
> pageblock_order value depending on the kernel page size. Currently,
> we can boot 4k/16k kernels without any change in the bootloader.

Once we change the page block order, we will likely need to update the CMA
reservations in the device tree to match the new minimum alignment, and the
device tree then has to be recompiled and flashed to the device. So there is
likely no significant saving in the update process from making the page block
order a boot parameter.

-- Kalesh

>
> > Otherwise, you will need to maintain an additional kernel
> > binary for your use case.
> >
>
> Unfortunately, we still need 2 kernel binaries, one for 4k and another for 16k.
> Several data structures are aligned at compile time based on
> PAGE_SIZE (__aligned(PAGE_SIZE)), which makes it difficult to have a single
> binary.
>
> For example:
>
> static u8 idmap_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init,
>           kpti_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init;
>
> https://elixir.bootlin.com/linux/v6.14.4/source/arch/arm64/mm/mmu.c#L780
>
> Thanks
> Juan
>
> > --
> > Best Regards,
> > Yan, Zi
