Date:   Sat, 14 Jan 2023 15:30:11 +0200
From:   Mike Rapoport <rppt@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Aaron Thompson <dev@...ont.org>, Mike Rapoport <rppt@...nel.org>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: [GIT PULL] memblock: fix release of deferred pages in
 memblock_free_late()

Hi Linus,

The following changes since commit fa81ab49bbe4e1ce756581c970486de0ddb14309:

  memblock: Fix doc for memblock_phys_free (2023-01-04 12:31:22 +0200)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock tags/fixes-2023-01-14

for you to fetch changes up to 115d9d77bb0f9152c60b6e8646369fa7f6167593:

  mm: Always release pages to the buddy allocator in memblock_free_late(). (2023-01-08 18:49:33 +0200)

----------------------------------------------------------------
memblock: always release pages to the buddy allocator in memblock_free_late()

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. However, memblock_free_pages() is also called by
memblock_free_late(), which is used to free reserved ranges after
memblock_free_all() has run. All pages in reserved ranges have been
initialized at that point, and accordingly, those pages are not touched
by the deferred init process. This means that currently, if the pages
that memblock_free_late() intends to release are in the deferred range,
they will never be released to the buddy allocator and will remain
reserved forever.

In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().

For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.

One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.

For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI,
/proc/zoneinfo reports:

v6.2-rc2:
Node 0, zone      DMA
      spanned  4095
      present  3999
      managed  3840
Node 0, zone    DMA32
      spanned  246652
      present  245868
      managed  178867

v6.2-rc2 + patch:
Node 0, zone      DMA
      spanned  4095
      present  3999
      managed  3840
Node 0, zone    DMA32
      spanned  246652
      present  245868
      managed  222816   # +43,949 pages

----------------------------------------------------------------
Aaron Thompson (1):
      mm: Always release pages to the buddy allocator in memblock_free_late().

 mm/memblock.c                     | 8 +++++++-
 tools/testing/memblock/internal.h | 4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)
-- 
Sincerely yours,
Mike.
