Message-ID: <20250429142237.22138-1-arkamar@atlas.cz>
Date: Tue, 29 Apr 2025 16:22:36 +0200
From: Petr Vaněk <arkamar@...as.cz>
To: linux-kernel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Ryan Roberts <ryan.roberts@....com>,
	David Hildenbrand <david@...hat.com>,
	linux-mm@...ck.org,
	stable@...r.kernel.org,
	Petr Vaněk <arkamar@...as.cz>
Subject: [PATCH 0/1] mm: Fix regression after THP PTE optimization on Xen PV Dom0

Hi all,

I have found a problem in recent kernels (6.9 and later) when running under
the Xen hypervisor as a PV Dom0. In my setup (PV Dom0 with 4 GB RAM, using
XFS), I shrink memory to 256 MB via the balloon driver after boot:

  xl mem-set Domain-0 256m

Once memory is reduced, even running a simple command like ls usually
triggers the following warning:

[   27.963562] [ T2553] page: refcount:88 mapcount:21 mapping:ffff88813ff6f6a8 index:0x110 pfn:0x10085c
[   27.963564] [ T2553] head: order:2 mapcount:83 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[   27.963565] [ T2553] memcg:ffff888003573000
[   27.963567] [ T2553] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[   27.963570] [ T2553] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[   27.963573] [ T2553] raw: 002000000000416c ffffea0004021e88 ffffea0004021908 ffff88813ff6f6a8
[   27.963574] [ T2553] raw: 0000000000000110 ffff88811bf06b60 0000005800000014 ffff888003573000
[   27.963576] [ T2553] head: 002000000000416c ffffea0004021e88 ffffea0004021908 ffff88813ff6f6a8
[   27.963577] [ T2553] head: 0000000000000110 ffff88811bf06b60 0000005800000014 ffff888003573000
[   27.963578] [ T2553] head: 0020000000000202 ffffea0004021701 0000000400000052 00000000ffffffff
[   27.963580] [ T2553] head: 0000000300000003 8000000300000002 0000000000000014 0000000000000004
[   27.963581] [ T2553] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
[   27.963590] [ T2553] ------------[ cut here ]------------
[   27.963591] [ T2553] WARNING: CPU: 1 PID: 2553 at include/linux/rmap.h:427 __folio_rmap_sanity_checks+0x1a0/0x270
[   27.963596] [ T2553] CPU: 1 UID: 0 PID: 2553 Comm: ls Not tainted 6.15.0-rc4-00002-ge0363f942651 #270 PREEMPT(undef)
[   27.963599] [ T2553] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
[   27.963601] [ T2553] RIP: e030:__folio_rmap_sanity_checks+0x1a0/0x270
[   27.963603] [ T2553] Code: 89 df e8 b3 7d fd ff 0f 0b e9 eb fe ff ff 48 8d 42 ff 48 39 c3 0f 84 0e ff ff ff 48 c7 c6 e8 49 5b 82 48 89 df e8 90 7d fd ff <0f> 0b e9 f8 fe ff ff 41 f7 c4 ff 0f 00 00 0f 85 b2 fe ff ff 49 8b
[   27.963605] [ T2553] RSP: e02b:ffffc90040f8fac0 EFLAGS: 00010246
[   27.963608] [ T2553] RAX: 00000000000000e5 RBX: ffffea0004021700 RCX: 0000000000000000
[   27.963609] [ T2553] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[   27.963610] [ T2553] RBP: ffffc90040f8fae0 R08: 00000000ffffdfff R09: ffffffff82929308
[   27.963612] [ T2553] R10: ffffffff82879360 R11: 0000000000000002 R12: ffffea0004021700
[   27.963613] [ T2553] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000005
[   27.963625] [ T2553] FS:  00007ff06dafe740(0000) GS:ffff8880b7ccb000(0000) knlGS:0000000000000000
[   27.963627] [ T2553] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.963628] [ T2553] CR2: 000055deb932f630 CR3: 0000000002844000 CR4: 0000000000050660
[   27.963630] [ T2553] Call Trace:
[   27.963632] [ T2553]  <TASK>
[   27.963633] [ T2553]  folio_remove_rmap_ptes+0x24/0x2b0
[   27.963635] [ T2553]  unmap_page_range+0x132e/0x17e0
[   27.963638] [ T2553]  unmap_single_vma+0x81/0xd0
[   27.963640] [ T2553]  unmap_vmas+0xb5/0x180
[   27.963642] [ T2553]  exit_mmap+0x10c/0x460
[   27.963644] [ T2553]  mmput+0x59/0x120
[   27.963647] [ T2553]  do_exit+0x2d1/0xbd0
[   27.963649] [ T2553]  do_group_exit+0x2f/0x90
[   27.963651] [ T2553]  __x64_sys_exit_group+0x13/0x20
[   27.963652] [ T2553]  x64_sys_call+0x126b/0x1d70
[   27.963655] [ T2553]  do_syscall_64+0x54/0x120
[   27.963657] [ T2553]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   27.963659] [ T2553] RIP: 0033:0x7ff06dbdbe9d
[   27.963660] [ T2553] Code: Unable to access opcode bytes at 0x7ff06dbdbe73.
[   27.963661] [ T2553] RSP: 002b:00007ffde44f9ac8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[   27.963664] [ T2553] RAX: ffffffffffffffda RBX: 00007ff06dce4fa8 RCX: 00007ff06dbdbe9d
[   27.963665] [ T2553] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
[   27.963666] [ T2553] RBP: 0000000000000000 R08: 00007ffde44f9a70 R09: 00007ffde44f99ff
[   27.963667] [ T2553] R10: 00007ffde44f9980 R11: 0000000000000206 R12: 00007ff06dce3680
[   27.963668] [ T2553] R13: 00007ff06dd010e0 R14: 0000000000000002 R15: 00007ff06dce4fc0
[   27.963670] [ T2553]  </TASK>
[   27.963671] [ T2553] ---[ end trace 0000000000000000 ]---

Later on, more serious problems start to appear:

[   69.035059] [ T2593] BUG: Bad page map in process ls  pte:10006c125 pmd:1003dc067
[   69.035061] [ T2593] page: refcount:8 mapcount:20 mapping:ffff88813fd736a8 index:0x110 pfn:0x10006c
[   69.035064] [ T2593] head: order:2 mapcount:-2 entire_mapcount:0 nr_pages_mapped:8388532 pincount:0
[   69.035066] [ T2593] memcg:ffff888003573000
[   69.035067] [ T2593] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[   69.035069] [ T2593] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[   69.035072] [ T2593] raw: 002000000000416c ffffea0004002348 ffffea0004001d08 ffff88813fd736a8
[   69.035074] [ T2593] raw: 0000000000000110 ffff888100d19860 0000000800000013 ffff888003573000
[   69.035076] [ T2593] head: 002000000000416c ffffea0004002348 ffffea0004001d08 ffff88813fd736a8
[   69.035078] [ T2593] head: 0000000000000110 ffff888100d19860 0000000800000013 ffff888003573000
[   69.035079] [ T2593] head: 0020000000000202 ffffea0004001b01 ffffffb4fffffffd 00000000ffffffff
[   69.035081] [ T2593] head: 0000000300000003 0000000500000002 0000000000000013 0000000000000004
[   69.035082] [ T2593] page dumped because: bad pte
[   69.035083] [ T2593] addr:00007f6524b96000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff88813fd736a8 index:110
[   69.035086] [ T2593] file:libc.so.6 fault:xfs_filemap_fault mmap:xfs_file_mmap read_folio:xfs_vm_read_folio

The system eventually becomes unusable and typically crashes.

I was able to bisect this issue to commit 10ebac4f95e7 ("mm/memory:
optimize unmap/zap with PTE-mapped THP").

If I understand correctly, the folio from the first warning spans 4 pages,
but folio_pte_batch() incorrectly counts 5: at the fifth entry, expected_pte
and pte are both zero-valued PTEs, so pte_same() returns true and the loop
is not broken.

[   27.963266] [ T2553] folio_pte_batch debug: printing PTE values starting at addr=0x7ff06dc11000
[   27.963268] [ T2553]   PTE[ 0] = 000000010085c125
[   27.963272] [ T2553]   PTE[ 1] = 000000010085d125
[   27.963274] [ T2553]   PTE[ 2] = 000000010085e125
[   27.963276] [ T2553]   PTE[ 3] = 000000010085f125
[   27.963277] [ T2553]   PTE[ 4] = 0000000000000000 <-- not present
[   27.963279] [ T2553]   PTE[ 5] = 0000000102e47125
[   27.963281] [ T2553]   PTE[ 6] = 0000000102e48125
[   27.963283] [ T2553]   PTE[ 7] = 0000000102e49125

As a consequence, zap_present_folio_ptes() is called with nr = 5, which
later calls folio_remove_rmap_ptes(), where __folio_rmap_sanity_checks()
emits the warning shown above.

The patch in the following message fixes the problem for me. It adds a check
that breaks the loop when a non-present PTE is encountered, before the
!pte_same() check.
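
Conceptually, the added check amounts to something like this, placed just
before the existing comparison (the exact diff is in the follow-up message):

	pte = ptep_get(ptep);

	/* A non-present PTE can never belong to the batch; stop here. */
	if (!pte_present(pte))
		break;
	if (!pte_same(pte, expected_pte))
		break;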

I still don't fully understand how the zero-valued PTE appears there, or why
the issue is triggered only when the xen-balloon driver shrinks the memory.
I wasn't able to reproduce it without shrinking.

Cheers,
Petr

Petr Vaněk (1):
  mm: Fix folio_pte_batch() overcount with zero PTEs

 mm/internal.h | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.48.1

