[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ecx2nj0o.ffs@tglx>
Date: Tue, 06 May 2025 12:14:15 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Borislav Petkov <bp@...en8.de>, "Kirill A. Shutemov"
<kirill@...temov.name>, Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org, Ingo
Molnar <mingo@...nel.org>, Arnd Bergmann <arnd@...db.de>, David Woodhouse
<dwmw@...zon.co.uk>, Dionna Amalie Glaze <dionnaglaze@...gle.com>, "H.
Peter Anvin" <hpa@...or.com>, Kees Cook <keescook@...omium.org>, Kevin
Loughlin <kevinloughlin@...gle.com>, Len Brown <len.brown@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, "Rafael J. Wysocki"
<rafael.j.wysocki@...el.com>, Tom Lendacky <thomas.lendacky@....com>,
linux-efi@...r.kernel.org, x86@...nel.org, Ard Biesheuvel
<ardb@...nel.org>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock
in __accept_page()") (was: Re: [tip: x86/boot] x86/boot: Provide
__pti_set_user_pgtbl() to startup code)
On Tue, May 06 2025 at 11:24, Borislav Petkov wrote:
> [ 1.329836] ---[ end Kernel panic - not syncing: kernel: panic_on_warn set ... ]---
>
> Fun stuff.
The only workable fix for this is:
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -894,7 +894,7 @@ config INTEL_TDX_GUEST
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
select X86_MCE
- select UNACCEPTED_MEMORY
+ select UNACCEPTED_MEMORY if BROKEN
help
Support running as a guest under Intel TDX. Without this support,
the guest kernel can not boot or run under TDX.
@@ -1515,7 +1515,7 @@ config AMD_MEM_ENCRYPT
select INSTRUCTION_DECODER
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
- select UNACCEPTED_MEMORY
+ select UNACCEPTED_MEMORY if BROKEN
select CRYPTO_LIB_AESGCM
help
Say yes to enable support for the encryption of system memory.
Why?
The whole logic of this static branch _is_ completely broken. It wants to
track whether any zone has unaccepted memory, but all of that lacks any
form of global serialization.
__accept_page()
last = list_empty(&zone->unaccepted_pages);
spin_unlock_irqrestore(&zone->lock, *flags);
if (last)
static_branch_dec();
__free_unaccepted()
spin_lock_irqsave(&zone->lock, flags);
first = list_empty(&zone->unaccepted_pages);
spin_unlock_irqrestore(&zone->lock, flags);
if (first)
static_branch_inc();
So anything which operates on the same zone after the lock was dropped
can race against the actual dec/inc operations. It's not even consistent
on the same zone, so how is that supposed to be consistent even across
different zones?
The work deferement:
__accept_page()
last = list_empty(&zone->unaccepted_pages);
spin_unlock_irqrestore(&zone->lock, *flags);
if (last) {
if (...)
schedule_work(...);
else
static_branch_dec();
made this actually worse as the work is fully async and therefore
decrements can be lost:
__accept_page()
schedule_work()
....
__accept_page()
schedule_work()
-> As the work is already scheduled, but not yet processed,
it is not queued again, which means the second decrement
is lost.
But that is just the tip of the iceberg as everything underneath which
"manages" and depends on that static branch is completely broken
already.
Seriously?
Thanks,
tglx
Powered by blists - more mailing lists