lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <80622f99-0ef4-491b-87f6-c9790dfecef6@gmail.com>
Date: Tue, 25 Nov 2025 14:31:54 +0000
From: Usama Arif <usamaarif642@...il.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: Changyuan Lyu <changyuanl@...gle.com>, akpm@...ux-foundation.org,
 linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...nel.org>,
 anthony.yznaga@...cle.com, arnd@...db.de, ashish.kalra@....com,
 benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
 corbet@....net, dave.hansen@...ux.intel.com, devicetree@...r.kernel.org,
 dwmw2@...radead.org, ebiederm@...ssion.com, graf@...zon.com, hpa@...or.com,
 jgowans@...zon.com, kexec@...ts.infradead.org, krzk@...nel.org,
 linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
 linux-mm@...ck.org, luto@...nel.org, mark.rutland@....com, mingo@...hat.com,
 pasha.tatashin@...een.com, pbonzini@...hat.com, peterz@...radead.org,
 robh@...nel.org, rostedt@...dmis.org, saravanak@...gle.com,
 skinsburskii@...ux.microsoft.com, tglx@...utronix.de,
 thomas.lendacky@....com, will@...nel.org, x86@...nel.org,
 Breno Leitao <leitao@...ian.org>, thevlad@...a.com
Subject: Re: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for
 memory below 1M



On 25/11/2025 13:15, Pratyush Yadav wrote:
> On Mon, Nov 24 2025, Usama Arif wrote:
> 
>> On 09/05/2025 08:46, Changyuan Lyu wrote:
>>> From: Alexander Graf <graf@...zon.com>
>>>
>>> KHO kernels are special and use only scratch memory for memblock
>>> allocations, but memory below 1M is ignored by kernel after early boot
>>> and cannot be naturally marked as scratch.
>>>
>>> To allow allocation of the real-mode trampoline and a few (if any) other
>>> very early allocations from below 1M forcibly mark the memory below 1M
>>> as scratch.
>>>
>>> After real mode trampoline is allocated, clear that scratch marking.
>>>
>>> Signed-off-by: Alexander Graf <graf@...zon.com>
>>> Co-developed-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
>>> Signed-off-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
>>> Co-developed-by: Changyuan Lyu <changyuanl@...gle.com>
>>> Signed-off-by: Changyuan Lyu <changyuanl@...gle.com>
>>> Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
>>> ---
>>>  arch/x86/kernel/e820.c   | 18 ++++++++++++++++++
>>>  arch/x86/realmode/init.c |  2 ++
>>>  2 files changed, 20 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index 9920122018a0b..c3acbd26408ba 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void)
>>>  		memblock_add(entry->addr, entry->size);
>>>  	}
>>>  
>>> +	/*
>>> +	 * At this point memblock is only allowed to allocate from memory
>>> +	 * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
>>> +	 * up in init_mem_mapping().
>>> +	 *
>>> +	 * KHO kernels are special and use only scratch memory for memblock
>>> +	 * allocations, but memory below 1M is ignored by kernel after early
>>> +	 * boot and cannot be naturally marked as scratch.
>>> +	 *
>>> +	 * To allow allocation of the real-mode trampoline and a few (if any)
>>> +	 * other very early allocations from below 1M forcibly mark the memory
>>> +	 * below 1M as scratch.
>>> +	 *
>>> +	 * After real mode trampoline is allocated, we clear that scratch
>>> +	 * marking.
>>> +	 */
>>> +	memblock_mark_kho_scratch(0, SZ_1M);
>>> +
>>>  	/*
>>>  	 * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
>>>  	 * to even less without it.
>>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>>> index f9bc444a3064d..9b9f4534086d2 100644
>>> --- a/arch/x86/realmode/init.c
>>> +++ b/arch/x86/realmode/init.c
>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
>>>  	 * setup_arch().
>>>  	 */
>>>  	memblock_reserve(0, SZ_1M);
>>> +
>>> +	memblock_clear_kho_scratch(0, SZ_1M);
>>>  }
>>>  
>>>  static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
>>
>> Hello!
>>
>> I am working with Breno who reported that we are seeing the below warning at boot
>> when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host
>> manually but we are seeing this several times a day inside the fleet.
>>
>>  20:16:33  ------------[ cut here ]------------
>>  20:16:33  WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
>>  20:16:33  Modules linked in:
>>  20:16:33  CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S                  6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE 
>>  20:16:33  Tainted: [S]=CPU_OUT_OF_SPEC
>>  20:16:33  RIP: 0010:memblock_add_range+0x316/0x330
>>  20:16:33  Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
>>  20:16:33  RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
>>  20:16:33  RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
>>  20:16:33  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
>>  20:16:33  RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
>>  20:16:33  R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
>>  20:16:33  R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
>>  20:16:33  FS:  0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
>>  20:16:33  CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
>>  20:16:33  Call Trace:
>>  20:16:33   <TASK>
>>  20:16:33   ? __memblock_reserve+0x75/0x80
>>  20:16:33   ? setup_arch+0x30f/0xb10
>>  20:16:33   ? start_kernel+0x58/0x960
>>  20:16:33   ? x86_64_start_reservations+0x20/0x20
>>  20:16:33   ? x86_64_start_kernel+0x13d/0x140
>>  20:16:33   ? common_startup_64+0x13e/0x140
>>  20:16:33   </TASK>
>>  20:16:33  ---[ end trace 0000000000000000 ]--- 
>>
>>
>> Rolling out with memblock=debug is not really an option in a large scale fleet due to the
>> time added to boot. But I did try on one of the hosts (without reproducing the issue) and I see:
>>
>> [    0.000616]  memory.cnt  = 0x6
>> [    0.000617]  memory[0x0]	[0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40
>> [    0.000620]  memory[0x1]	[0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40
>> [    0.000621]  memory[0x2]	[0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0
>> ...
>>
>> The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this
>> should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense).
>> We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldnt be selected and
>> we shouldnt have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations.
>>
>> The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu.
>> R14 held the base register, and R15 held the size at that point.
>> In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE
>> at the boundary of MEMBLOCK_KHO_SCRATCH.
> 
> I don't get this... The WARN_ON() is only triggered when the regions
> overlap. Here, there should be no overlap, since the scratch region
> should end at 0x100000 (SZ_1M) and the new region starts at 0x100000
> (SZ_1M).
> 

Yes, this is likely a separate problem. I just discovered flags = 0x40 while trying to
debug it with KEXEC_HANDOVER disabled.

> Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should
> only be called on a KHO boot, not unconditionally. So even with
> CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO
> boot, not every time.
> 
> I think the below diff should fix the warning for you by making sure the
> scratch areas are not present on non-KHO boot. I still don't know why
> you hit the warning in the first place though. If you'd be willing to
> dig deeper into that, it would be great.
> 
> Can you give the below a try and if it fixes the problem for you I can
> send it on the list.

Is there a reason for compiling this code with is_kho_boot, when we have disabled
KEXEC_HANDOVER and dont want this in? i.e. why not just ifdef it with MEMBLOCK_KHO_SCRATCH
when that defconfig is designed for it?

> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c3acbd26408ba..0a34dc011bf91 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -16,6 +16,7 @@
>  #include <linux/firmware-map.h>
>  #include <linux/sort.h>
>  #include <linux/memory_hotplug.h>
> +#include <linux/kexec_handover.h>
>  
>  #include <asm/e820/api.h>
>  #include <asm/setup.h>
> @@ -1315,7 +1316,8 @@ void __init e820__memblock_setup(void)
>  	 * After real mode trampoline is allocated, we clear that scratch
>  	 * marking.
>  	 */
> -	memblock_mark_kho_scratch(0, SZ_1M);
> +	if (is_kho_boot())
> +		memblock_mark_kho_scratch(0, SZ_1M);
>  
>  	/*
>  	 * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 88be32026768c..4e9b4dff17216 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -4,6 +4,7 @@
>  #include <linux/memblock.h>
>  #include <linux/cc_platform.h>
>  #include <linux/pgtable.h>
> +#include <linux/kexec_handover.h>
>  
>  #include <asm/set_memory.h>
>  #include <asm/realmode.h>
> @@ -67,7 +68,8 @@ void __init reserve_real_mode(void)
>  	 */
>  	memblock_reserve(0, SZ_1M);
>  
> -	memblock_clear_kho_scratch(0, SZ_1M);
> +	if (is_kho_boot())
> +		memblock_clear_kho_scratch(0, SZ_1M);
>  }
>  
>  static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ