Message-ID: <113b914f-1597-41ca-b714-7ea048c3c6df@huawei.com>
Date: Tue, 5 Aug 2025 16:47:31 +0800
From: mawupeng <mawupeng1@...wei.com>
To: <rppt@...nel.org>, <ardb@...nel.org>
CC: <mawupeng1@...wei.com>, <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: ignore nomap memory during mirror init
On 2025/7/22 16:17, Mike Rapoport wrote:
> Hi Ard,
>
> On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
>> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@...nel.org> wrote:
>>>
>> ...
>>>
>>>> w/o this patch
>>>> [root@...alhost ~]# lsmem --output-all
>>>> RANGE                                  SIZE  STATE REMOVABLE       BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff   32G online       yes 67584-67839    0 Movable
>>>> 0x0000085000000000-0x0000085fffffffff   64G online       yes 68096-68607    0 Movable
>>>>
>>>> w/ this patch
>>>> [root@...alhost ~]# lsmem --output-all
>>>> RANGE                                  SIZE  STATE REMOVABLE     BLOCK NODE ZONES
>>>> 0x0000084000000000-0x00000847ffffffff   32G online       yes 8448-8479    0 Normal
>>>> 0x0000085000000000-0x0000085fffffffff   64G online       yes 8512-8575    0 Movable
>>>
>>> As I see the problem, you have a problematic firmware that fails to report
>>> memory as mirrored because it is reserved for the firmware's own use. This
>>> causes non-mirrored memory to appear before mirrored memory. And this breaks
>>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
>>> always has lower addresses than non-mirrored memory and you end up with
>>> having all the memory in movable zone.
>>>
>>
>> That assumption seems highly problematic to me on non-x86
>> architectures: why should mirrored (or 'more reliable' in EFI speak)
>> memory always appear before ordinary memory in the physical memory
>> map?
>
> It's not really x86, although historically it probably comes from there.
> ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> with mirrored (more reliable) memory, the mirrored memory should be before
> non-mirrored.
>
>>> So to work around this firmware issue you propose a hack that would skip
>>> NOMAP regions while calculating zone_movable_pfn because your particular
>>> firmware reports the reserved mirrored memory as NOMAP.
>>>
>>
>> NOMAP is a Linux construct - the particular firmware reports a
>> 'reserved' memory region, but other more widely used memory types such
>> as EfiRuntimeServicesCode or *Data would result in an omitted region
>> as well, and can appear anywhere in the physical memory map. There is
>> no requirement for the firmware to do anything here wrt the
>> MORE_RELIABLE attribute even though such regions may be carved out of
>> a block of memory that is reported as such to the OS.
>>
>> So I agree with Wupeng Ma that there is an issue here: reporting it as
>> mirrored even though it is reserved should not be needed to prevent
>> the kernel from mishandling it.
>
> But a check for NOMAP won't actually fix it in the general case, especially
> if it can appear anywhere in the physical memory map. E.g. if there's an MR
> region followed by two reserved regions and one of these regions is not
> NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> region.
What kind of memory is reserved but is not NOMAP?
>
> We may want to consider scanning the entire memblock.memory to find all
> mirrored regions in a and then make a decision where to cut ZONE_NORMAL
> based on that.
AFAICT, mirrored memory should always be located at the top of the NUMA
memory region because of how Linux manages zones. There may be no good
decision to make from memblock.memory other than using the first
non-mirrored usable memory PFN as the cut point.
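
To make the two layouts discussed above concrete, here is a rough userspace
model of the cut-point selection. It is only a sketch, not the kernel code:
the Region class, movable_cut() and every PFN value below are hypothetical,
and it assumes 4 KiB pages and that regions below 4G are left out of the
calculation.

```python
from dataclasses import dataclass
from typing import List, Optional

PFN_4G = (4 << 30) >> 12  # first PFN at 4 GiB, assuming 4 KiB pages


@dataclass
class Region:
    """Hypothetical stand-in for one memblock.memory region."""
    start_pfn: int
    end_pfn: int          # exclusive
    mirror: bool = False  # models the MIRROR flag
    nomap: bool = False   # models the NOMAP flag


def movable_cut(regions: List[Region], skip_nomap: bool) -> Optional[int]:
    """Lowest start PFN of a non-mirrored region at or above 4 GiB.

    In this model everything from the returned PFN upward would end up
    in ZONE_MOVABLE. Returns None if no such region exists.
    """
    cut = None
    for r in regions:
        if r.mirror:
            continue
        if skip_nomap and r.nomap:
            continue  # the behaviour the patch proposes
        if r.start_pfn < PFN_4G:
            continue
        cut = r.start_pfn if cut is None else min(cut, r.start_pfn)
    return cut


# Layout similar in shape to the report: a NOMAP carve-out sits in front
# of the mirrored memory and is not flagged as mirrored itself.
reported = [
    Region(0x200000, 0x200100, nomap=True),
    Region(0x200100, 0x300000, mirror=True),
    Region(0x400000, 0x500000),
]

# Layout Mike describes: MR, reserved NOMAP, reserved but mapped, MR.
general = [
    Region(0x200000, 0x280000, mirror=True),
    Region(0x280000, 0x280100, nomap=True),
    Region(0x280100, 0x280200),
    Region(0x280200, 0x300000, mirror=True),
    Region(0x400000, 0x500000),
]

print(hex(movable_cut(reported, skip_nomap=False)))  # 0x200000
print(hex(movable_cut(reported, skip_nomap=True)))   # 0x400000
print(hex(movable_cut(general, skip_nomap=True)))    # 0x280100
```

In the first layout, skipping NOMAP moves the cut past the mirrored range.
In the second, the cut still lands at the mapped non-mirrored region, so
the second MR range stays above it, which is the case Mike points out.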
>