Message-ID: <aJM1RFjpQxQzfASv@kernel.org>
Date: Wed, 6 Aug 2025 13:58:12 +0300
From: Mike Rapoport <rppt@...nel.org>
To: mawupeng <mawupeng1@...wei.com>
Cc: ardb@...nel.org, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: ignore nomap memory during mirror init

On Tue, Aug 05, 2025 at 04:47:31PM +0800, mawupeng wrote:
>
> On 2025/7/22 16:17, Mike Rapoport wrote:
> > Hi Ard,
> >
> > On Mon, Jul 21, 2025 at 03:08:48PM +1000, Ard Biesheuvel wrote:
> >> On Sun, 20 Jul 2025 at 22:38, Mike Rapoport <rppt@...nel.org> wrote:
> >>>
> >> ...
> >>>
> >>>> w/o this patch
> >>>> [root@...alhost ~]# lsmem --output-all
> >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 67584-67839 0 Movable
> >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 68096-68607 0 Movable
> >>>>
> >>>> w/ this patch
> >>>> [root@...alhost ~]# lsmem --output-all
> >>>> RANGE SIZE STATE REMOVABLE BLOCK NODE ZONES
> >>>> 0x0000084000000000-0x00000847ffffffff 32G online yes 8448-8479 0 Normal
> >>>> 0x0000085000000000-0x0000085fffffffff 64G online yes 8512-8575 0 Movable
> >>>
> >>> As I see the problem, you have a problematic firmware that fails to report
> >>> memory as mirrored because it is reserved for the firmware's own use. This
> >>> causes non-mirrored memory to appear before mirrored memory. And this breaks
> >>> an assumption in find_zone_movable_pfns_for_nodes() that mirrored memory
> >>> always has lower addresses than non-mirrored memory and you end up with
> >>> all the memory in the movable zone.
> >>>
> >>
> >> That assumption seems highly problematic to me on non-x86
> >> architectures: why should mirrored (or 'more reliable' in EFI speak)
> >> memory always appear before ordinary memory in the physical memory
> >> map?
> >
> > It's not really x86, although historically it probably comes from there.
> > ZONE_NORMAL is always before ZONE_MOVABLE, so in order to have ZONE_NORMAL
> > with mirrored (more reliable) memory, the mirrored memory should be before
> > non-mirrored.
> >
> >>> So to work around this firmware issue you propose a hack that would skip
> >>> NOMAP regions while calculating zone_movable_pfn because your particular
> >>> firmware reports the reserved mirrored memory as NOMAP.
> >>>
> >>
> >> NOMAP is a Linux construct - the particular firmware reports a
> >> 'reserved' memory region, but other more widely used memory types such
> >> as EfiRuntimeServicesCode or *Data would result in an omitted region
> >> as well, and can appear anywhere in the physical memory map. There is
> >> no requirement for the firmware to do anything here wrt the
> >> MORE_RELIABLE attribute even though such regions may be carved out of
> >> a block of memory that is reported as such to the OS.
> >>
> >> So I agree with Wupeng Ma that there is an issue here: reporting it as
> >> mirrored even though it is reserved should not be needed to prevent
> >> the kernel from mishandling it.
> >
> > But a check for NOMAP won't actually fix it in the general case, especially
> > if it can appear anywhere in the physical memory map. E.g. if there's an MR
> > region followed by two reserved regions and one of these regions is not
> > NOMAP and then MR region again, ZONE_NORMAL will only include the first MR
> > region.
>
> What kind of memory is reserved and is not nomap?

EFI_ACPI_RECLAIM_MEMORY is surely reserved and it won't be nomap if it can
be mapped WB. I believe other types may be treated the same, but I'm not
familiar enough with the EFI code to tell.

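To make the example quoted above concrete (an MR region, two reserved
regions of which only one is NOMAP, then an MR region again), here is a
minimal userspace sketch, not kernel code. The struct, the flag names and
first_non_mirrored_pfn() are hypothetical stand-ins that only model the
"cut at the first non-mirrored region" behaviour; they are not the actual
memblock API or the real find_zone_movable_pfns_for_nodes() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags, loosely modelled on MEMBLOCK_MIRROR / MEMBLOCK_NOMAP. */
#define REGION_MIRROR 0x1u
#define REGION_NOMAP  0x2u

struct region {
	uint64_t start_pfn;
	uint64_t end_pfn;	/* exclusive */
	unsigned int flags;
};

/*
 * Regions are assumed to be sorted by address. Return the start pfn of the
 * first region that is not mirrored, i.e. where ZONE_NORMAL would end in
 * this model. With skip_nomap set, NOMAP regions are ignored, which is what
 * the patch proposes. Returns UINT64_MAX if no such region is found.
 */
static uint64_t first_non_mirrored_pfn(const struct region *r, size_t n,
				       bool skip_nomap)
{
	for (size_t i = 0; i < n; i++) {
		if (skip_nomap && (r[i].flags & REGION_NOMAP))
			continue;
		if (!(r[i].flags & REGION_MIRROR))
			return r[i].start_pfn;
	}
	return UINT64_MAX;
}

/* MR, reserved and NOMAP, reserved but mappable, MR */
static const struct region example_layout[] = {
	{ 0x000, 0x100, REGION_MIRROR },
	{ 0x100, 0x110, REGION_NOMAP },
	{ 0x110, 0x120, 0 },
	{ 0x120, 0x200, REGION_MIRROR },
};

static void check_example(void)
{
	/* Without the NOMAP check the cut lands on the NOMAP region. */
	assert(first_non_mirrored_pfn(example_layout, 4, false) == 0x100);
	/* With the NOMAP check it only moves to the next reserved region. */
	assert(first_non_mirrored_pfn(example_layout, 4, true) == 0x110);
}
```

In this model, skipping NOMAP moves the cut from pfn 0x100 to 0x110, but
the second MR region starting at 0x120 still lies above the cut.
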
> > We may want to consider scanning the entire memblock.memory to find all
> > mirrored regions and then make a decision where to cut ZONE_NORMAL
> > based on that.
>
> AFAICT, mirrored memory should always be located at the top of the NUMA
> memory region due to Linux's zone management. There may be no good decision
> based on memblock.memory other than to use the first non-mirrored
> usable memory pfn as the cut.

Thinking out loud, if nomap memory is not usable by Linux, why would EFI add
it to memblock.memory at all?

--
Sincerely yours,
Mike.