[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b282848-d2d7-6156-4726-ce974b2dff41@redhat.com>
Date: Tue, 22 Dec 2020 10:11:40 +0100
From: David Hildenbrand <david@...hat.com>
To: Anshuman Khandual <anshuman.khandual@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Cc: catalin.marinas@....com, will@...nel.org, ardb@...nel.org,
Mark Rutland <mark.rutland@....com>,
James Morse <james.morse@....com>,
Robin Murphy <robin.murphy@....com>,
Jérôme Glisse <jglisse@...hat.com>,
Dan Williams <dan.j.williams@...el.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Michael Ellerman <michael@...erman.id.au>
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
On 22.12.20 08:12, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
>
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
>
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
>
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Ard Biesheuvel <ardb@...nel.org>
> Cc: Robin Murphy <robin.murphy@....com>
> Cc: linux-arm-kernel@...ts.infradead.org
> Cc: linux-kernel@...r.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@....com>
> ---
> arch/arm64/mm/init.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>
> if (!valid_section(__pfn_to_section(pfn)))
> return 0;
> +
> + /*
> + * ZONE_DEVICE memory does not have the memblock entries.
> + * memblock_is_map_memory() check for ZONE_DEVICE based
> + * addresses will always fail. Even the normal hotplugged
> + * memory will never have MEMBLOCK_NOMAP flag set in their
> + * memblock entries. Skip memblock search for all non early
> + * memory sections covering all of hotplug memory including
> + * both normal and ZONE_DEVIE based.
> + */
> + if (!early_section(__pfn_to_section(pfn)))
> + return 1;
Actually, I think we want to check for partial present sections.
Maybe we can rather switch to generic pfn_valid() and tweak it to
something like
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..7b1fcce5bd5a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
return 0;
/*
* Traditionally early sections always returned pfn_valid() for
- * the entire section-sized span.
+ * the entire section-sized span. Some archs might have holes in
+ * early sections, so double check with memblock if configured.
*/
- return early_section(ms) || pfn_section_valid(ms, pfn);
+ if (early_section(ms))
+ return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
+ memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
+ return pfn_section_valid(ms, pfn);
}
#endif
Which users are remaining that require us to add/remove memblocks when
hot(un)plugging memory
$ git grep KEEP_MEM | grep memory_hotplug
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
I think one user we would have to handle is
arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
does not rely on memblock_is_map_memory.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists