Message-ID: <c7b92ff8-44db-4e48-b0af-eb1b6818b16b@redhat.com>
Date: Tue, 21 Nov 2023 20:24:48 +0100
From: David Hildenbrand <david@...hat.com>
To: Sumanth Korikkar <sumanthk@...ux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
linux-mm <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Anshuman Khandual <anshuman.khandual@....com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
linux-s390 <linux-s390@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/8] implement "memmap on memory" feature on s390
On 21.11.23 14:13, Sumanth Korikkar wrote:
> On Fri, Nov 17, 2023 at 04:37:29PM +0100, David Hildenbrand wrote:
>>>
>>> Maybe there is also already a common code bug with that, s390 might be
>>> special but that is often also good for finding bugs in common code ...
>>
>> If it's only the page_init_poison() as noted by Sumanth, we could disable
>> that on s390x with an altmap some way or the other; should be possible.
>>
>> I mean, you effectively have your own poisoning if the altmap is
>> inaccessible and makes your CPU angry on access :)
>>
>> Last but not least, support for an inaccessible altmap might come in handy
>> for virtio-mem eventually, and make altmap support eventually simpler. So
>> added bonus points.
>
> We tried out two possibilities for dealing with vmemmap altmap inaccessibility.
> Approach 1: Add MHP_ALTMAP_INACCESSIBLE flag and pass it in add_memory()
>
> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
> index 075094ca59b4..ab2dfcc7e9e4 100644
> --- a/drivers/s390/char/sclp_cmd.c
> +++ b/drivers/s390/char/sclp_cmd.c
> @@ -358,6 +358,13 @@ static int sclp_mem_notifier(struct notifier_block *nb,
> * buddy allocator later.
> */
> __arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
> + /*
> + * Poison the struct pages after the memory block is accessible.
> + * This is needed only with an altmap; without one, the struct
> + * pages are poisoned in sparse_add_section().
> + */
> + if (memory_block->altmap->inaccessible)
> + page_init_poison(pfn_to_page(arg->start_pfn), memory_block->altmap->free);
See below; likely it's best if the core does that.
> break;
> case MEM_FINISH_OFFLINE:
> sclp_mem_change_state(start, size, 0);
> @@ -412,7 +419,7 @@ static void __init add_memory_merged(u16 rn)
> goto skip_add;
> for (addr = start; addr < start + size; addr += block_size)
> add_memory(0, addr, block_size,
> - MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY : MHP_NONE);
> + MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY|MHP_ALTMAP_INACCESSIBLE : MHP_NONE);
> skip_add:
> first_rn = rn;
> num = 1;
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 7d2076583494..5c70707e706f 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -106,6 +106,11 @@ typedef int __bitwise mhp_t;
> * implies the node id (nid).
> */
> #define MHP_NID_IS_MGID ((__force mhp_t)BIT(2))
> +/*
> + * Mark memmap on memory (struct pages array) as inaccessible during memory
> + * hotplug addition phase.
> + */
> +#define MHP_ALTMAP_INACCESSIBLE ((__force mhp_t)BIT(3))
If we go that path, maybe rather call it MHP_OFFLINE_INACCESSIBLE and
document how it relates to/interacts with the memory notifier callbacks
and the altmap.
Then, we can logically abstract this from altmap handling. Simply, the
memory should not be read/written before the memory notifier was called.
In the core, you can do the poisoning for the altmap in that case after
calling the notifier, probably in mhp_init_memmap_on_memory() as you do
below.
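A minimal sketch of what that could look like in the core (the `inaccessible`
parameter and its plumbing are hypothetical; they would depend on how the
flag ends up being stored):

```c
/*
 * Hypothetical sketch: defer the memmap poisoning to the core, after the
 * architecture's notifier has made the altmap-backed range accessible.
 */
int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
			      struct zone *zone, bool inaccessible)
{
	int ret;

	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
	if (ret)
		return ret;

	/*
	 * With an inaccessible altmap, sparse_add_section() skipped the
	 * poisoning; do it now that the notifier made the range accessible.
	 */
	if (inaccessible)
		page_init_poison(pfn_to_page(pfn),
				 sizeof(struct page) * nr_pages);

	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
	/* ... */
}
```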
>
> /*
> * Extended parameters for memory hotplug:
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 744c830f4b13..9837f3e6fb95 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -25,6 +25,7 @@ struct vmem_altmap {
> unsigned long free;
> unsigned long align;
> unsigned long alloc;
> + bool inaccessible;
We should then likely remember that information for the memory block,
not the altmap.
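For example (a sketch only; field placement and naming are hypothetical),
the memory block could carry the flag instead of the altmap:

```c
/* include/linux/memory.h (sketch) */
struct memory_block {
	unsigned long start_section_nr;
	unsigned long state;
	/* ... */
	struct vmem_altmap *altmap;
	/*
	 * Hypothetical: the memmap of this block must not be accessed
	 * (not even for poisoning) before the notifier has made the
	 * range accessible.
	 */
	bool inaccessible;
};
```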
[...]
>
>
> Approach 2:
> ===========
> Shouldn't the KASAN zero shadow mapping be performed first, before
> accessing/initializing the memmap via page_init_poison()? If that is
Likely the existing order is fine: the first access to the poisoned memmap
should be a write, not a read. But I didn't look into the details.
Can you try reproducing?
> true, then it is a problem for all architectures and could be
> fixed like:
>
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7a5fc89a8652..eb3975740537 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1093,6 +1093,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
> if (ret)
> return ret;
>
> + page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
> move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>
> for (i = 0; i < nr_pages; i++)
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 77d91e565045..4ddf53f52075 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -906,8 +906,11 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> /*
> * Poison uninitialized struct pages in order to catch invalid flags
> * combinations.
> + * For altmap, do this later when onlining the memory, as it might
> + * not be accessible at this point.
> */
> - page_init_poison(memmap, sizeof(struct page) * nr_pages);
> + if (!altmap)
> + page_init_poison(memmap, sizeof(struct page) * nr_pages);
>
> ms = __nr_to_section(section_nr);
> set_section_nid(section_nr, nid);
>
That's too generic when it comes to other altmap users, especially DAX,
or when the altmap is accessible while the memory is offline, and we want
to catch invalid accesses in those cases.
>
>
> Also, if this approach is taken, should page_init_poison() be performed
> with cond_resched() as mentioned in commit d33695b16a9f
> ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()") ?
Likely as soon as possible after it is accessible :)
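If the range can be large, the poisoning itself could be chunked with
cond_resched(), similar in spirit to what commit d33695b16a9f does on the
removal path (a sketch; the chunk size is chosen arbitrarily):

```c
/* Sketch: poison the memmap in chunks, yielding the CPU in between. */
static void page_init_poison_resched(struct page *memmap,
				     unsigned long nr_pages)
{
	unsigned long i, n;

	for (i = 0; i < nr_pages; i += n) {
		n = min(nr_pages - i, 1UL << (MAX_ORDER - 1));
		page_init_poison(memmap + i, sizeof(struct page) * n);
		cond_resched();
	}
}
```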
--
Cheers,
David / dhildenb