Message-ID: <20201005132445.GA4555@dhcp22.suse.cz>
Date: Mon, 5 Oct 2020 15:24:45 +0200
From: Michal Hocko <mhocko@...e.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Pavel Tatashin <pasha.tatashin@...een.com>,
David Hildenbrand <david@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH 5/9] mm, page_alloc: make per_cpu_pageset accessible only
after init
On Tue 22-09-20 16:37:08, Vlastimil Babka wrote:
> setup_zone_pageset() replaces the boot_pageset by allocating and initializing a
> proper percpu one. Currently it assigns the newly allocated pageset to
> zone->pageset before initializing it. That's not an issue at the moment,
> because the zone should not be in any zonelist yet, thus not visible to
> allocators at this point.
>
> Memory ordering between the pcplist contents and the visibility of the
> zone->pageset pointer is also not guaranteed here, but that shouldn't be an
> issue either, because online_pages() does a spin_unlock(pgdat->node_size_lock)
> before building the zonelists.
>
> However, it's best that we don't silently rely on operations that can change
> in the future. Make sure only properly initialized pcplists are visible, using
> smp_store_release(). The read side has a data dependency via the zone->pageset
> pointer instead of an explicit read barrier.
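Just so we are on the same page, the pairing this relies on is roughly the
following (a simplified sketch, not the exact allocator fast path):

	/* writer - hotplug/init path: publish the pointer only after init */
	new_pageset = alloc_percpu(struct per_cpu_pageset);
	for_each_possible_cpu(cpu)
		pageset_init(per_cpu_ptr(new_pageset, cpu));
	smp_store_release(&zone->pageset, new_pageset);

	/* reader - allocator side (simplified illustration) */
	struct per_cpu_pages *pcp = &this_cpu_ptr(zone->pageset)->pcp;
	/*
	 * The address dependency through the zone->pageset load orders the
	 * subsequent pcp accesses, so no explicit read barrier is needed.
	 */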
Heh, this looks like inventing a trap similar to the one the previous patch
was removing. But more seriously, considering that we need locking for the
whole setup, wouldn't it be better to simply document the locking
requirements rather than adding scary-looking barriers that our future selves,
or somebody else, will need to scratch their heads about? I am pretty sure we
don't do anything like that when initializing the NUMA node or other data
structures that might be allocated during memory hotadd.
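Something along these lines in setup_zone_pageset() would be enough IMHO
(just a sketch, the exact wording and which lock to point at is of course up
for discussion):

	/*
	 * The zone is not in any zonelist yet, so no allocator can see
	 * zone->pageset while it is being set up. The memory hotplug path
	 * builds the zonelists only after onlining, under the hotplug
	 * locking, so no barriers are needed here.
	 */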
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> ---
> mm/page_alloc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 99b74c1c2b0a..de3b48bda45c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6246,15 +6246,17 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
>
>  void __meminit setup_zone_pageset(struct zone *zone)
>  {
> +	struct per_cpu_pageset __percpu * new_pageset;
>  	struct per_cpu_pageset *p;
>  	int cpu;
>  
> -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> +	new_pageset = alloc_percpu(struct per_cpu_pageset);
>  	for_each_possible_cpu(cpu) {
> -		p = per_cpu_ptr(zone->pageset, cpu);
> +		p = per_cpu_ptr(new_pageset, cpu);
>  		pageset_init(p);
>  	}
>  
> +	smp_store_release(&zone->pageset, new_pageset);
>  	zone_set_pageset_high_and_batch(zone);
>  }
>
> --
> 2.28.0
--
Michal Hocko
SUSE Labs