Message-ID: <bbe42c37-a442-52cb-b620-b1753be1d643@suse.cz>
Date:   Wed, 7 Oct 2020 00:28:03 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Michal Hocko <mhocko@...e.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        David Hildenbrand <david@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH 5/9] mm, page_alloc: make per_cpu_pageset accessible only
 after init

On 10/5/20 3:24 PM, Michal Hocko wrote:
> On Tue 22-09-20 16:37:08, Vlastimil Babka wrote:
>> setup_zone_pageset() replaces the boot_pageset by allocating and initializing a
>> proper percpu one. Currently it assigns zone->pageset with the newly allocated
>> one before initializing it. That's currently not an issue, because the zone
>> should not be in any zonelist, thus not visible to allocators at this point.
>> 
>> Memory ordering between the pcplist contents and its visibility is also not
>> guaranteed here, but that also shouldn't be an issue because online_pages()
>> does a spin_unlock(pgdat->node_size_lock) before building the zonelists.
>> 
>> However it's best that we don't silently rely on operations that can be changed
>> in the future. Make sure only properly initialized pcplists are visible, using
>> smp_store_release(). The read side has a data dependency via the zone->pageset
>> pointer instead of an explicit read barrier.
> 
> Heh, this looks like inventing a trap similar to the one the previous patch
> was removing. But more seriously, considering that we need locking for the
> whole setup, wouldn't it be better to simply document the locking
> requirements rather than adding scary-looking barriers that our future selves
> or somebody else will have to scratch their heads over? I am pretty sure we
> don't do anything like that when initializing a NUMA node or other data
> structures that might be allocated during memory hotadd.

Yeah, it will be best to drop this for now. I just looked closely at the
build_zonelist() implementation and got scared by how it just modifies the list
that others might be reading at the same time. zoneref_set_zone() sets
zoneref->zone and zoneref->zone_idx without any synchronization. What if
somebody, e.g. for_next_zone_zonelist_nodemask(), makes a decision based on
zone_idx and then picks the wrong zone pointer? etc.
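
The zoneref concern above can be illustrated with a small userspace sketch.
It is not kernel code: struct zone and struct zoneref here are hypothetical
stand-ins that model only the two fields mentioned, the helper names are
invented for illustration, and the store order (pointer first, then index) is
an assumption. The interleaving is replayed single-threaded so that the
outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the two fields discussed are modelled. */
struct zone {
	int idx;
};

struct zoneref {
	struct zone *zone;	/* pointer to the zone */
	int zone_idx;		/* cached index of that zone */
};

/* First half of an unsynchronized update: store the zone pointer. */
static void set_zone_ptr(struct zoneref *ref, struct zone *z)
{
	ref->zone = z;
}

/* Second half: store the cached index. */
static void set_zone_idx(struct zoneref *ref, struct zone *z)
{
	ref->zone_idx = z->idx;
}

/* Reader-side check: does the cached index match the zone pointed at? */
static int zoneref_is_consistent(const struct zoneref *ref)
{
	return ref->zone != NULL && ref->zone_idx == ref->zone->idx;
}

/*
 * Replays one possible interleaving and returns what a reader would see
 * if it looked at the entry between the writer's two stores.
 */
static int observe_mid_update(void)
{
	static struct zone old_zone = { .idx = 0 };
	static struct zone new_zone = { .idx = 2 };
	struct zoneref ref = { .zone = &old_zone, .zone_idx = 0 };
	int seen;

	set_zone_ptr(&ref, &new_zone);
	seen = zoneref_is_consistent(&ref);	/* the "reader" runs here */
	set_zone_idx(&ref, &new_zone);

	/* Once both stores are done the entry is consistent again. */
	assert(zoneref_is_consistent(&ref));
	return seen;
}
```

In this replay the reader sees the new zone pointer paired with the old index,
so observe_mid_update() reports an inconsistent entry. Whether a real reader
can ever run in that window depends on the locking around zonelist rebuilds,
which is exactly the question raised above.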

>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>> ---
>>  mm/page_alloc.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 99b74c1c2b0a..de3b48bda45c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6246,15 +6246,17 @@ static void zone_set_pageset_high_and_batch(struct zone *zone)
>>  
>>  void __meminit setup_zone_pageset(struct zone *zone)
>>  {
>> +	struct per_cpu_pageset __percpu *new_pageset;
>>  	struct per_cpu_pageset *p;
>>  	int cpu;
>>  
>> -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
>> +	new_pageset = alloc_percpu(struct per_cpu_pageset);
>>  	for_each_possible_cpu(cpu) {
>> -		p = per_cpu_ptr(zone->pageset, cpu);
>> +		p = per_cpu_ptr(new_pageset, cpu);
>>  		pageset_init(p);
>>  	}
>>  
>> +	smp_store_release(&zone->pageset, new_pageset);
>>  	zone_set_pageset_high_and_batch(zone);
>>  }
>>  
>> -- 
>> 2.28.0
>
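
For readers less familiar with the pattern in the quoted patch, here is a
rough userspace analogue of "initialize fully, then publish" using C11
atomics. It is only a sketch under stated assumptions: zone_demo,
pageset_demo, NR_CPUS_DEMO and both functions are hypothetical names, a plain
array stands in for the percpu allocation, the initial field values are
arbitrary, and the reader uses an acquire load where the kernel code would
rely on the data dependency through the pointer.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NR_CPUS_DEMO 4	/* hypothetical stand-in for the possible-CPU count */

/* Hypothetical stand-in for struct per_cpu_pageset. */
struct pageset_demo {
	int count;
	int high;
	int batch;
};

/* Hypothetical stand-in for the one struct zone field of interest. */
struct zone_demo {
	_Atomic(struct pageset_demo *) pageset;
};

/*
 * Allocate and initialize the whole array privately, then publish it with
 * a release store, so a reader that sees the pointer also sees the
 * initialized contents. Returns 0 on success, -1 on allocation failure.
 */
static int setup_zone_pageset_demo(struct zone_demo *zone)
{
	struct pageset_demo *new_pageset;
	int cpu;

	new_pageset = calloc(NR_CPUS_DEMO, sizeof(*new_pageset));
	if (!new_pageset)
		return -1;

	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		new_pageset[cpu].count = 0;
		new_pageset[cpu].high = 0;	/* arbitrary demo value */
		new_pageset[cpu].batch = 1;	/* arbitrary demo value */
	}

	atomic_store_explicit(&zone->pageset, new_pageset,
			      memory_order_release);
	return 0;
}

/*
 * Reader side: the acquire load pairs with the release store above.
 * Returns NULL if nothing has been published yet.
 */
static struct pageset_demo *zone_pageset_demo(struct zone_demo *zone, int cpu)
{
	struct pageset_demo *base;

	base = atomic_load_explicit(&zone->pageset, memory_order_acquire);
	return base ? &base[cpu] : NULL;
}
```

A caller would initialize zone->pageset to NULL, run
setup_zone_pageset_demo() once, and only then expect zone_pageset_demo() to
return a usable entry. The alternative favored in the discussion above is to
skip the barrier and document the lock that already serializes setup against
readers.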