lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 27 May 2024 09:53:45 +0200
From: David Hildenbrand <david@...hat.com>
To: Brendan Jackman <jackmanb@...gle.com>
Cc: Oscar Salvador <osalvador@...e.de>,
 Andrew Morton <akpm@...ux-foundation.org>, Mike Rapoport <rppt@...nel.org>,
 Michal Hocko <mhocko@...e.com>, Anshuman Khandual
 <anshuman.khandual@....com>, Vlastimil Babka <vbabka@...e.cz>,
 Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm,memory_hotplug: Remove un-taken lock

Am 24.05.24 um 14:02 schrieb Brendan Jackman:
> On Wed, May 22, 2024 at 05:24:17PM +0200, David Hildenbrand wrote:
>> On 22.05.24 16:27, Brendan Jackman wrote:
>>> On Wed, May 22, 2024 at 04:09:41PM +0200, David Hildenbrand wrote:
> 
>>> By the way, some noob questions: am I OK with my assumption that it's
>>> fine for reader code to operate on zone spans that are both stale and
>>> "from the future"? thinking abstractly I guess that seeing a stale
>>> value when racing with offline_pages is roughly the same as seeing a
>>> value "from the future" when racing with online_pages?
>>
>> Right. PFN walkers should be using pfn_to_online_page(), where races are
>> possible but barely seen in practice.
>>
>> zone handlers like mm/compaction.c can likely deal with races, although it
>> might all be cleaner (and safer?) when using start+end. I recall it also
>> recalls on pfn_to_online_page().
>>
>> Regarding page_outside_zone_boundaries(), it should be fine if we can read
>> start+end atomically, that way we would not accidentally report "page
>> outside ..." when changing the start address. I think with your current
>> patch that might happen (although likely extremely hard to trigger) when
>> growing the zone at the start, reducing zone_start_pfn.
> 
> Thanks a lot, this is very helpful
> 
>>> Also, is it ever possible for pages to get removed and then added back
>>> and end up in a different zone than before?
>>
>> Yes. Changing between MOVABLE and NORMAL is possible and can easily be
>> triggered by offlining+re-onlining memory blocks.
> 
> So, even if we make it impossible to see a totally bogus zone span,
> you can observe a stale/futuristic span which currently contains pages
> from a different zone?

Yes. Note that zones/nodes can easily overlap, so a page being spanned by 
another zones is common and supported already.

> 
> That seems to imply you could look up a page page from a PFN within
> zone A's apparent span, lock zone A and assume you can safely modify
> the freelist the page is on, but actually that page is now in zone B.

That's why we obtain the zone/node always from the page itself (stored in page 
flags). This data can only change when offlining+reonlining memory (and 
pfn_to_online_page() would refuse to hand out the page while temporarily online).

There were discussions around using RCU to improve pfn_to_online_page() racing 
with memory offlining, but the motivation to do that has been rather small: we 
barely see such races in practice. Memory offlining+re-onlining simply takes too 
long.

> 
> So for example:
> 
> 1. compact_zone() sets cc->free_pfn based on zone_end_pfn
>    2. isolate_freepages() sets isolate_start_pfn = cc->free_pfn
>      3. isolate_freepages_block() looks up a page based on that PFN
>      3. ... then takes the cc->zone lock
>      4. ... then calls __isolate_free_page which removes the page from
>         whatever freelist it's on.
> 
> Is anything stopping part 4 from modifying a zone that wasn't locked
> in part 3?

Likely that overlapping zones already exist and are handled accordingly.

-- 
Thanks,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ