lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 2 Oct 2023 13:43:30 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Zi Yan <ziy@...dia.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

>>> I can do it after I fix this. That change might or might not help only if we make
>>> some redesign on how migratetype is managed. If MIGRATE_ISOLATE does not
>>> overwrite existing migratetype, the code might not need to split a page and move
>>> it to MIGRATE_ISOLATE freelist?
>>
>> Did someone test how memory offlining plays along with that? (I can try myself
>> within the next 1-2 weeks)
>>
>> There [mm/memory_hotplug.c:offline_pages] we always cover full MAX_ORDER ranges,
>> though.
>>
>> ret = start_isolate_page_range(start_pfn, end_pfn,
>> 			       MIGRATE_MOVABLE,
>> 			       MEMORY_OFFLINE | REPORT_FAILURE,
>> 			       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
> 
> Since a full MAX_ORDER range is passed, no free page split will happen.

Okay, thanks for verifying that it should not be affected!

> 
>>
>>>
>>> The fundamental issue in alloc_contig_range() is that to work at
>>> pageblock level, a page (>pageblock_order) can have one part is isolated and
>>> the rest is a different migratetype. {add_to,move_to,del_page_from}_free_list()
>>> now checks first pageblock migratetype, so such a page needs to be removed
>>> from its free_list, set MIGRATE_ISOLATE on one of the pageblock, split, and
>>> finally put back to multiple free lists. This needs to be done at isolation stage
>>> before free pages are removed from their free lists (the stage after isolation).
>>
>> One idea was to always isolate larger chunks, and handle movability checks/split/etc
>> at a later stage. Once isolation would be decoupled from the actual/original migratetype,
>> the could have been easier to handle (especially some corner cases I had in mind back then).
> 
> I think it is a good idea. When I coded alloc_contig_range() up, I tried to
> accommodate existing set_migratetype_isolate(), which calls has_unmovable_pages().
> If these two are decoupled, set_migrateype_isolate() can work on MAX_ORDER-aligned
> ranges and has_unmovable_pages() can still work on pageblock-aligned ranges.
> Let me give this a try.
> 

But again, just some thought I had back then, maybe it doesn't help for 
anything; I found more time to look into the whole thing in more detail.

>>
>>> If MIGRATE_ISOLATE is a separate flag and we are OK with leaving isolated pages
>>> in their original migratetype and check migratetype before allocating a page,
>>> that might help. But that might add extra work (e.g., splitting a partially
>>> isolated free page before allocation) in the really hot code path, which is not
>>> desirable.
>>
>> With MIGRATE_ISOLATE being a separate flag, one idea was to have not a single
>> separate isolate list, but one per "proper migratetype". But again, just some random
>> thoughts I had back then, I never had sufficient time to think it all through.
> 
> Got it. I will think about it.
> 
> One question on separate MIGRATE_ISOLATE:
> 
> the implementation I have in mind is that MIGRATE_ISOLATE will need a dedicated flag
> bit instead of being one of migratetype. But now there are 5 migratetypes +

Exactly what I was concerned about back then ...

> MIGRATE_ISOLATE and PB_migratetype_bits is 3, so an extra migratetype_bit is needed.
> But current migratetype implementation is a word-based operation, requiring
> NR_PAGEBLOCK_BITS to be divisor of BITS_PER_LONG. This means NR_PAGEBLOCK_BITS
> needs to be increased from 4 to 8 to meet the requirement, wasting a lot of space.

... until I did the math. Let's assume a pageblock is 2 MiB.

4/(2* 1024 * 1024 * 8) = 0,00002384185791016 %

8/(2* 1024 * 1024 * 8) -> 1 / (2* 1024 * 1024) = 0,00004768371582031 %

For a 1 TiB machine that means 256 KiB vs. 512 KiB

I concluded that "wasting a lot of space" is not really the right word 
to describe that :)

Just to put it into perspective, the memmap (64/4096) for a 1 TiB 
machine is ... 16 GiB.

> An alternative is to have a separate array for MIGRATE_ISOLATE, which requires
> additional changes. Let me know if you have a better idea. Thanks.

It would probably be cleanest to just use one byte per pageblock. That 
would cleanup the whole machinery eventually as well.

-- 
Cheers,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ