Message-ID: <e46ab5c6-8b95-4d2f-b820-14659c615e81@redhat.com>
Date: Tue, 19 Nov 2024 17:52:06 +0100
From: David Hildenbrand <david@...hat.com>
To: Zi Yan <ziy@...dia.com>
Cc: Yu Zhao <yuzhao@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH mm-unstable v2 1/3] mm/contig_alloc: support __GFP_COMP

On 19.11.24 17:49, David Hildenbrand wrote:
> On 19.11.24 17:41, Zi Yan wrote:
>> On 19 Nov 2024, at 11:31, David Hildenbrand wrote:
>>
>>> On 19.11.24 17:12, Zi Yan wrote:
>>>> On 19 Nov 2024, at 10:29, David Hildenbrand wrote:
>>>>
>>>>>> +/* Split a multi-block free page into its individual pageblocks. */
>>>>>> +static void split_large_buddy(struct zone *zone, struct page *page,
>>>>>> +			      unsigned long pfn, int order, fpi_t fpi)
>>>>>> +{
>>>>>> +	unsigned long end = pfn + (1 << order);
>>>>>> +
>>>>>> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>>> +	/* Caller removed page from freelist, buddy info cleared! */
>>>>>> +	VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>> +
>>>>>> +	if (order > pageblock_order)
>>>>>> +		order = pageblock_order;
>>>>>> +
>>>>>> +	while (pfn != end) {
>>>>>> +		int mt = get_pfnblock_migratetype(page, pfn);
>>>>>> +
>>>>>> +		__free_one_page(page, pfn, zone, order, mt, fpi);
>>>>>> +		pfn += 1 << order;
>>>>>> +		page = pfn_to_page(pfn);
>>>>>> +	}
>>>>>> +}
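[Read outside the kernel, the loop's effect is easy to model: it walks the [pfn, end) range in steps of the (possibly clamped) order, handing each chunk to __free_one_page(). A minimal userspace C sketch of that behavior — model_split() and PAGEBLOCK_ORDER = 9 are stand-ins for illustration, not the kernel code itself:]

```c
#include <assert.h>
#include <stddef.h>

#define PAGEBLOCK_ORDER 9  /* stand-in for pageblock_order on x86-64 */

/*
 * Userspace model of split_large_buddy(): record the pfn of each
 * chunk that __free_one_page() would be handed, and return the
 * number of chunks the original allocation was split into.
 */
static int model_split(unsigned long pfn, int order,
		       unsigned long *out, size_t max)
{
	unsigned long end = pfn + (1UL << order);
	int n = 0;

	if (order > PAGEBLOCK_ORDER)
		order = PAGEBLOCK_ORDER;	/* clamp to pageblock granularity */

	while (pfn != end) {
		if ((size_t)n < max)
			out[n] = pfn;		/* one __free_one_page() call */
		n++;
		pfn += 1UL << order;
	}
	return n;
}
```

[With MAX_PAGE_ORDER = 10 and pageblock_order = 9, a single order-10 chunk becomes two order-9 frees that the buddy allocator then re-merges — exactly the split-then-remerge cost under discussion below.]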
>>>>>
>>>>> Hi,
>>>>>
>>>>> stumbling over this while digging through the code ....
>>>>>
>>>>>> +
>>>>>>      static void free_one_page(struct zone *zone, struct page *page,
>>>>>>      			  unsigned long pfn, unsigned int order,
>>>>>>      			  fpi_t fpi_flags)
>>>>>>      {
>>>>>>      	unsigned long flags;
>>>>>> -	int migratetype;
>>>>>>       	spin_lock_irqsave(&zone->lock, flags);
>>>>>> -	migratetype = get_pfnblock_migratetype(page, pfn);
>>>>>> -	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>>>>>
>>>>> This change is rather undesired:
>>>>>
>>>>> via __free_pages_core()->__free_pages_ok() we can easily end up here with order=MAX_PAGE_ORDER.
>>>>
>>>> Do you have a concrete example? PMD THP on x86_64 is pageblock_order.
>>>> We do not have PMD level mTHP yet. Any other possible source?
>>>>
>>>>>
>>>>> What your new code will do is split this perfectly reasonable MAX_PAGE_ORDER chunk via split_large_buddy() into pageblock-sized chunks, and let the buddy merging logic undo our unnecessary splitting.
>>>>>
>>>>> Is there a way to avoid this and just process the whole MAX_PAGE_ORDER chunk like we used to?
>>>>
>>>> Probably split_large_buddy() can check the migratetypes of the to-be-freed
>>>> page, if order > pageblock_order. If all migratetypes are the same, the page
>>>> can be freed at MAX_PAGE_ORDER, otherwise pageblock_order.
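[The check suggested above could look roughly like this — a sketch with a hypothetical helper name, where the per-pageblock migratetypes that get_pfnblock_migratetype() would return are modeled as a plain array:]

```c
#include <assert.h>

/*
 * Hypothetical check for the suggestion above: a chunk spanning
 * several pageblocks may be freed at its full order only if every
 * pageblock in the range carries the same migratetype.
 */
static int can_free_whole(const int *block_mt, int nblocks)
{
	for (int i = 1; i < nblocks; i++)
		if (block_mt[i] != block_mt[0])
			return 0;	/* mixed types: split to pageblock_order */
	return 1;			/* uniform: free the whole chunk at once */
}
```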
>>>
>>> Thinking about this: why do we care about the migratetype?
>>>
>>> We only have to fall back to pageblocks if any pageblock is "MIGRATE_ISOLATE" (and maybe MIGRATE_CMA), but not all of them. Otherwise, we can just ignore the migratetype (or rather overwrite it).
>>
>> There are VM_WARN_ONCEs around *_free_list() operations to make sure
>> page migratetype matches the migratetype of the list it is on. Ignoring
>> migratetype would trigger these WARNs. Overwriting it would work but
>> free page migratetype accounting needs to be taken care of.
>>
>> An implicit reason is that __free_one_page() does not support >MAX_PAGE_ORDER
>> and gigantic hugetlb folios are freed via free_one_page(), where
>> split_large_buddy() is used to work around the limitation.
> 
> Yes, I saw that change. But that rare case can easily be identified by
> unlikely(order > MAX_PAGE_ORDER) and handled in a special way.
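[Put together, the dispatch being suggested here can be sketched as follows — hypothetical names and an assumed MAX_PAGE_ORDER of 10 (as on x86-64), not the actual mm code:]

```c
#include <assert.h>

#define MAX_PAGE_ORDER 10	/* assumed value, as on x86-64 */

enum free_path { FREE_WHOLE, FREE_SPLIT };

/*
 * Sketch of the proposed dispatch: only the rare order > MAX_PAGE_ORDER
 * case (gigantic hugetlb folios) takes the splitting path; a plain
 * MAX_PAGE_ORDER chunk is freed whole, as before the patch.
 */
static enum free_path pick_free_path(int order)
{
	if (order > MAX_PAGE_ORDER)	/* unlikely() in kernel terms */
		return FREE_SPLIT;
	return FREE_WHOLE;
}
```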
> 
>> For the two memory init cases you mentioned in the other email, maybe a new
>> fpi flag to make free_one_page() use __free_one_page() for them,
>> since migratetypes should be the same across MAX_PAGE_ORDER range there, right?
> 
> In the context of alloc_frozen_range()/free_frozen_range() I want to
> free MAX_PAGE_ORDER chunks when possible, and not have some odd logic in
> the freeing path undo some of that effort.

Adding a pointer to that discussion:

https://lkml.kernel.org/r/ZzZdnuZBf-xgiwD2@casper.infradead.org

-- 
Cheers,

David / dhildenb

