Message-ID: <E188FF7F-F52E-4DB6-B258-B9CCD7058E56@nvidia.com>
Date: Fri, 04 Feb 2022 10:19:55 -0500
From: Zi Yan <ziy@...dia.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Michael Ellerman <mpe@...erman.id.au>,
Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
linuxppc-dev@...ts.ozlabs.org,
virtualization@...ts.linux-foundation.org,
iommu@...ts.linux-foundation.org, Vlastimil Babka <vbabka@...e.cz>,
Mel Gorman <mgorman@...hsingularity.net>,
Eric Ren <renzhengeek@...il.com>
Subject: Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
On 4 Feb 2022, at 8:56, Oscar Salvador wrote:
> On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@...dia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> It is done by restoring pageblock types and splitting >pageblock_order free
>> pages after isolating at MAX_ORDER-1 granularity and migrating pages
>> away at pageblock granularity. The reason for this process is that
>> during isolation, some pages, either free or in-use, might be larger
>> than a pageblock, and isolating only part of such a page can cause free
>> page accounting issues. Restoring the migratetypes of the pageblocks
>> outside the requested range afterwards is much easier.
>
> Hi Zi Yan,
>
> Due to time constraints I only glanced over it, so some comments below
> about stuff that caught my eye:
Thanks for looking at the patches!
>
>> +static inline void split_free_page_into_pageblocks(struct page *free_page,
>> + int order, struct zone *zone)
>> +{
>> + unsigned long pfn;
>> +
>> + spin_lock(&zone->lock);
>> + del_page_from_free_list(free_page, zone, order);
>> + for (pfn = page_to_pfn(free_page);
>> + pfn < page_to_pfn(free_page) + (1UL << order);
>
> It might make sense to have an end_pfn variable so that it does not
> have to be constantly evaluated. Or maybe the compiler is clever enough
> to only evaluate it once.
Sure. Will add end_pfn.
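Something like this (a minimal sketch, same body as the hunk above, only
computing the end of the range once):

static inline void split_free_page_into_pageblocks(struct page *free_page,
						   int order, struct zone *zone)
{
	unsigned long pfn = page_to_pfn(free_page);
	unsigned long end_pfn = pfn + (1UL << order);

	spin_lock(&zone->lock);
	del_page_from_free_list(free_page, zone, order);
	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

		/* put each pageblock on the free list of its own migratetype */
		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
				mt, FPI_NONE);
	}
	spin_unlock(&zone->lock);
}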
>
>> + pfn += pageblock_nr_pages) {
>> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
>> + mt, FPI_NONE);
>> + }
>> + spin_unlock(&zone->lock);
>
> It is possible that free_page's order is already pageblock_order, so I
> would add a one-liner upfront to catch that case and return, otherwise
> we do the delete_from_freelist-and-free_it_again dance.
Makes sense. Will do.
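Probably just a one-liner at the top of the function, before taking
zone->lock (sketch only):

	/* already a single pageblock, no need to delete and re-free it */
	if (order == pageblock_order)
		return;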
>
>> + /* Save the migratepages of the pageblocks before start and after end */
>> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
>> + + (isolate_end - alloc_end) / pageblock_nr_pages;
>> + saved_mt =
>> + kmalloc_array(num_pageblock_to_save,
>> + sizeof(unsigned char), GFP_KERNEL);
>> + if (!saved_mt)
>> + return -ENOMEM;
>> +
>> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
>> +
>> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
>
> I really hope we can put all this magic within start_isolate_page_range,
> and the counterparts in undo_isolate_page_range.
>
That is my plan too.
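For the record, the shape I have in mind is roughly the below, with the
save/restore hidden behind the existing entry points (hypothetical sketch;
the signatures are the current ones from mm/page_isolation.c):

/*
 * start_isolate_page_range() would record the original migratetypes of
 * the pageblocks it isolates, and undo_isolate_page_range() would
 * restore them, so alloc_contig_range() never touches saved_mt itself.
 */
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned migratetype, int flags);
void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned migratetype);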
> Also, I kinda dislike the &saved_mt thing. I thought about some other
> approaches but nothing that wasn't too specific for this case, and I
> guess we want that function to be as generic as possible.
>
I do not like it either. This whole save and restore thing should go away
once I make MIGRATE_ISOLATE a standalone bit, so that the original
migratetypes will not be overwritten during isolation. Hopefully, I can
work on it soon to get rid of this hunk completely.
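For reference, the rough direction (a hypothetical sketch, not part of this
series) would be to give isolation its own pageblock bit:

/* include/linux/pageblock-flags.h, hypothetical extension */
enum pageblock_bits {
	PB_migrate,
	PB_migrate_end = PB_migrate + 3 - 1,	/* 3 bits for migratetype */
	PB_migrate_skip,	/* if set, compaction skips this block */
	PB_migrate_isolate,	/* NEW: isolation no longer clobbers the type */
	NR_PAGEBLOCK_BITS
};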
>> + /*
>> + * Split free page spanning [alloc_end, isolate_end) and put the
>> + * pageblocks in the right migratetype list
>> + */
>> + for (outer_end = alloc_end; outer_end < isolate_end;) {
>> + unsigned long begin_pfn = outer_end;
>> +
>> + order = 0;
>> + while (!PageBuddy(pfn_to_page(outer_end))) {
>> + if (++order >= MAX_ORDER) {
>> + outer_end = begin_pfn;
>> + break;
>> + }
>> + outer_end &= ~0UL << order;
>> + }
>> +
>> + if (outer_end != begin_pfn) {
>> + order = buddy_order(pfn_to_page(outer_end));
>> +
>> + /*
>> + * split the free page that contains the start page and put the
>> + * pageblocks in the right migratetype lists
>> + */
>> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
>
> How could this possibly happen?
Right. It is not possible. Will remove it.
>
>> + {
>> + struct page *free_page = pfn_to_page(outer_end);
>> +
>> + split_free_page_into_pageblocks(free_page, order, cc.zone);
>> + }
>> + outer_end += 1UL << order;
>> + } else
>> + outer_end = begin_pfn + 1;
>> }
>
> I think there are cases we could optimize for. If the page has already
> been split into pageblocks by the outer_start loop, we could skip this
> outer_end logic altogether.
>
> E.g: An order-10 page is split in two pageblocks. There's nothing else
> to be done, right? We could skip this.
Yes. I will think about it more and do some optimization.
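One cheap check might be to skip blocks that the outer_start handling
already split down to pageblock_order, e.g. inside the loop above (a
sketch, with the same locking caveats as the existing PageBuddy check):

	/* hypothetical: already a pageblock-sized buddy, nothing to split */
	if (PageBuddy(pfn_to_page(outer_end)) &&
	    buddy_order(pfn_to_page(outer_end)) == pageblock_order) {
		outer_end += pageblock_nr_pages;
		continue;
	}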
--
Best Regards,
Yan, Zi