[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c5311c7-bae4-4e48-7006-787f2ee19fb3@redhat.com>
Date: Tue, 23 Nov 2021 17:54:31 +0100
From: David Hildenbrand <david@...hat.com>
To: Zi Yan <ziy@...dia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Michael Ellerman <mpe@...erman.id.au>,
Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
linuxppc-dev@...ts.ozlabs.org,
virtualization@...ts.linux-foundation.org,
iommu@...ts.linux-foundation.org
Subject: Re: [RFC PATCH 0/3] Use pageblock_order for cma and
alloc_contig_range alignment.
On 17.11.21 04:04, Zi Yan wrote:
> On 16 Nov 2021, at 3:58, David Hildenbrand wrote:
>
>> On 15.11.21 20:37, Zi Yan wrote:
>>> From: Zi Yan <ziy@...dia.com>
>>>
>>> Hi David,
>>
>> Hi,
>>
>> thanks for looking into this.
>>
Hi,
sorry for the delay, I wasn't "actually working" last week, so now I'm
back from holiday :)
>>>
>>> You suggested to make alloc_contig_range() deal with pageblock_order instead of
>>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>>> patchset is my attempt to achieve that. Please take a look and let me know if
>>> I am doing it correctly or not.
>>>
>>> From what my understanding, cma required alignment of
>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was introduced,
>>> __free_one_page() does not prevent merging two different pageblocks, when
>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() implementation
>>> does prevent that. It should be OK to just align cma to pageblock_order.
>>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>>> pageblock_order as alignment too.
>>
>> I wonder if that's sufficient. Especially the outer_start logic in
>> alloc_contig_range() might be problematic. There are some ugly corner
>> cases with free pages/allocations spanning multiple pageblocks and we
>> only isolated a single pageblock.
>
> Thank you a lot for writing the list of these corner cases. They are
> very helpful!
>
>>
>>
>> Regarding CMA, we have to keep the following cases working:
>>
>> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
>> - 1 page:
>> [ MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
>> but not both. We have to make sure alloc_contig_range() keeps working
>> correctly. This should be the case even with your change, as we won't
>> merging pages accross differing migratetypes.
>
> Yes.
>
>>
>> b) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>> [ MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume both are MIGRATE_CMA. Assume we want to either allocate from
>> pageblock 0 or pageblock 1. Especially, assume we want to allocate from
>> pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
>> pageblock 0.
>>
>> What happens if we either have a free page spanning the MAX_ORDER - 1
>> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
>> in a free MAX_ORDER - 1 page of which only the second pageblock is
>> isolated? We would end up essentially freeing a page that has mixed
>> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
>> might be wrong but I have the feeling that this would be problematic.
>>
>
> This could happen when start_isolate_page_range() stumbles upon a compound
> page with order >= pageblock_order or a free page with order >=
> pageblock_order, but should not. start_isolate_page_range() should check
> the actual page size, either compound page size or free page size, and set
> the migratetype across pageblocks if the page is bigger than pageblock size.
> More precisely set_migratetype_isolate() should do that.
Right -- but then we have to extend the isolation range from within
set_migratetype_isolate() :/ E.g., how should we know what we have to
unisolate later ..
>
>
>> c) Concurrent allocations:
>> [ MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
>> pageblock 1, which would now be possible as start_isolate_page_range()
>> isolate would succeed on both.
>
> Two isolations will be serialized by the zone lock taken by
> set_migratetype_isolate(), so the concurrent allocation would not be a problem.
> If it is a MAX_ORDER-1 free page, the first comer should split it and only
> isolate one of the pageblock then second one can isolate the other pageblock.
Right, the issue is essentially b) above.
> If it is a MAX_ORDER-1 compound page, the first comer should isolate both
> pageblocks, then the second one would fail. WDYT?
Possibly we could even have two independent CMA areas "colliding" within
a MAX_ ORDER - 1 page. I guess the surprise would be getting an
"-EAGAIN" from isolation, but the caller should properly handle that.
Maybe it's really easier to do something along the lines I proposed
below and always isolate the complete MAX_ORDER-1 range in
alloc_contig_range(). We just have to "fix" isolation as I drafted.
>
>
> In sum, it seems to me that the issue is page isolation code only sees
> pageblock without check the actual page. When there are multiple pageblocks
> belonging to one page, the problem appears. This should be fixed.
>
>>
>>
>> Regarding virtio-mem, we care about the following cases:
>>
>> a) Allocating parts from completely movable MAX_ ORDER - 1 page:
>> [ MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume pageblock 0 and pageblock 1 are either free or contain only
>> movable pages. Assume we allocated pageblock 0. We have to make sure we
>> can allocate pageblock 1. The other way around, assume we allocated
>> pageblock 1, we have to make sure we can allocate pageblock 0.
>>
>> Free pages spanning both pageblocks might be problematic.
>
> Can you elaborate a bit? If either of pageblock 0 and 1 is used by
> virtio-mem, why do we care the other? If pageblock 0 and 1 belong to
> the same page (either free or compound), they should have the same
> migratetype. If we want to just allocate one of them, we can split
> the free page or migrate the compound page then split the remaining
> free page.
The thing is: it has to work on ZONE_NORMAL and ZONE_MOVABLE as well.
It's essentially the same issue as a) and b) in the CMA case, so it
should be covered by these.
>
>>
>> b) Allocate parts of partially movable MAX_ ORDER - 1 page:
>> [ MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume pageblock 0 contains unmovable data but pageblock 1 not: we have
>> to make sure we can allocate pageblock 1. Similarly, assume pageblock 1
>> contains unmovable data but pageblock 0 no: we have to make sure we can
>> allocate pageblock 1. has_unmovable_pages() might allow for that.
>>
>> But, we want to fail early in case we want to allocate a single
>> pageblock but it contains unmovable data. This could be either directly
>> or indirectly.
>>
>> If we have an unmovable (compound) MAX_ ORDER - 1 and we'd try isolating
>> pageblock 1, has_unmovable_pages() would always return "false" because
>> we'd simply be skiping over any tail pages, and not detect the
>> un-movability.
>
> OK. It seems to me that has_unmovable_pages() needs to be fixed to handle
> such a situation.
Right.
>
>>
>> c) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>>
>> Same concern as for CMA b)
>>
>>
>> So the biggest concern I have is dealing with migrating/freeing >
>> pageblock_order pages while only having isolated a single pageblock.
>
> I agree. I think isolation code needs to be aware of >pageblock_order
> pages and act accordingly. If it is a free page, split the page to
> avoid isolating a subset of the page. If it is a compound page, either
> fail the isolation or isolate the entire compound page instead.
>
>>
>>>
>>> In terms of virtio_mem, if I understand correctly, it relies on
>>> alloc_contig_range() to obtain contiguous free pages and offlines them to reduce
>>> guest memory size. As the result of alloc_contig_range() alignment change,
>>> virtio_mem should be able to just align PFNs to pageblock_order.
>>
>> For virtio-mem it will most probably be desirable to first try
>> allocating the MAX_ORDER -1 range if possible and then fallback to
>> pageblock_order. But that's an additional change on top in virtio-mem code.
>>
>
> Just to understand the motivation, is this because MAX_ORDER-1 range
> would be faster than pageblock_order? What if MAX_ORDER-1 goes beyond
> a memory section size (like my WIP patchset to increase MAX_ORDER
> beyond the memory section size)? virtio-mem could first try PAGES_PER_SECTION,
> then fall back to pageblock_order, right?
My comment is only in the context of this patch series, not in context
of eventually raising MAX_ORDER to exceed eventually a single memory
section or even a memory block.
Yes, it would be faster. What we do right now (if the complete memory
block is used by Linux and thus not allocated by virtio-mem yet):
a) Try allocating the the complete memory block.
b) If it fails, fallback to essentially MAX_ORDER - 1 chunks
In the context of this patch it would be reasonable to
a) Try allocating the the complete memory block.
b) If it fails, fallback to essentially MAX_ORDER - 1 chunks
c) If it fails, fallback to essentially pageblock order chunks
Things will be different if we change MAX_ORDER - 1.
>>
>>
>> My take to teach alloc_contig_range() to properly handle would be the
>> following:
>>
>> a) Convert MIGRATE_ISOLATE into a separate pageblock flag
>>
>> We would want to convert MIGRATE_ISOLATE into a separate pageblock
>> flags, such that when we isolate a page block we preserve the original
>> migratetype. While start_isolate_page_range() would set that bit,
>> undo_isolate_page_range() would simply clear that bit. The buddy would
>> use a single MIGRATE_ISOLATE queue as is: the original migratetype is
>> only used for restoring the correct migratetype. This would allow for
>> restoring e.g., MIGRATE_UNMOVABLE after isolating an unmovable pageblock
>> (below) and not simply setting all such pageblocks to MIGRATE_MOVABLE
>> when un-isolating.
>>
>> Ideally, we'd get rid of the "migratetype" parameter for
>> alloc_contig_range(). However, even with the above change we have to
>> make sure that memory offlining and ordinary alloc_contig_range() users
>> will fail on MIGRATE_CMA pageblocks (has_unmovable_page() checks that as
>> of today). We could achieve that differently, though (e.g., bool
>> cma_alloc parameter instead).
>
> This might need to be done in a separate patch, since pageblock bits require
> to be word aligned and it is 4 now. To convert MIGRATE_ISOLATE to a separate
> bit, either NR_PAGEBLOCK_BITS needs to be increased to 8 or a separate
> isolation bitmap array needs to be allocated. Or the migratetype information
> can be stored temporarily during isolation process. I can look into it later.
Right, we'd need 8 instead of 4 bits. But we could preserve the previous
migratettype (MOVABLE, UNMOVABLE, CMA) ... when isolating and wouldn't
have to magically punch in whatever someone told us.
>
>
>>
>>
>> b) Allow isolating pageblocks with unmovable pages
>>
>> We'd pass the actual range of interest to start_isolate_page_range() and
>> rework the code to check has_unmovable_pages() only on the range of
>> interest, but considering overlapping larger allocations. E.g., if we
>> stumble over a compound page, lookup the head an test if that page is
>> movable/unmovable.
>
> This is an optimization to reduce isolation failure rate, right? This only
> applies to the pageblocks at the beginning and the end of a range of interest.
Right. And with a) we can simply isolate+unisolate without always
changing the migratetype e.g., to MIGRATE_MOVABLE in case of virtio-mem.
>
>>
>> c) Change alloc_contig_range() to not "extend" the range of interest to
>> include pageblock of different type. Assume we're isolating a
>> MIGRATE_CMA pageblock, only isolate a neighboring MIGRATE_CMA pageblock,
>> not other pageblocks.
>
> But alloc_contig_range() would return these extended pageblocks at the end.
> And if pageblock migratetype can be preserved during isolation (item (a) above),
> this would not be a problem, right?
We have to make sure that ordinary alloc_contig_range() and memory
offlining don't allocate MIGRATE_CMA ranges. So when actually isolating
we have to refuse isolating MIGRATE_CMA pageblocks. Handling that in the
caller when adjusting the range keeps the logic in the actual isolation
code is one option (cma=false - bail out when wanting to isolate
MIGRATE_CMA).
But there might be alternatives. Most probably we'd just have to check
for the "range of interest". If cma=false we just have to make sure to
not isolate MIGRATE_CMA inside the "range of interest". Yes that should
work as well.
>
>>
>>
>> So we'd keep isolating complete MAX_ORDER - 1 pages unless c) prevents
>> it. We'd allow isolating even pageblocks that contain unmovable pages on
>> ZONE_NORMAL, and check via has_unmovable_pages() only if the range of
>> interest contains unmovable pages, not the whole MAX_ORDER -1 range or
>> even the whole pageblock. We'd not silently overwrite the pageblock type
>> when restoring but instead restore the old migratetype.
>>
> I assume MAX_ORDER - 1 is an optimization for faster isolation speed.
> If MAX_ORDER goes beyond a memory section size, I guess PAGES_PER_SECTION
> is what you want, right? FYI, I am preparing a follow-up patch to replace
> any MAX_ORDER use that is intended to indicate maximum physically contiguous
> size with a new variable, MAX_PHYS_CONTIG_ORDER, which is PFN_SECTION_SHIFT
> when SPARSEMEM and MAX_ORDER when FLATMEM. I would replace MAX_ORDER here
> with the new variable.
Yes, with MAX_ORDER changes it will be a different story. We could
detect at runtime what actually makes sense.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists