Message-ID: <5983130E.2070806@intel.com>
Date: Thu, 03 Aug 2017 20:11:58 +0800
From: Wei Wang <wei.w.wang@...el.com>
To: Michal Hocko <mhocko@...nel.org>
CC: linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
linux-mm@...ck.org, mst@...hat.com, mawilcox@...rosoft.com,
akpm@...ux-foundation.org, virtio-dev@...ts.oasis-open.org,
david@...hat.com, cornelia.huck@...ibm.com,
mgorman@...hsingularity.net, aarcange@...hat.com,
amit.shah@...hat.com, pbonzini@...hat.com,
liliang.opensource@...il.com, yang.zhang.wz@...il.com,
quan.xu@...yun.com
Subject: Re: [PATCH v13 4/5] mm: support reporting free page blocks
On 08/03/2017 07:28 PM, Michal Hocko wrote:
> On Thu 03-08-17 19:27:19, Wei Wang wrote:
>> On 08/03/2017 06:44 PM, Michal Hocko wrote:
>>> On Thu 03-08-17 18:42:15, Wei Wang wrote:
>>>> On 08/03/2017 05:11 PM, Michal Hocko wrote:
>>>>> On Thu 03-08-17 14:38:18, Wei Wang wrote:
>>> [...]
>>>>>> +static int report_free_page_block(struct zone *zone, unsigned int order,
>>>>>> + unsigned int migratetype, struct page **page)
>>>>> This is just too ugly and wrong actually. Never provide struct page
>>>>> pointers outside of the zone->lock. What I've had in mind was to simply
>>>>> walk free lists of the suitable order and call the callback for each one.
>>>>> Something as simple as
>>>>>
>>>>> for (i = 0; i < MAX_NR_ZONES; i++) {
>>>>> 	struct zone *zone = &pgdat->node_zones[i];
>>>>>
>>>>> 	if (!populated_zone(zone))
>>>>> 		continue;
>>>>> 	spin_lock_irqsave(&zone->lock, flags);
>>>>> 	for (order = min_order; order < MAX_ORDER; ++order) {
>>>>> 		struct free_area *free_area = &zone->free_area[order];
>>>>> 		enum migratetype mt;
>>>>> 		struct page *page;
>>>>>
>>>>> 		if (!free_area->nr_pages)
>>>>> 			continue;
>>>>>
>>>>> 		for_each_migratetype_order(order, mt) {
>>>>> 			list_for_each_entry(page,
>>>>> 					&free_area->free_list[mt], lru) {
>>>>>
>>>>> 				pfn = page_to_pfn(page);
>>>>> 				visit(opaque2, pfn, 1<<order);
>>>>> 			}
>>>>> 		}
>>>>> 	}
>>>>>
>>>>> 	spin_unlock_irqrestore(&zone->lock, flags);
>>>>> }
>>>>>
>>>>> [...]
>>>> I think the above would hold the lock for too long. That's why we
>>>> prefer to take one free page block at a time; taking them one by one
>>>> also makes no difference to the performance that we
>>>> need.
>>> I think you should start with a simple approach and improve incrementally
>>> if this turns out to be not optimal. I really detest taking struct pages
>>> outside of the lock. You never know what might happen after the lock is
>>> dropped. E.g. can you race with the memory hotremove?
>>
>> The caller won't use pages returned from the function, so I think there
>> shouldn't be an issue or race if the returned pages are used (i.e. not free
>> anymore) or simply gone due to hotremove.
> No, this is just too error prone. Consider that the struct page pointer
> itself could become invalid in the meantime. Please always keep robustness
> in mind first. Optimizations are nice but it is not even clear whether
> the simple variant will cause any problems.
How about this:

for_each_populated_zone(zone) {
	for_each_migratetype_order_decend(min_order, order, type) {
		do {
=>			spin_lock_irqsave(&zone->lock, flags);
			ret = report_free_page_block(zone, order, type,
						     &page);
			if (!ret) {
				pfn = page_to_pfn(page);
				nr_pages = 1 << order;
				visit(opaque1, pfn, nr_pages);
			}
=>			spin_unlock_irqrestore(&zone->lock, flags);
		} while (!ret);
	}
}

This way, the lock is still held for only one free page block at a time,
while the struct page is only ever accessed under zone->lock.
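
For illustration only, here is a minimal userspace sketch of that locking
pattern. It is not kernel code: struct toy_free_list, toy_report_free_block(),
toy_walk_free_blocks() and count_pages() are hypothetical names, a pthread
mutex stands in for zone->lock, and an array index stands in for whatever
cursor the real report_free_page_block() would use to resume. It only shows
that the lock is taken and released once per block and that the callback runs
while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_BLOCKS 8

/* Hypothetical stand-in for one free list of a zone; not a kernel structure. */
struct toy_free_list {
	pthread_mutex_t lock;            /* plays the role of zone->lock */
	unsigned long pfns[MAX_BLOCKS];  /* first pfn of each free block */
	size_t nr_blocks;
};

typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

/*
 * Fetch the block at position *cursor. The caller must hold fl->lock.
 * Returns 0 and advances the cursor on success, -1 once the list is
 * exhausted.
 */
static int toy_report_free_block(struct toy_free_list *fl, size_t *cursor,
				 unsigned long *pfn)
{
	if (*cursor >= fl->nr_blocks)
		return -1;
	*pfn = fl->pfns[*cursor];
	(*cursor)++;
	return 0;
}

/*
 * Walk the list one block per lock acquisition. The block is looked up
 * and handed to visit() inside the same critical section, so nothing
 * describing the block is used after the lock is dropped.
 */
static void toy_walk_free_blocks(struct toy_free_list *fl, unsigned int order,
				 visit_fn visit, void *opaque)
{
	size_t cursor = 0;
	unsigned long pfn;
	int ret;

	assert(visit != NULL);

	do {
		pthread_mutex_lock(&fl->lock);
		ret = toy_report_free_block(fl, &cursor, &pfn);
		if (!ret)
			visit(opaque, pfn, 1UL << order);
		pthread_mutex_unlock(&fl->lock);
	} while (!ret);
}

/* Example callback: accumulate the number of pages reported. */
static void count_pages(void *opaque, unsigned long pfn,
			unsigned long nr_pages)
{
	(void)pfn;
	*(unsigned long *)opaque += nr_pages;
}
```

Because visit() runs with the lock held, it has to be short and must not
block. The sketch also sidesteps the question of how the real function would
find the next block after the lock has been dropped and retaken.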
Best,
Wei