Message-ID: <9ca76893-dfe8-9a46-f2ec-6b3c663e848e@codeaurora.org>
Date: Thu, 13 Aug 2020 21:51:29 +0530
From: Charan Teja Kalla <charante@...eaurora.org>
To: Michal Hocko <mhocko@...e.com>
Cc: akpm@...ux-foundation.org, vbabka@...e.cz, david@...hat.com,
rientjes@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, vinmenon@...eaurora.org
Subject: Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
Thanks, Michal, for the comments.
On 8/13/2020 5:11 PM, Michal Hocko wrote:
> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> [...]
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e4896e6..839039f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> struct page *page, *tmp;
>> LIST_HEAD(head);
>>
>> + /*
>> + * Ensure a proper count is passed, otherwise we would get stuck
>> + * in the while (list_empty(list)) loop below.
>> + */
>> + count = min(pcp->count, count);
>> while (count) {
>> struct list_head *list;
>
>
> How does this prevent the race actually?
This doesn't prevent the race. It only fixes the core hang (as this is
called with spin_lock_irq()) caused by the race condition. The hang
happens because an incorrect count value is passed to the
free_pcppages_bulk() function.
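To make the hang concrete, here is a minimal userspace sketch of the free_pcppages_bulk() drain loop with the clamp applied. It assumes pcp->count always equals the total number of pages on the per-migratetype lists. struct pcp_sim, sim_total() and drain_sim() are hypothetical names, each list is reduced to a length, and MIGRATE_PCPTYPES is taken as 3. Without the clamp line, asking for more pages than are present leaves the inner do/while scanning empty lists forever once all of them have been drained.

```c
#include <assert.h>

#define MIGRATE_PCPTYPES 3 /* assumed: unmovable, movable, reclaimable */

/* Hypothetical stand-in for struct per_cpu_pages: list lengths only. */
struct pcp_sim {
	int count;                 /* total pages on all lists */
	int len[MIGRATE_PCPTYPES]; /* pages on each migratetype list */
};

static int sim_total(const struct pcp_sim *pcp)
{
	int i, total = 0;

	for (i = 0; i < MIGRATE_PCPTYPES; i++)
		total += pcp->len[i];
	return total;
}

/* Models the drain loop of free_pcppages_bulk(); returns pages freed. */
static int drain_sim(struct pcp_sim *pcp, int count)
{
	int migratetype = 0, batch_free = 0, freed = 0;

	/* The assumption the clamp relies on: pcp->count matches the lists. */
	assert(sim_total(pcp) == pcp->count);

	count = count < pcp->count ? count : pcp->count; /* the proposed clamp */

	while (count) {
		/* Round-robin to the next non-empty list. This terminates
		 * because count > 0 implies some list still holds a page. */
		do {
			batch_free++;
			if (++migratetype == MIGRATE_PCPTYPES)
				migratetype = 0;
		} while (pcp->len[migratetype] == 0);

		/* As in the kernel: if this is the only non-empty list,
		 * keep freeing from it until count is exhausted. */
		if (batch_free == MIGRATE_PCPTYPES)
			batch_free = count;

		do {
			pcp->len[migratetype]--; /* "free" one page */
			pcp->count--;
			freed++;
		} while (--count && --batch_free && pcp->len[migratetype]);
	}
	return freed;
}
```

For example, with lists of lengths {2, 0, 3} and a requested count of 8, drain_sim() frees the 5 pages that exist and returns instead of spinning.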
The actual race should be fixed by David's suggestion (isolate, online,
and then undo the isolation).
Does something need to be improved in the commit message? Maybe like below:
s/The following race is observed with the repeated ... / A CPU core hang
is observed as a result of a race in the use case of repeated...
> Don't we need something like
> the following instead?
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e028b87ce294..45bcc7ba37c4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1317,9 +1317,16 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> * lists
> */
> do {
> + bool looped = false;
IIUC, since looped is declared inside the do { } body, it is
re-initialized to false on every iteration, so we would never jump to
free.
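A small userspace sketch of the scoping problem, assuming MIGRATE_PCPTYPES is 3 and reducing each pcp list to a length. scan_looped_inside(), scan_looped_outside() and enum scan_result are hypothetical names. The first mirrors the proposed hunk, with an artificial iteration cap so it can return at all; the second hoists looped above the do so the flag survives a full pass over the lists.

```c
#include <assert.h>
#include <stdbool.h>

#define MIGRATE_PCPTYPES 3 /* assumed: unmovable, movable, reclaimable */

enum scan_result {
	SCAN_FOUND,   /* stopped on a non-empty list */
	SCAN_GAVE_UP, /* the looped check fired: every list is empty */
	SCAN_SPUN,    /* hit the demo's iteration cap: would spin forever */
};

/* Shape of the proposed hunk: looped is declared inside the loop body,
 * so it is reset to false at the start of every iteration. */
static enum scan_result scan_looped_inside(const int len[MIGRATE_PCPTYPES],
					   int max_iters)
{
	int migratetype = 0, iters = 0;

	assert(max_iters > 0);
	do {
		bool looped = false;

		if (++migratetype == MIGRATE_PCPTYPES) {
			if (looped)      /* never true: just reset above */
				return SCAN_GAVE_UP;
			migratetype = 0;
			looped = true;   /* discarded when the iteration ends */
		}
		if (++iters >= max_iters)
			return SCAN_SPUN;
	} while (len[migratetype] == 0);

	return SCAN_FOUND;
}

/* Hoisted variant: looped keeps its value across iterations, so an
 * all-empty set of lists is detected after one wrap-around. */
static enum scan_result scan_looped_outside(const int len[MIGRATE_PCPTYPES])
{
	int migratetype = 0;
	bool looped = false;

	do {
		if (++migratetype == MIGRATE_PCPTYPES) {
			if (looped)
				return SCAN_GAVE_UP;
			migratetype = 0;
			looped = true;
		}
	} while (len[migratetype] == 0);

	return SCAN_FOUND;
}
```

With all three lists empty, the first variant only stops at the artificial cap (SCAN_SPUN), while the hoisted variant returns SCAN_GAVE_UP; with any non-empty list both return SCAN_FOUND.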
But I think I got your idea: loop over the pcp lists looking for any
pages, and if none are found after all MIGRATE_PCPTYPES lists have been
traversed, just break out of the loop. Is this check really required?
Shouldn't pcp->count tell us the same thing, i.e. whether any pages are
left in the pcp lists?
> +
> batch_free++;
> - if (++migratetype == MIGRATE_PCPTYPES)
> + if (++migratetype == MIGRATE_PCPTYPES) {
> + if (looped)
> + goto free;
> +
> migratetype = 0;
> + looped = true;
> + }
> list = &pcp->lists[migratetype];
> } while (list_empty(list));
>
> @@ -1352,6 +1359,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> } while (--count && --batch_free && !list_empty(list));
> }
>
> +free:
> spin_lock(&zone->lock);
> isolated_pageblocks = has_isolate_pageblock(zone);
>
>
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project