lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Aug 2020 22:57:32 +0530
From:   Charan Teja Kalla <charante@...eaurora.org>
To:     Michal Hocko <mhocko@...e.com>
Cc:     akpm@...ux-foundation.org, vbabka@...e.cz, david@...hat.com,
        rientjes@...gle.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, vinmenon@...eaurora.org
Subject: Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()

Thanks Michal.

On 8/13/2020 10:00 PM, Michal Hocko wrote:
> On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
>> Thanks Michal for comments.
>>
>> On 8/13/2020 5:11 PM, Michal Hocko wrote:
>>> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
>>> [...]
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index e4896e6..839039f 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>>>  	struct page *page, *tmp;
>>>>  	LIST_HEAD(head);
>>>>  
>>>> +	/*
>>>> +	 * Ensure proper count is passed which otherwise would stuck in the
>>>> +	 * below while (list_empty(list)) loop.
>>>> +	 */
>>>> +	count = min(pcp->count, count);
>>>>  	while (count) {
>>>>  		struct list_head *list;
>>>
>>>
>>> How does this prevent the race actually?
>>
>> This doesn't prevent the race. This only fixes the core hung(as this is
>> called with spin_lock_irq()) caused by the race condition. This core
>> hung is because of incorrect count value is passed to the
>> free_pcppages_bulk() function.
> 
> Let me ask differently. What does enforce that the count and lists do
> not get out of sync in the loop. 

count value is updated whenever an order-0 page is being added to the
pcp lists through free_unref_page_commit(), which is being called with
both interrupts, premption disabled.
static void free_unref_page_commit(struct page *page, {
	....
	list_add(&page->lru, &pcp->lists[migratetype]);
	pcp->count++
}

As these are pcp lists, they only gets touched by another process when
this process is context switched, which happens only after enabling
premption or interrupts. So, as long as process is operating on these
pcp lists in free_unref_page_commit function, the count and lists are
always synced.

However, the problem here is not that the count and lists are being out
of sync. They do always in sync, as explained above. It is with the
asking free_pcppages_bulk() to free the pages more than what is present
in the pcp lists which is ending up in while(list_empty()).

> Your changelog says that the fix is to
> use the proper value without any specifics.
> 
Will change this to: Ensure the count value passed is not greater than
the pcp lists count. Any better you suggest?

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project

Powered by blists - more mailing lists