Message-ID: <aaeec83d-bdf8-280c-b943-ad510f1d8db2@quicinc.com>
Date:   Tue, 19 Jul 2022 20:42:42 +0530
From:   Charan Teja Kalla <quic_charante@...cinc.com>
To:     Michal Hocko <mhocko@...e.com>
CC:     <akpm@...ux-foundation.org>, <pasha.tatashin@...een.com>,
        <sjpark@...zon.de>, <sieberf@...zon.com>, <shakeelb@...gle.com>,
        <dhowells@...hat.com>, <willy@...radead.org>, <vbabka@...e.cz>,
        <david@...hat.com>, <minchan@...nel.org>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        "iamjoonsoo.kim@....com" <iamjoonsoo.kim@....com>,
        Pavan Kondeti <quic_pkondeti@...cinc.com>
Subject: Re: [PATCH] mm: fix use-after free of page_ext after race with
 memory-offline

Thanks Michal!!

On 7/18/2022 8:24 PM, Michal Hocko wrote:
>>>> The above mentioned race is just one example __but the problem persists
>>>> in the other paths too involving page_ext->flags access(eg:
>>>> page_is_idle())__. Since offline waits till the last reference on the
>>>> page goes down i.e. any path that took the refcount on the page can make
>>>> the memory offline operation to wait. Eg: In the migrate_pages()
>>>> operation, we do take the extra refcount on the pages that are under
>>>> migration and then we do copy page_owner by accessing page_ext. For
>>>>
>>>> Fix those paths where offline races with page_ext access by maintaining
>>>> synchronization with rcu lock.
>>> Please be much more specific about the synchronization. How does RCU
>>> actually synchronize the offlining and access? Higher level description
>>> of all the actors would be very helpful not only for the review but also
>>> for future readers.
>> I will improve the commit message about this synchronization change
>> using RCU's.
> Thanks! The most important part is how the exclusion is actually achieved
> because that is not really clear at first sight
> 
> CPU1					CPU2
> lookup_page_ext(PageA)			offlining
> 					  offline_page_ext
> 					    __free_page_ext(addrA)
> 					      get_entry(addrA)
> 					      ms->page_ext = NULL
> 					      synchronize_rcu()
> 					      free_page_ext
> 					        free_pages_exact (now addrA is unusable)
> 					
>   rcu_read_lock()
>   entryA = get_entry(addrA)
>     base + page_ext_size * index # an address not invalidated by the freeing path
>   do_something(entryA)
>   rcu_read_unlock()
> 
> CPU1 never checks ms->page_ext so it cannot bail out early when the
> thing is torn down. Or maybe I am missing something. I am not familiar
> with page_ext much.


Thanks a lot for catching this, Michal. You are correct that the code I
proposed is still racy. I will correct this, along with a proper commit
message, in the next version of this patch.
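
To make the missing check concrete, here is a minimal userspace sketch
of a reader that re-reads ms->page_ext inside the read-side section and
bails out once it has been cleared. It is not the V2 patch.
rcu_read_lock()/rcu_read_unlock() are no-op stand-ins, the names
lookup_page_ext_checked(), read_page_ext_flags() and
unpublish_page_ext() are hypothetical, and the entry layout is
simplified to a fixed-size struct.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel RCU read-side primitives. */
static int rcu_depth;
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { rcu_depth--; }

/* Simplified: fixed-size entries instead of base + page_ext_size * index. */
struct page_ext { unsigned long flags; };
struct mem_section { struct page_ext *page_ext; };

/* Hypothetical helper: returns NULL once the section was unpublished. */
static struct page_ext *lookup_page_ext_checked(struct mem_section *ms,
						size_t index)
{
	struct page_ext *base;

	assert(rcu_depth > 0);	/* caller must hold the read-side lock */
	base = ms->page_ext;	/* kernel code would need rcu_dereference() */
	if (!base)
		return NULL;
	return base + index;
}

/* Reader: returns 0 and stores the flags, or -1 if page_ext is gone. */
static int read_page_ext_flags(struct mem_section *ms, size_t index,
			       unsigned long *flags)
{
	struct page_ext *ext;
	int ret = -1;

	rcu_read_lock();
	ext = lookup_page_ext_checked(ms, index);
	if (ext) {
		*flags = ext->flags;
		ret = 0;
	}
	rcu_read_unlock();
	return ret;
}

/*
 * Writer, first step only: unpublish the pointer. In the kernel the
 * caller would wait for a grace period before freeing the old table.
 */
static struct page_ext *unpublish_page_ext(struct mem_section *ms)
{
	struct page_ext *old = ms->page_ext;

	ms->page_ext = NULL;
	return old;
}
```

With this shape, a reader that starts after the pointer is cleared sees
NULL, and a reader that started earlier is covered by the grace period
the writer waits for before freeing.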

> 
>>> Also, more specifically
>>> [...]
>>>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>>>> index 3dc715d..5ccd3ee 100644
>>>> --- a/mm/page_ext.c
>>>> +++ b/mm/page_ext.c
>>>> @@ -299,8 +299,9 @@ static void __free_page_ext(unsigned long pfn)
>>>>  	if (!ms || !ms->page_ext)
>>>>  		return;
>>>>  	base = get_entry(ms->page_ext, pfn);
>>>> -	free_page_ext(base);
>>>>  	ms->page_ext = NULL;
>>>> +	synchronize_rcu();
>>>> +	free_page_ext(base);
>>>>  }
>>> So you are imposing the RCU grace period for each page_ext! This can get
>>> really expensive. Have you tried to measure the effect?
> I was wrong here! This is for each memory section which is not as
> terrible as every single page_ext. This can be still quite a lot memory
> sections in a single memory block (e.g. on ppc memory sections are
> ridiculously small).
> 

On ARM64, I see that the minimum section size is 128MB. I think the
section size on ppc is 16MB. Any input on how frequently offline/online
operations are done on ppc?
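
As a rough illustration of why the section size matters here, the
number of synchronize_rcu() calls with one call per section is just the
offlined range divided by the section size. The helper name and the
1GB example range below are assumptions for illustration only; the
128MB and 16MB figures are the ones mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: grace periods needed when each memory section
 * triggers its own synchronize_rcu(). Rounds up for partial sections.
 */
static uint64_t grace_periods_for_offline(uint64_t offlined_bytes,
					  uint64_t section_bytes)
{
	assert(section_bytes != 0);
	return (offlined_bytes + section_bytes - 1) / section_bytes;
}
```

For a 1GB range this gives 8 calls with 128MB sections and 64 calls
with 16MB sections.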


>> I didn't really measure the effect. Let me measure it and post these in V2.
> I think it would be much more optimal to split the operation into 2
> phases. Invalidate all the page_ext metadata then synchronize_rcu and
> only then free them all. I am not very familiar with page_ext so I am
> not sure this is easy to be done. Maybe page_ext = NULL can be done in
> the first stage.
> 

Let me explore if this can be easily done.
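
A userspace sketch of the difference between the two shapes, assuming
the stale base pointers can be stashed in a temporary array between the
phases (that array, and both function names, are my assumptions, not
existing page_ext code). synchronize_rcu() is a counting stand-in and
malloc()/free() stand in for the page_ext allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mem_section { void *page_ext; };

/* Stand-in that only counts how many grace periods were requested. */
static unsigned int grace_periods;
static void synchronize_rcu(void) { grace_periods++; }

/* One grace period per section, as in the diff quoted above. */
static void free_sections_one_by_one(struct mem_section *ms, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		void *base = ms[i].page_ext;

		if (!base)
			continue;
		ms[i].page_ext = NULL;
		synchronize_rcu();
		free(base);
	}
}

/* Two phases: invalidate everything, wait once, then free everything. */
static void free_sections_two_phase(struct mem_section *ms, size_t n)
{
	void **stale;

	if (n == 0)
		return;
	stale = calloc(n, sizeof(*stale));
	assert(stale);

	for (size_t i = 0; i < n; i++) {	/* phase 1: unpublish */
		stale[i] = ms[i].page_ext;
		ms[i].page_ext = NULL;
	}
	synchronize_rcu();			/* single grace period */
	for (size_t i = 0; i < n; i++)		/* phase 2: free */
		free(stale[i]);			/* free(NULL) is a no-op */
	free(stale);
}
```

For a block of 8 sections the first variant requests 8 grace periods
and the second requests 1, independent of the section size.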

>>> 3) Change the design where the page_ext is valid as long as the struct
>>> page is alive.
>> :/ Doesn't spark joy.
> I would be wondering why. It should only take to move the callback to
> happen at hotremove. So it shouldn't be very involved of a change. I can
> imagine somebody would be relying on releasing resources when offlining
> memory but is that really the case?

I don't see any hard requirement for the clients to release this
page_ext memory.

What I can think of is that the page_ext size is proportional to the
debug features we enable (which is all it is used for on 64-bit, as of
now). E.g., enabling page_owner requires an additional 0x30 bytes per
page, memory that is not needed once the memory block is offlined. But
then the same applies to the memory occupied by struct page for this
offlined block.
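
To put a number on it, a small sketch of the page_ext footprint per
section. The 0x30 bytes per page is the page_owner figure above; the
4KB page size, the 128MB section size used in the example, and the
helper name are assumptions, and other page_ext users would add to the
per-page size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of page_ext metadata covering one section. */
static uint64_t page_ext_bytes(uint64_t section_bytes, uint64_t page_bytes,
			       uint64_t ext_bytes_per_page)
{
	assert(page_bytes != 0);
	return (section_bytes / page_bytes) * ext_bytes_per_page;
}
```

With 4KB pages, a 128MB section has 32768 pages, so page_owner alone
accounts for 32768 * 0x30 = 1572864 bytes (1.5MB) per section.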

One comment from the initial discussion: "It smells like page_ext
should use some mechanism during MEM_OFFLINE to
synchronize against any users of its metadata. Generic memory offlining
code might be the wrong place for that."  -- I think the page_ext
creation and deletion should fit into the sparse code. I will try to
provide the changes tomorrow, and if they seem unfit there, I will work
on improving the current patch based on the RCU logic.

Thanks,
Charan
