[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <909cc1b6-6f4f-4c45-f418-31d5dd5acaa3@oracle.com>
Date: Tue, 29 Mar 2022 14:42:57 -0600
From: Khalid Aziz <khalid.aziz@...cle.com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
Liang Zhang <zhangliang5@...wei.com>,
Pedro Gomes <pedrodemargomes@...il.com>,
Oded Gabbay <oded.gabbay@...il.com>, linux-mm@...ck.org
Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in
try_to_unmap() after arch_unmap_one() failed
On 3/29/22 07:59, David Hildenbrand wrote:
> On 15.03.22 11:47, David Hildenbrand wrote:
>> In case arch_unmap_one() fails, we already did a swap_duplicate(). let's
>> undo that properly via swap_free().
>>
>> Fixes: ca827d55ebaa ("mm, swap: Add infrastructure for saving page metadata on swap")
>> Reviewed-by: Khalid Aziz <khalid.aziz@...cle.com>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>> mm/rmap.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 6a1e8c7f6213..f825aeef61ca 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1625,6 +1625,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>> break;
>> }
>> if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>> + swap_free(entry);
>> set_pte_at(mm, address, pvmw.pte, pteval);
>> ret = false;
>> page_vma_mapped_walk_done(&pvmw);
>
> Hi Khalid,
>
> I'm a bit confused about the semantics if arch_unmap_one(), I hope you can clarify.
>
>
> See patch #11 in this series, were we can fail unmapping after arch_unmap_one() succeeded. E.g.,
>
> @@ -1623,6 +1634,24 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> page_vma_mapped_walk_done(&pvmw);
> break;
> }
> + if (anon_exclusive &&
> + page_try_share_anon_rmap(subpage)) {
> + swap_free(entry);
> + set_pte_at(mm, address, pvmw.pte, pteval);
> + ret = false;
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> + }
> + /*
> + * Note: We don't remember yet if the page was mapped
> + * exclusively in the swap entry, so swapin code has
> + * to re-determine that manually and might detect the
> + * page as possibly shared, for example, if there are
> + * other references on the page or if the page is under
> + * writeback. We made sure that there are no GUP pins
> + * on the page that would rely on it, so for GUP pins
> + * this is fine.
> + */
> if (list_empty(&mm->mmlist)) {
> spin_lock(&mmlist_lock);
> if (list_empty(&mm->mmlist))
>
>
> For now, I was under the impression that we don't have to undo anything after
> arch_unmap_one() succeeded, because we seem to not do anything for two
> cases below. But looking into arch_unmap_one() and how it allocates stuff I do
> wonder what we would actually want to do here -- I'd assume we'd want to
> trigger the del_tag_store() somehow?
Hi David,
Currently once arch_unmap_one() completes successfully, we are at the point of no return for this pte. It will be
replaced by swap pte soon thereafter. Patch 11 adds another case where we may return without replacing current pte with
swap pte. For now could you resolve this by moving the above code block in patch 11 to before the call to
arch_unmap_one(). That still leaves open the issue having the flexibility of undoing what arch_unmap_one() does for some
other reason in future. That will require coming up with a properly architected way to do it.
>
> arch_unmap_one() calls adi_save_tags(), which allocates memory.
> adi_restore_tags()->del_tag_store() reverts that operation and ends up
> freeing memory conditionally; However, it's only
> called from arch_do_swap_page().
>
>
> Here is why I have to scratch my head:
>
> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
> for mm/swapfile.c:unuse_pte(), aren't we missing something?
My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called
upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from
swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the
page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this
mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags.
>
> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
> restore+free after migration succeeded or failed, aren't we missing something?
try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes
__handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally
result in a call to arch_do_swap_page() which restores the tags.
>
>
> I assume that we won't be properly freeing the tag space in case of a), and won't
> be properly restoring/migrating the tags in case of a) and b).
I think both cases are covered correctly unless I am misunderstand the code.
--
Khalid
Powered by blists - more mailing lists