lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 31 Mar 2022 15:46:19 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Khalid Aziz <khalid.aziz@...cle.com>, linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Liang Zhang <zhangliang5@...wei.com>,
        Pedro Gomes <pedrodemargomes@...il.com>,
        Oded Gabbay <oded.gabbay@...il.com>, linux-mm@...ck.org
Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in
 try_to_unmap() after arch_unmap_one() failed

> Hi David,

Hi,

> 
> arch_remap_one() sounds reasonable. Would you like to add that in your patch series, or would you like me to create a 
> separate patch for adding this on top of your patch series?

Let's handle it separately, because I think there is still a lot to be
clarified. I also feel like the terminology ("arch_unmap_one()") is a
little too generic, if we really only care about anonymous pages (do we?).

> 
>>
>>>
>>>>
>>>> arch_unmap_one() calls adi_save_tags(), which allocates memory.
>>>> adi_restore_tags()->del_tag_store() reverts that operation and ends up
>>>> freeing memory conditionally; However, it's only
>>>> called from arch_do_swap_page().
>>>>
>>>>
>>>> Here is why I have to scratch my head:
>>>>
>>>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
>>>> for mm/swapfile.c:unuse_pte(), aren't we missing something?
>>>
>>> My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called
>>> upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from
>>> swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the
>>> page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this
>>> mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags.
>>
>> unuse_pte() will replace a swap pte directly by a proper, present pte,
>> just like do_swap_page() would. You won't end up in do_swap_page()
>> anymore and arch_do_swap_page() won't be called, because there is no
>> swap PTE anymore.
>>
>>>
>>>>
>>>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
>>>> restore+free after migration succeeded or failed, aren't we missing something?
>>>
>>> try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes
>>> __handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally
>>> result in a call to arch_do_swap_page() which restores the tags.
>>
>> Migration PTEs are restore via mm/migrate.c:remove_migration_ptes().
>> arch_do_swap_page() won't be called.
>>
>> What you mention is if someone accesses the migration PTE while
>> migration is active and the migration PTEs have not been removed yet.
>> While we'll end up in do_swap_page(), we'll do a migration_entry_wait(),
>> followed by an effective immediate "return 0;". arch_do_swap_page()
>> won't get called.
>>
>>
>> So in essence, I think this doesn't work as expected yet. In the best
>> case we don't immediately free memory. In the worst case we lose the tags.
>>
> 
> I see what you mean. I can work on fixing these issues up.

Ideally, I think we'd handle migration differently: migrate the tags
directly instead of temporarily storing them

I also wonder, how to handle migration of THP (which uses migration
PMDs) and THP splitting (which also uses migration PTEs); maybe they
don't apply on sparc?

Further, I wonder if you have to handle zapping of swap/migration PTEs,
and if you'd also have to cleanup+freeup any allcoated tag memory?
E.g., zap_pte_range() can simply zap swap and migration entries, wouldn't

Last but not least, I do wonder if mremap() -- e.g., move_pte() -- and
fork() -- e.g., copy_pte_range() -- are handled properly, where we can
end up moving/copying swap+migration PTEs around.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ