Message-ID: <e6b64991-65d7-0d25-3866-6b0b44f171b1@redhat.com>
Date:   Tue, 18 Apr 2023 17:57:47 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "Kirill A. Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH mm-unstable v1] mm: don't check VMA write permissions if
 the PTE/PMD indicates write permissions

On 18.04.23 17:56, Peter Xu wrote:
> On Tue, Apr 18, 2023 at 04:21:13PM +0200, David Hildenbrand wrote:
>> Staring at the comment "Recheck VMA as permissions can change since
>> migration started" in remove_migration_pte() can result in confusion,
>> because if the source PTE/PMD indicates write permissions, then there
>> should be no need to check VMA write permissions when restoring migration
>> entries or PTE-mapping a PMD.
>>
>> Commit d3cb8bf6081b ("mm: migrate: Close race between migration completion
>> and mprotect") introduced the maybe_mkwrite() handling in
>> remove_migration_pte() in 2014, stating that a race between mprotect() and
>> migration finishing would be possible, and that we could end up with
>> a writable PTE that should be readable.
>>
>> However, mprotect() code first updates vma->vm_flags / vma->vm_page_prot
>> and then walks the page tables to (a) set all present writable PTEs to
>> read-only and (b) convert all writable migration entries to readable
>> migration entries. While walking the page tables and modifying the
>> entries, migration code has to grab the PT locks to synchronize against
>> concurrent page table modifications.
> 
> Makes sense to me.
> 
>>
>> Assuming migration would find a writable migration entry (while holding
>> the PT lock) and replace it with a writable present PTE, surely mprotect()
>> code didn't stumble over the writable migration entry yet (converting it
>> into a readable migration entry) and would instead wait for the PT lock to
>> convert the now present writable PTE into a read-only PTE. As mprotect()
>> didn't finish yet, the behavior is just like migration didn't happen: a
>> writable PTE will be converted to a read-only PTE.
>>
>> So it's fine to rely on the writability information in the source
>> PTE/PMD and not recheck against the VMA as long as we're holding the PT
>> lock to synchronize with anyone who concurrently wants to downgrade write
>> permissions (like mprotect()) by first adjusting vma->vm_flags /
>> vma->vm_page_prot to then walk over the page tables to adjust the page
>> table entries.
>>
>> Running test cases that should reveal such races -- mprotect(PROT_READ)
>> racing with page migration or THP splitting -- for multiple hours did
>> not reveal an issue with this cleanup.
>>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Mel Gorman <mgorman@...hsingularity.net>
>> Cc: Peter Xu <peterx@...hat.com>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>>
>> This is a follow-up cleanup to [1]:
>> 	[PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not
>> 	unconditionally allow for write access
>>
>> I wanted to be a bit careful and write some test cases to convince myself
>> that I am not missing something important. Of course, there is still the
>> possibility that my test cases are buggy ;)
>>
>> Test cases I'm running:
>> 	https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_migration.c
>> 	https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_thp_split.c
>>
>>
>> [1] https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
>>
>> ---
>>   mm/huge_memory.c | 4 ++--
>>   mm/migrate.c     | 5 +----
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c23fa39dec92..624671aaa60d 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>   		} else {
>>   			entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
>>   			if (write)
>> -				entry = maybe_mkwrite(entry, vma);
>> +				entry = pte_mkwrite(entry);
> 
> This is another change besides page migration.  I also don't know why it's
> needed, but it's there since day 1 of thp split in eef1b3ba053, so maybe
> worthwhile to copy Kirill too (which I did).

Indeed (I wanted to but forgot ...), thanks Peter!

-- 
Thanks,

David / dhildenb
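
Editorial sketch (not part of the patch or of the linked test cases): the
locking argument in the quoted patch description can be modeled in user
space. Everything below is an assumption made for illustration: the
struct, the enum, and the function names are hypothetical, a pthread
mutex stands in for the PT lock, and a single variable stands in for one
page table entry. The model only mirrors the ordering described above:
mprotect() first updates the VMA and then downgrades entries under the
PT lock, while migration restores the entry under the same lock without
rechecking the VMA.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical states of the single modeled page table entry. */
enum entry {
	MIG_WRITABLE,	/* writable migration entry */
	MIG_READABLE,	/* readable migration entry */
	PTE_WRITABLE,	/* present, writable PTE */
	PTE_READONLY,	/* present, read-only PTE */
};

struct model {
	pthread_mutex_t pt_lock;	/* stands in for the PT lock */
	bool vma_write;			/* stands in for VM_WRITE in vma->vm_flags */
	enum entry entry;
};

/* Models mprotect(PROT_READ): update the VMA first, then walk the entries. */
static void *model_mprotect_read(void *arg)
{
	struct model *m = arg;

	m->vma_write = false;		/* step 1: VMA permissions */
	pthread_mutex_lock(&m->pt_lock);	/* step 2: downgrade entries */
	if (m->entry == PTE_WRITABLE)
		m->entry = PTE_READONLY;
	else if (m->entry == MIG_WRITABLE)
		m->entry = MIG_READABLE;
	pthread_mutex_unlock(&m->pt_lock);
	return NULL;
}

/* Models restoring the migration entry without consulting the VMA. */
static void *model_remove_migration(void *arg)
{
	struct model *m = arg;

	pthread_mutex_lock(&m->pt_lock);
	if (m->entry == MIG_WRITABLE)
		m->entry = PTE_WRITABLE;	/* trust the source entry */
	else if (m->entry == MIG_READABLE)
		m->entry = PTE_READONLY;
	pthread_mutex_unlock(&m->pt_lock);
	return NULL;
}

/* Runs both sides concurrently once; returns the final entry, or -1 on error. */
static int race_once(void)
{
	struct model m = { .vma_write = true, .entry = MIG_WRITABLE };
	pthread_t a, b;

	if (pthread_mutex_init(&m.pt_lock, NULL) != 0)
		return -1;
	if (pthread_create(&a, NULL, model_mprotect_read, &m) != 0) {
		pthread_mutex_destroy(&m.pt_lock);
		return -1;
	}
	if (pthread_create(&b, NULL, model_remove_migration, &m) != 0) {
		pthread_join(a, NULL);
		pthread_mutex_destroy(&m.pt_lock);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&m.pt_lock);
	return m.entry;
}
```

Whichever thread takes the lock first, race_once() should end in
PTE_READONLY once both have finished: either mprotect converts the
writable migration entry to a readable one before it is restored, or it
downgrades the freshly restored writable PTE afterwards. This is a toy
model of the ordering only and says nothing about the real kernel code
paths.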
