lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58025293-c70f-4377-b8be-39994136af83@redhat.com>
Date: Thu, 1 Aug 2024 08:36:32 +0200
From: David Hildenbrand <david@...hat.com>
To: Usama Arif <usamaarif642@...il.com>, akpm@...ux-foundation.org,
 linux-mm@...ck.org
Cc: hannes@...xchg.org, riel@...riel.com, shakeel.butt@...ux.dev,
 roman.gushchin@...ux.dev, yuzhao@...gle.com, baohua@...nel.org,
 ryan.roberts@....com, rppt@...nel.org, willy@...radead.org,
 cerasuolodomenico@...il.com, corbet@....net, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 0/6] mm: split underutilized THPs

>> I just added a bunch of quick printfs to QEMU and ran a precopy+postcopy live migration. Looks like my assumption was right:
>>
>> On the destination:
>>
>> Writing received pages during precopy # ram_load_precopy()
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Writing received pages during precopy
>> Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
>> Discarding pages # loadvm_postcopy_ram_handle_discard()
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Discarding pages
>> Registering UFFD # postcopy_ram_incoming_setup()
>>
> 
> Thanks for this, yes it makes sense after you mentioned postcopy_ram_incoming_setup.
> postcopy_ram_incoming_setup happens in the Listen phase, which is after the discard phase, so I was able to follow in code in qemu the same sequence of events that the above prints show.


I just added another printf to postcopy_ram_supported_by_host(), where 
we temporarily do a UFFDIO_REGISTER on some test area.

Sensing UFFD support # postcopy_ram_supported_by_host()
Sensing UFFD support
Writing received pages during precopy # ram_load_precopy()
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
Discarding pages # loadvm_postcopy_ram_handle_discard()
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Registering UFFD # postcopy_ram_incoming_setup()

We could think about using this "ever user uffd" to avoid the shared 
zeropage in most processes.

Of course, there might be other applications where that wouldn't work, 
but I think this behavior (write to area before enabling uffd) might be 
fairly QEMU specific already.

Avoiding the shared zeropage has the benefit that a later write fault 
won't have to do a TLB flush and can simply install a fresh anon page.

>>
>> Let me know if you need more information.
>>
>>> Thanks for pointing to mm_forbids_zeropage. Incorporating that into the code, and if I am (hopefully :)) right about qemu and kernel above, then I believe the right code should be:
>>
>> I'm afraid you are not right about the qemu code :)
>>
> 
> Yes, and also didn't consider MADV_DONTNEED! Thanks for explaining both of these things clearly. Its clear that pte_clear won't work in this case.
> 
> We don't need to clear_pte, just use zero_page for all cases. The original series from Alex did tlb flush, but looking further at the code, thats not needed. try_to_migrate() flushes tlb and installs migration entries which are not ‘present’ so will never be tlb cached. remove_migration_ptes() restores page pointers so tlb flushing is not needed. When using zeropage, we don't need make a distinction if uffd is used or not. i.e. we can just do below:
> 
> 	if (contains_data || mm_forbids_zeropage(pvmw->vma->vm_mm))

It's worth noting that on s390x, MMs that forbid the zeropage also have 
THPs disabled. So we shouldn't really run into that that often (of 
course, it's subject to change in the future, so we better have this 
check here).

> 		return false;
> 
> 	newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
> 					pvmw->vma->vm_page_prot));
> 
> 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);

We're replacing a present page by another present page without doing a 
TLB flush in between. I *think* this should be fine because the new 
present page is R/O and cannot possibly be written to.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ