lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 9 Jun 2017 10:21:00 +0200 From: Vlastimil Babka <vbabka@...e.cz> To: Andrea Arcangeli <aarcange@...hat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org, linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...nel.org>, Linus Torvalds <torvalds@...ux-foundation.org> Subject: Re: [PATCH 2/4] thp: fix MADV_DONTNEED vs. numa balancing race On 05/23/2017 02:42 PM, Vlastimil Babka wrote: > On 05/16/2017 10:29 PM, Andrea Arcangeli wrote: >> On Wed, Apr 12, 2017 at 03:33:35PM +0200, Vlastimil Babka wrote: >>> >>> pmdp_invalidate() does: >>> >>> pmd_t entry = *pmdp; >>> set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry)); >>> >>> so it's not atomic and if CPU sets dirty or accessed in the middle of >>> this, they will be lost? >> >> I agree it looks like the dirty bit can be lost. Furthermore this also >> loses a MMU notifier invalidate that will lead to corruption at the >> secondary MMU level (which will keep using the old protection >> permission, potentially keeping writing to a wrprotected page). > > Oh, I didn't paste the whole function, just the pmd manipulation. > There's also a third line: > > flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); > > so there's no missing invalidate, AFAICS? Sorry for the confusion. Oh, tlb flush is not MMU notified invalidate... >>> >>> But I don't see how the other invalidate caller >>> __split_huge_pmd_locked() deals with this either. Andrea, any idea? >> >> The original code I wrote did this in __split_huge_page_map to create >> the "entry" to establish in the pte pagetables: >> >> entry = mk_pte(page + i, vma->vm_page_prot); >> entry = maybe_mkwrite(pte_mkdirty(entry), >> vma); >> >> For anonymous memory the dirty bit is only meaningful for swapping, >> and THP couldn't be swapped so setting it unconditional avoided any >> issue with the pmdp_invalidate; pmdp_establish. > > Yeah, but now we are going to swap THP's, and we have shmem THP's... > >> pmdp_invalidate is needed primarily to avoid aliasing of two different >> TLB translation pointing from the same virtual address to the the same >> physical address that triggered machine checks (while needing to keep >> the pmd huge at all times, back then it was also splitting huge, >> splitting is a software bit so userland could still access the data, >> splitting bit only blocked kernel code to manipulate on it similar to >> what migration entry does right now upstream, except those prevent >> userland to access the page during split which is less efficient than >> the splitting bit, but at least it's only used for the physical split, >> back then there was no difference between virtual and physical split >> and physical split is less frequent than the virtual one right now). > > This took me a while to grasp, but I think I understand now :) > >> It looks like this needs a pmdp_populate that atomically grabs the >> value of the pmd and returns it like pmdp_huge_get_and_clear_notify >> does > > pmdp_huge_get_and_clear_notify() is now gone... > >> and a _notify variant to use "freeze" is false (if freeze is true >> the MMU notifier invalidate must have happened when the pmd was set to >> a migration entry). If pmdp_populate_notify (freeze==true) >> /pmd_populate (freeze==false) would return the old pmd value >> atomically with xchg() (just instead of setting it to 0 we should set >> it to the mknotpresent one), then we can set the dirty bit on the ptes >> (__split_huge_pmd_locked) or in the pmd itself in the change_huge_pmd >> accordingly. I was trying to look into this yesterday, but e.g. I know next to nothing about MMU notifiers (see above :) so I would probably get it wrong. But it should get fixed, so... Kirill? > I think the confusion was partially caused by the comment at the > original caller of pmdp_invalidate(): > > we first mark the > * current pmd notpresent (atomically because here the pmd_trans_huge > * and pmd_trans_splitting must remain set at all times on the pmd > * until the split is complete for this pmd), > > It says "atomically" but in fact that only means that the pmd_trans_huge > and pmd_trans_splitting flags are not temporarily cleared at any point > of time, right? It's not a true atomic swap. > >> If the "dirty" flag information is obtained by the pmd read before >> calling pmdp_invalidate is not ok (losing _notify also not ok). > > Right. > >> Thanks! >> Andrea >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@...ck.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a> >> > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@...ck.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a> >
Powered by blists - more mailing lists