Message-ID: <20200323164403.GZ20941@ziepe.ca>
Date: Mon, 23 Mar 2020 13:44:03 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Sean Christopherson <sean.j.christopherson@...el.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
"Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
<longpeng2@...wei.com>, akpm@...ux-foundation.org,
kirill.shutemov@...ux.intel.com, linux-kernel@...r.kernel.org,
arei.gonglei@...wei.com, weidong.huang@...wei.com,
weifuqiang@...wei.com, kvm@...r.kernel.org, linux-mm@...ck.org,
Matthew Wilcox <willy@...radead.org>, stable@...r.kernel.org
Subject: Re: [PATCH v2] mm/hugetlb: fix a addressing exception caused by
huge_pte_offset()
On Mon, Mar 23, 2020 at 07:40:31AM -0700, Sean Christopherson wrote:
> On Sun, Mar 22, 2020 at 07:54:32PM -0700, Mike Kravetz wrote:
> > On 3/22/20 7:03 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > >
> > > On 2020/3/22 7:38, Mike Kravetz wrote:
> > >> On 2/21/20 7:33 PM, Longpeng(Mike) wrote:
> > >>> From: Longpeng <longpeng2@...wei.com>
> > I have not looked closely at the generated code for lookup_address_in_pgd.
> > It appears that it would dereference p4d, pud and pmd multiple times. Sean
> > seemed to think there was something about the calling context that would
> > make issues like those seen with huge_pte_offset less likely to happen. I
> > do not know if this is accurate or not.
>
> Only for KVM's calls to lookup_address_in_mm(); I can't speak to other
> calls that funnel into lookup_address_in_pgd().
>
> KVM uses a combination of tracking and blocking mmu_notifier calls to ensure
> PTE changes/invalidations between gup() and lookup_address_in_pgd() cause a
> restart of the faulting instruction, and that pending changes/invalidations
> are blocked until installation of the pfn in KVM's secondary MMU completes.
>
> kvm_mmu_page_fault():
>
>     mmu_seq = kvm->mmu_notifier_seq;
>     smp_rmb();
>
>     pfn = gup(hva);
>
>     spin_lock(&kvm->mmu_lock);
>     smp_rmb();
>     if (kvm->mmu_notifier_seq != mmu_seq)
>         goto out_unlock; // Restart guest, i.e. retry the fault
>
>     lookup_address_in_mm(hva, ...);
It works because the mmu_lock spinlock is taken, via the
invalidate_range_start/end() callbacks, both before and after any
change to the page tables. So if you hold the spinlock and
mmu_notifier_count == 0, then nobody can be writing to the page tables.
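
Roughly, both notifier callbacks take mmu_lock around their updates,
along the lines of this simplified sketch (the real code in
virt/kvm/kvm_main.c does more, e.g. zapping the affected sptes):

    kvm_mmu_notifier_invalidate_range_start():
        spin_lock(&kvm->mmu_lock);
        kvm->mmu_notifier_count++;   /* blocks faults from installing pfns */
        /* ... zap the sptes covering the range ... */
        spin_unlock(&kvm->mmu_lock);

    kvm_mmu_notifier_invalidate_range_end():
        spin_lock(&kvm->mmu_lock);
        kvm->mmu_notifier_seq++;     /* forces in-flight faults to retry */
        kvm->mmu_notifier_count--;
        spin_unlock(&kvm->mmu_lock);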
It is effectively a full page table lock, so any page table read under
that lock does not need to worry about any data races.
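
The check Sean sketched on the fault side is then roughly what
mmu_notifier_retry() does (again simplified):

    mmu_notifier_retry(kvm, mmu_seq):   /* called with mmu_lock held */
        if (kvm->mmu_notifier_count)
            return 1;   /* invalidation in progress, retry the fault */
        smp_rmb();
        if (kvm->mmu_notifier_seq != mmu_seq)
            return 1;   /* something changed since gup(), retry */
        return 0;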
Jason