linux-kernel - Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd

Message-ID: <20170207174536.GC5578@node.shutemov.name>
Date:   Tue, 7 Feb 2017 20:45:36 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Zi Yan <zi.yan@...rutgers.edu>
Cc:     Zi Yan <zi.yan@...t.com>, Andrea Arcangeli <aarcange@...hat.com>,
        Minchan Kim <minchan@...nel.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, kirill.shutemov@...ux.intel.com,
        akpm@...ux-foundation.org, vbabka@...e.cz,
        mgorman@...hsingularity.net, n-horiguchi@...jp.nec.com,
        khandual@...ux.vnet.ibm.com, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in
 zap_pmd_range()

On Tue, Feb 07, 2017 at 11:14:56AM -0600, Zi Yan wrote:
> 
> 
> Kirill A. Shutemov wrote:
> > On Tue, Feb 07, 2017 at 09:11:05AM -0600, Zi Yan wrote:
> >>>> This causes a memory leak or, if VM_BUG_ON() is enabled, a kernel crash.
> >>> The problem is that numabalancing calls change_huge_pmd() under
> >>> down_read(mmap_sem), not down_write(mmap_sem) as the rest of the users do.
> >>> That makes numabalancing the only code path besides page fault that can
> >>> turn pmd_none() into pmd_trans_huge() under down_read(mmap_sem).
> >>>
> >>> This can lead to a race where MADV_DONTNEED misses a THP. That's not critical
> >>> for the pagefault vs. MADV_DONTNEED race, as we end up with a clear page in
> >>> that case. Not so much for change_huge_pmd().
> >>>
> >>> Looks like we need pmdp_modify() or something to modify protection bits
> >>> in place, without clearing the pmd.
> >>>
> >>> Not sure how to get to the crash scenario.
> >>>
> >>> BTW, Zi, have you observed the crash? Or is it based on code inspection?
> >>> Any backtraces?
> >> The problem should be very rare in the upstream kernel. I discovered the
> >> problem in my customized kernel, which does very frequent page migration
> >> and uses numa_protnone.
> >>
> >> The crash scenario I guess is like:
> >> 1. A huge page pmd entry is in the middle of being changed into either a
> >> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
> >>
> >> 2. At the same time, the application frees the vma this page belongs to.
> > 
> > Em... no.
> > 
> > This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
> > And we are only able to remove a vma under down_write(mmap_sem), so the
> > scenario should be excluded.
> > 
> > What am I missing?
> 
> You are right. This problem will not happen in the upstream kernel.
> 
> The problem comes from my customized kernel, where I migrate pages away
> instead of reclaiming them when memory is under pressure. I did not take
> mmap_sem at all when migrating pages. So I got this error.
> 
> It is a false alarm. Sorry about that. Thanks for clarifying the problem.

I think there's still a race between MADV_DONTNEED and
change_huge_pmd(.prot_numa=1) resulting in zap_pmd_range() skipping
the THP. It needs to be addressed.
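
To make the window concrete, here is a toy userspace sketch. It is not
kernel code: every toy_* name is made up for illustration, and the real
entry layout and locking are ignored. It assumes the protection change
clears the entry before rewriting it, and that the zap side checks the
entry without holding the pmd lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All toy_* names are invented. */
typedef uint64_t toy_pmd_t;            /* 0 stands in for pmd_none() */

#define TOY_PMD_HUGE     (1ULL << 0)   /* entry maps a huge page */
#define TOY_PMD_PROTNONE (1ULL << 1)   /* NUMA-hinting protection bit */

static bool toy_pmd_none(toy_pmd_t pmd)
{
	return pmd == 0;
}

/* Zap side: an unlocked check that skips entries which look empty. */
static bool toy_zap(toy_pmd_t *pmd)
{
	if (toy_pmd_none(*pmd))
		return false;          /* nothing to do, or so it seems */
	*pmd = 0;
	return true;
}

/* Protection change done as clear-then-set, split so a zap can interleave. */
static toy_pmd_t toy_change_begin(toy_pmd_t *pmd)
{
	toy_pmd_t old = *pmd;

	*pmd = 0;                      /* transient "none" window opens */
	return old;
}

static void toy_change_end(toy_pmd_t *pmd, toy_pmd_t old)
{
	*pmd = old | TOY_PMD_PROTNONE; /* window closes, entry is huge again */
}

/* Alternative: flip the bit in place, never passing through "none". */
static void toy_modify_in_place(toy_pmd_t *pmd)
{
	*pmd |= TOY_PMD_PROTNONE;
}

/* Returns true when the huge entry survives the zap, i.e. was skipped. */
static bool toy_skipped_with_clear_then_set(void)
{
	toy_pmd_t pmd = TOY_PMD_HUGE;
	toy_pmd_t old = toy_change_begin(&pmd);
	bool zapped = toy_zap(&pmd);   /* runs inside the window */

	toy_change_end(&pmd, old);
	return !zapped && !toy_pmd_none(pmd);
}

static bool toy_skipped_with_in_place(void)
{
	toy_pmd_t pmd = TOY_PMD_HUGE;
	bool zapped;

	toy_modify_in_place(&pmd);
	zapped = toy_zap(&pmd);
	return !zapped && !toy_pmd_none(pmd);
}

void toy_selfcheck(void)
{
	assert(toy_skipped_with_clear_then_set());
	assert(!toy_skipped_with_in_place());
}
```

The interleaving is forced by hand here, so this only shows that the
window exists, not how likely it is to be hit. With the in-place variant
the zap never observes an empty entry regardless of ordering; having the
zap side take the pmd lock would presumably close the window as well.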

And MADV_FREE requires a fix.

So, minus one non-bug, plus two bugs. 

-- 
 Kirill A. Shutemov