Message-ID: <589A0090.3050406@cs.rutgers.edu>
Date: Tue, 7 Feb 2017 11:14:56 -0600
From: Zi Yan <zi.yan@...rutgers.edu>
To: "Kirill A. Shutemov" <kirill@...temov.name>
CC: Zi Yan <zi.yan@...t.com>, Andrea Arcangeli <aarcange@...hat.com>,
Minchan Kim <minchan@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<kirill.shutemov@...ux.intel.com>, <akpm@...ux-foundation.org>,
<vbabka@...e.cz>, <mgorman@...hsingularity.net>,
<n-horiguchi@...jp.nec.com>, <khandual@...ux.vnet.ibm.com>,
Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range()
Kirill A. Shutemov wrote:
> On Tue, Feb 07, 2017 at 09:11:05AM -0600, Zi Yan wrote:
>>>> This causes memory leak or kernel crashing, if VM_BUG_ON() is enabled.
>>> The problem is that numabalancing calls change_huge_pmd() under
>>> down_read(mmap_sem), not down_write(mmap_sem) as the rest of users do.
>>> It makes numabalancing the only code path beyond page fault that can turn
>>> pmd_none() into pmd_trans_huge() under down_read(mmap_sem).
>>>
>>> This can lead to a race where MADV_DONTNEED misses a THP. That's not critical for
>>> pagefault vs. MADV_DONTNEED race as we will end up with clear page in that
>>> case. Not so much for change_huge_pmd().
>>>
>>> Looks like we need pmdp_modify() or something to modify protection bits
>>> inplace, without clearing pmd.
>>>
>>> Not sure how to get crash scenario.
>>>
>>> BTW, Zi, have you observed the crash? Or is it based on code inspection?
>>> Any backtraces?
>> The problem should be very rare in the upstream kernel. I discovered the
>> problem in my customized kernel, which does very frequent page migration
>> and uses numa_protnone.
>>
>> The crash scenario I guess is like:
>> 1. A huge page pmd entry is in the middle of being changed into either a
>> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
>>
>> 2. At the same time, the application frees the vma this page belongs to.
>
> Em... no.
>
> This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
> And we are only able to remove a vma under down_write(mmap_sem), so the
> scenario should be excluded.
>
> What do I miss?

You are right. This problem will not happen in the upstream kernel.
The problem comes from my customized kernel, where I migrate pages away
instead of reclaiming them when memory is under pressure. I did not take
mmap_sem when migrating pages, so I hit this error.
It is a false alarm. Sorry about that. Thanks for clarifying the problem.
--
Best Regards,
Yan Zi