linux-kernel - Re: [PATCH] mm: fix account pmd page to the process


Message-ID: <bf76cc6c-a0da-98f9-4a89-0bb6161f5adf@oracle.com>
Date:	Thu, 16 Jun 2016 09:47:46 -0700
From:	Mike Kravetz <mike.kravetz@...cle.com>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	zhongjiang <zhongjiang@...wei.com>, akpm@...ux-foundation.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	"Kirill A. Shutemov" <kirill@...temov.name>
Subject: Re: [PATCH] mm: fix account pmd page to the process

On 06/16/2016 09:31 AM, Michal Hocko wrote:
> On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
>> On 06/16/2016 08:43 AM, Michal Hocko wrote:
>>> [It seems that this patch has been sent several times, and this
>>> particular copy didn't CC Kirill, who added this code. CCing him now.]
>>>
>>> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>>>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>>>> From: zhong jiang <zhongjiang@...wei.com>
>>>>>
>>>>> When a process acquires a pmd table shared by another process, we
>>>>> increase the count for the current process. Otherwise, a race has
>>>>> resulted in another task setting the pud entry, so there is no need
>>>>> to increase it.
>>>>>
>>>>> Signed-off-by: zhong jiang <zhongjiang@...wei.com>
>>>>> ---
>>>>>  mm/hugetlb.c | 5 ++---
>>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 19d0d08..3b025c5 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>>>  	if (pud_none(*pud)) {
>>>>>  		pud_populate(mm, pud,
>>>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>>>> -	} else {
>>>>> +	} else 
>>>>>  		put_page(virt_to_page(spte));
>>>>> -		mm_inc_nr_pmds(mm);
>>>>> -	}
>>>>
>>>> The code is quite puzzling but is this correct? Shouldn't we rather do
>>>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
>>
>> I agree that the code is quite puzzling. :(
>>
>> However, if this were an issue I would have expected to see some reports.
>> Oracle DB makes use of this feature (shared page tables) and if the pmd
>> count is wrong we would catch it in check_mm() at exit time.
>>
>> Upon closer examination, I believe the code in question is never executed.
>> Note the callers of huge_pmd_share.  The calling code looks like:
>>
>>                         if (want_pmd_share() && pud_none(*pud))
>>                                 pte = huge_pmd_share(mm, addr, pud);
>>                         else
>>                                 pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>
>> Therefore, we do not call huge_pmd_share unless pud_none(*pud).  The
>> code in question is only executed when !pud_none(*pud).
> 
> My understanding is that the check is needed after we take the page
> table lock because we might have raced with another thread. But it's been
> quite some time since I've looked at hugetlb locking and page table
> sharing code.

That is correct; we could have raced. Duh!

In the case of a race, the other thread would have incremented the
pmd count already.  Your suggestion of decrementing the pmd count in
this case seems to be the correct approach.  But I need to think
about this some more.
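
A minimal userspace sketch of the accounting being discussed, not kernel
code.  It assumes the pmd count has already been bumped once before the
lock is taken (which is what "undo the previous inc" implies), and it
serializes the two racing callers at the lock.  All names here
(model_mm, model_pmd_share, model_race, loser_policy) are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum loser_policy {
	LOSER_INC,	/* existing code: increment again in the else path */
	LOSER_NONE,	/* proposed patch: leave the count alone */
	LOSER_DEC	/* suggestion: undo the earlier increment */
};

struct model_mm {
	long nr_pmds;		/* models the per-mm pmd page count */
	bool pud_populated;	/* models !pud_none(*pud) */
};

/* One pass through the sharing path for a single caller. */
static void model_pmd_share(struct model_mm *mm, enum loser_policy policy)
{
	/* Assumption: the count is bumped before the lock is taken. */
	mm->nr_pmds++;

	/* The real code would hold the page table lock from here on. */
	if (!mm->pud_populated) {
		/* Race winner: install the shared pmd page. */
		mm->pud_populated = true;
		return;
	}

	/* Race loser: someone else already populated the pud. */
	switch (policy) {
	case LOSER_INC:
		mm->nr_pmds++;
		break;
	case LOSER_NONE:
		break;
	case LOSER_DEC:
		mm->nr_pmds--;
		break;
	}
}

/* Two callers in the same mm race for the same pud; return the count. */
static long model_race(enum loser_policy policy)
{
	struct model_mm mm = { 0, false };

	model_pmd_share(&mm, policy);	/* wins the race */
	model_pmd_share(&mm, policy);	/* loses the race */
	assert(mm.pud_populated);
	return mm.nr_pmds;
}
```

Under these assumptions, only one pmd page ends up attached to the mm,
so the count should be 1.  model_race(LOSER_DEC) returns 1, while
LOSER_NONE (the patch) returns 2 and LOSER_INC (the existing code)
returns 3.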

-- 
Mike Kravetz