Message-ID: <718e1653-b273-096b-0ee3-f720cf794612@oracle.com>
Date:   Thu, 25 Jun 2020 14:33:28 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     kernel test robot <rong.a.chen@...el.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A.Shutemov" <kirill.shutemov@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4%
 regression

On 6/22/20 3:01 PM, Mike Kravetz wrote:
> On 6/21/20 5:55 PM, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>
>>
>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: vm-scalability
>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>> with following parameters:
>>
>> 	runtime: 300s
>> 	size: 8T
>> 	test: anon-cow-seq-hugetlb
>> 	cpufreq_governor: performance
>> 	ucode: 0x11
>>
> 
> Some performance regression is not surprising as the change includes acquiring
> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults.  33.4%
> seems a bit high.  But, the test is primarily exercising the hugetlb page
> fault path and little else.
> 
> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
> invalidating the pmd we are operating on.  This specific test case is operating
> on anonymous private mappings.  So, PMD sharing is not possible and we can
> eliminate acquiring the mutex in this case.  In fact, we should check all
> mappings (even sharable) for the possibility of PMD sharing and only take the
> mutex if necessary.  It will make the code a bit uglier, but will take care
> of some of these regressions.  We still need to take the mutex in the case
> of PMD sharing.  I'm afraid a regression is unavoidable in that case.
> 
> I'll put together a patch.

Not acquiring the mutex on faults when sharing is not possible is quite
straightforward.  We can even use the existing routine vma_shareable()
to easily check.  However, the next patch in the series 87bf91d39bb5
"hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
on always acquiring the mutex.  If we break this assumption, then the
code to back out hugetlb reservations needs to be written.  A high level
view of what needs to be done is in the commit message for 87bf91d39bb5.
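
To illustrate the first point only (skipping the lock when sharing is not
possible), here is a rough userspace model of the decision.  It is not
kernel code and not the eventual patch.  All model_* names, the counter
standing in for i_mmap_rwsem, and the 1 GiB PUD size are assumptions made
for the sketch; the check is only loosely modeled on what vma_shareable()
is understood to test (a may-share mapping covering a whole PUD-aligned
region).  It does not address the reservation back-out problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel constants (assumes 1 GiB PUD regions). */
#define MODEL_PUD_SIZE    (1ULL << 30)
#define MODEL_PUD_MASK    (~(MODEL_PUD_SIZE - 1))
#define MODEL_VM_MAYSHARE 0x1u

/* Toy stand-in for i_mmap_rwsem: only counts read-mode acquisitions. */
struct model_rwsem {
	int readers;                  /* readers currently holding it */
	unsigned long read_acquired;  /* total read-mode acquisitions */
};

static void model_down_read(struct model_rwsem *sem)
{
	sem->readers++;
	sem->read_acquired++;
}

static void model_up_read(struct model_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Toy stand-in for a VMA covering [start, end). */
struct model_vma {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
	struct model_rwsem *i_mmap;
};

/*
 * PMD sharing is treated as possible only if the mapping may be shared
 * and the PUD-aligned region around addr lies entirely inside the VMA.
 */
static bool model_vma_shareable(const struct model_vma *vma, uint64_t addr)
{
	uint64_t base = addr & MODEL_PUD_MASK;
	uint64_t end = base + MODEL_PUD_SIZE;

	return (vma->flags & MODEL_VM_MAYSHARE) &&
	       vma->start <= base && end <= vma->end;
}

/* Returns true if the fault path took the lock in read mode. */
static bool model_fault(struct model_vma *vma, uint64_t addr)
{
	bool need_lock;

	assert(addr >= vma->start && addr < vma->end);
	need_lock = model_vma_shareable(vma, addr);

	if (need_lock)
		model_down_read(vma->i_mmap);

	/* ... the actual fault handling would go here ... */

	if (need_lock)
		model_up_read(vma->i_mmap);
	return need_lock;
}
```

In this model a private mapping, like the ones in the anon-cow-seq-hugetlb
test, never touches the lock, while a may-share mapping that spans a full
PUD region still does.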

I'm working on the code to back out reservations.
-- 
Mike Kravetz
