linux-kernel - Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <19d9ea18-bd20-e02f-c1de-70e7322f5f22@redhat.com>
Date:   Wed, 11 Sep 2019 16:44:32 +0100
From:   Waiman Long <longman@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge
 PMD

On 9/11/19 4:14 PM, Matthew Wilcox wrote:
> On Wed, Sep 11, 2019 at 04:05:37PM +0100, Waiman Long wrote:
>> When allocating a large amount of static hugepages (~500-1500GB) on a
>> system with large number of CPUs (4, 8 or even 16 sockets), performance
>> degradation (random multi-second delays) was observed when thousands
>> of processes are trying to fault in the data into the huge pages. The
>> likelihood of the delay increases with the number of sockets and hence
>> the CPUs a system has.  This only happens in the initial setup phase
>> and will be gone after all the necessary data are faulted in.
> Can;t the application just specify MAP_POPULATE?

Originally, I thought that this happened in the startup phase when the
pages were faulted in. The problem persists after steady state had been
reached though. Every time you have a new user process created, it will
have its own page table. It is the sharing of the of huge page shared
memory that is causing problem. Of course, it depends on how the
application is written.

Anyway, MAP_POPULATE will not be useful in this case.

Thanks,
Longman