Message-ID: <YiYYvAWYgC+PKEx0@casper.infradead.org>
Date:   Mon, 7 Mar 2022 14:37:48 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Nathaniel McCallum <nathaniel@...fian.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        linux-sgx@...r.kernel.org, jaharkes@...cmu.edu,
        linux-mips@...r.kernel.org, linux-kernel@...r.kernel.org,
        intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
        codalist@...emann.coda.cs.cmu.edu, linux-unionfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH RFC v2] mm: Add f_ops->populate()

On Sun, Mar 06, 2022 at 03:41:54PM -0800, Dave Hansen wrote:
> In short: page faults stink.  The core kernel has lots of ways of
> avoiding page faults like madvise(MADV_WILLNEED) or mmap(MAP_POPULATE).
>  But, those only work on normal RAM that the core mm manages.
> 
> SGX is weird.  SGX memory is managed outside the core mm.  It doesn't
> have a 'struct page' and get_user_pages() doesn't work on it.  Its VMAs
> are marked with VM_IO.  So, none of the existing methods for avoiding
> page faults work on SGX memory.
> 
> This essentially helps extend existing "normal RAM" kernel ABIs to work
> for avoiding faults for SGX too.  SGX users want to enjoy all of the
> benefits of a delayed allocation policy (better resource use,
> overcommit, NUMA affinity) but without the cost of millions of faults.

We have a mechanism for dynamically reducing the number of page faults
already; it's just buried in the page cache code.  You have vma->vm_file,
which contains a file_ra_state.  You can use this to track where
recent faults have been and grow the size of the region you fault in
per page fault.  You don't have to (indeed probably don't want to) use
the same algorithm as the page cache, but the _principle_ is the same:
were the recent speculative faults actually used?  If so, grow the number
of pages faulted in per fault; if not, it's a random sparse workload and
you want to keep allocating individual pages.
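
Roughly what I mean, as an illustrative sketch (the struct file_ra_state
fields are real; my_vma_fault() and my_populate_range() are made-up names
standing in for whatever the driver does to materialize pages):

#include <linux/fs.h>
#include <linux/mm.h>

/* Sketch only: grow the fault-in window when the previous speculative
 * window was actually hit; shrink back to a single page on a miss.
 */
static vm_fault_t my_vma_fault(struct vm_fault *vmf)
{
	struct file_ra_state *ra = &vmf->vma->vm_file->f_ra;
	pgoff_t pgoff = vmf->pgoff;
	unsigned int nr;

	if (pgoff >= ra->start && pgoff < ra->start + ra->size) {
		/* Last window was used: double it, capped at ra_pages. */
		nr = min_t(unsigned int, ra->size * 2, ra->ra_pages);
	} else {
		/* Miss: sparse access, fault in a single page. */
		nr = 1;
	}

	ra->start = pgoff;
	ra->size = nr;

	/* Hypothetical helper: fault in nr pages starting at pgoff. */
	return my_populate_range(vmf->vma, pgoff, nr);
}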

Don't rely on the user to ask.  They don't know.
