linux-kernel - Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed


Message-ID: <20170210145158.GA2267@bombadil.infradead.org>
Date:   Fri, 10 Feb 2017 06:51:58 -0800
From:   Matthew Wilcox <willy@...radead.org>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Hugh Dickins <hughd@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-block@...r.kernel.org
Subject: Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

On Thu, Feb 09, 2017 at 05:23:31PM -0700, Andreas Dilger wrote:
> On Feb 9, 2017, at 4:34 PM, Matthew Wilcox <willy@...radead.org> wrote:
> > Well ... what if we made readahead 2 hugepages in size for inodes which
> > are using huge pages?  That's only 8x our current readahead window, and
> > if you're asking for hugepages, you're accepting that IOs are going to
> > be larger, and you probably have the kind of storage system which can
> > handle doing larger IOs.
> 
> It would be nice if the bdi had a parameter for the maximum readahead size.
> Currently, readahead is capped at 2MB chunks by force_page_cache_readahead()
> even if bdi->ra_pages and bdi->io_pages are much larger.
> 
> It should be up to the filesystem to decide how large the readahead chunks
> are rather than imposing some policy in the MM code.  For high-speed (network)
> storage access it is better to have at least 4MB read chunks, for RAID storage
> it is desirable to have stripe-aligned readahead to avoid read inflation when
> verifying the parity.  Any fixed size will eventually be inadequate as disks
> and filesystems change, so it may as well be a per-bdi tunable that can be set
> by the filesystem as needed, or possibly with a mount option if needed.

I think the filesystem should provide a hint, but ultimately it needs to
be up to the MM to decide how far to readahead.  The filesystem doesn't
(and shouldn't) have the global view into how much memory is available
for readahead, nor should it be tracking how well this app is being
served by readahead.

That 2MB chunk restriction is allegedly there "so that we don't pin too
much memory at once".  Maybe that should be scaled with the amount of
memory in the system (pinning 2MB of a 256MB system is a bit different
from pinning 2MB of a 1TB memory system).
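
To make the scaling idea concrete, here is a rough userspace sketch and not kernel code. The function name ra_chunk_cap(), the 1/4096-of-RAM ratio, and the 64MB ceiling are all assumptions picked for illustration; nothing in this thread proposes these particular values. It only shows the shape of a cap that stays at today's 2MB on small machines and grows with memory size on large ones.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical policy knobs, chosen for illustration only. */
#define RA_CAP_FLOOR   (2 * MIB)   /* the existing 2MB chunk size */
#define RA_CAP_CEILING (64 * MIB)  /* arbitrary upper bound */
#define RA_CAP_SHIFT   12          /* at most 1/4096 of RAM per chunk */

/*
 * Return a per-chunk readahead cap in bytes that scales with total
 * memory, clamped to [RA_CAP_FLOOR, RA_CAP_CEILING].
 */
static uint64_t ra_chunk_cap(uint64_t total_ram_bytes)
{
	uint64_t cap = total_ram_bytes >> RA_CAP_SHIFT;

	if (cap < RA_CAP_FLOOR)
		cap = RA_CAP_FLOOR;
	if (cap > RA_CAP_CEILING)
		cap = RA_CAP_CEILING;
	return cap;
}
```

With these assumed constants, a 256MB system keeps the 2MB cap, a 64GB system gets 16MB, and a 1TB system hits the 64MB ceiling. A real policy would presumably look at available rather than total memory, which this sketch ignores.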