linux-kernel - Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 9 Feb 2017 17:23:31 -0700
From:   Andreas Dilger <adilger@...ger.ca>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Hugh Dickins <hughd@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-block@...r.kernel.org
Subject: Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

On Feb 9, 2017, at 4:34 PM, Matthew Wilcox <willy@...radead.org> wrote:
> 
> On Thu, Jan 26, 2017 at 02:57:53PM +0300, Kirill A. Shutemov wrote:
>> Most page cache allocation happens via readahead (sync or async), so if
>> we want to have significant number of huge pages in page cache we need
>> to find a ways to allocate them from readahead.
>> 
>> Unfortunately, huge pages doesn't fit into current readahead design:
>> 128 max readahead window, assumption on page size, PageReadahead() to
>> track hit/miss.
>> 
>> I haven't found a ways to get it right yet.
>> 
>> This patch just allocates huge page if allowed, but doesn't really
>> provide any readahead if huge page is allocated. We read out 2M a time
>> and I would expect spikes in latancy without readahead.
>> 
>> Therefore HACK.
>> 
>> Having that said, I don't think it should prevent huge page support to
>> be applied. Future will show if lacking readahead is a big deal with
>> huge pages in page cache.
>> 
>> Any suggestions are welcome.
> 
> Well ... what if we made readahead 2 hugepages in size for inodes which
> are using huge pages?  That's only 8x our current readahead window, and
> if you're asking for hugepages, you're accepting that IOs are going to
> be larger, and you probably have the kind of storage system which can
> handle doing larger IOs.

It would be nice if the bdi had a parameter for the maximum readahead size.
Currently, readahead is capped at 2MB chunks by force_page_cache_readahead()
even if bdi->ra_pages and bdi->io_pages are much larger.

It should be up to the filesystem to decide how large the readahead chunks
are rather than imposing some policy in the MM code.  For high-speed (network)
storage access it is better to have at least 4MB read chunks, for RAID storage
it is desirable to have stripe-aligned readahead to avoid read inflation when
verifying the parity.  Any fixed size will eventually be inadequate as disks
and filesystems change, so it may as well be a per-bdi tunable that can be set
by the filesystem as needed, or possibly with a mount option if needed.


Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (196 bytes)