[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <7D35EB8E-29F8-41DA-BB46-8BCF7B6C5A72@dilger.ca>
Date: Thu, 9 Feb 2017 17:23:31 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: Matthew Wilcox <willy@...radead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Jan Kara <jack@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Dave Hansen <dave.hansen@...el.com>,
Vlastimil Babka <vbabka@...e.cz>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-block@...r.kernel.org
Subject: Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed
On Feb 9, 2017, at 4:34 PM, Matthew Wilcox <willy@...radead.org> wrote:
>
> On Thu, Jan 26, 2017 at 02:57:53PM +0300, Kirill A. Shutemov wrote:
>> Most page cache allocation happens via readahead (sync or async), so if
>> we want to have significant number of huge pages in page cache we need
>> to find a ways to allocate them from readahead.
>>
>> Unfortunately, huge pages doesn't fit into current readahead design:
>> 128 max readahead window, assumption on page size, PageReadahead() to
>> track hit/miss.
>>
>> I haven't found a ways to get it right yet.
>>
>> This patch just allocates huge page if allowed, but doesn't really
>> provide any readahead if huge page is allocated. We read out 2M a time
>> and I would expect spikes in latancy without readahead.
>>
>> Therefore HACK.
>>
>> Having that said, I don't think it should prevent huge page support to
>> be applied. Future will show if lacking readahead is a big deal with
>> huge pages in page cache.
>>
>> Any suggestions are welcome.
>
> Well ... what if we made readahead 2 hugepages in size for inodes which
> are using huge pages? That's only 8x our current readahead window, and
> if you're asking for hugepages, you're accepting that IOs are going to
> be larger, and you probably have the kind of storage system which can
> handle doing larger IOs.
It would be nice if the bdi had a parameter for the maximum readahead size.
Currently, readahead is capped at 2MB chunks by force_page_cache_readahead()
even if bdi->ra_pages and bdi->io_pages are much larger.
It should be up to the filesystem to decide how large the readahead chunks
are rather than imposing some policy in the MM code. For high-speed (network)
storage access it is better to have at least 4MB read chunks, for RAID storage
it is desirable to have stripe-aligned readahead to avoid read inflation when
verifying the parity. Any fixed size will eventually be inadequate as disks
and filesystems change, so it may as well be a per-bdi tunable that can be set
by the filesystem as needed, or possibly with a mount option if needed.
Cheers, Andreas
Download attachment "signature.asc" of type "application/pgp-signature" (196 bytes)
Powered by blists - more mailing lists