[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <638E01BE-FD45-465C-8464-2E2D96ED6787@dilger.ca>
Date: Sun, 14 Aug 2016 01:20:12 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Jan Kara <jack@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Dave Hansen <dave.hansen@...el.com>,
Vlastimil Babka <vbabka@...e.cz>,
Matthew Wilcox <willy@...radead.org>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-block@...r.kernel.org
Subject: Re: [PATCHv2, 00/41] ext4: support of huge pages
On Aug 12, 2016, at 12:37 PM, Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> wrote:
>
> Here's stabilized version of my patchset which intended to bring huge pages
> to ext4.
>
> The basics are the same as with tmpfs[1] which is in Linus' tree now and
> ext4 built on top of it. The main difference is that we need to handle
> read out from and write-back to backing storage.
>
> Head page links buffers for whole huge page. Dirty/writeback tracking
> happens on per-hugepage level.
>
> We read out whole huge page at once. It required bumping BIO_MAX_PAGES to
> not less than HPAGE_PMD_NR. I defined BIO_MAX_PAGES to HPAGE_PMD_NR if
> huge pagecache enabled.
>
> On split_huge_page() we need to free buffers before splitting the page.
> Page buffers takes additional pin on the page and can be a vector to mess
> with the page during split. We want to avoid this.
> If try_to_free_buffers() fails, split_huge_page() would return -EBUSY.
>
> Readahead doesn't play with huge pages well: 128k max readahead window,
> assumption on page size, PageReadahead() to track hit/miss. I've got it
> to allocate huge pages, but it doesn't provide any readahead as such.
> I don't know how to do this right. It's not clear at this point if we
> really need readahead with huge pages. I guess it's good enough for now.
Typically read-ahead is a loss if you are able to get large allocations on
disk, since you can get at least seek_rate * chunk_size throughput from the
disks even with random IO at that size. With 1MB allocations and 7200 RPM drives this works out to be about 150MB/s, which is close to the throughput
of these drive already.
Cheers, Andreas
> Shadow entries ignored on allocation -- recently evicted page is not
> promoted to active list. Not sure if current workingset logic is adequate
> for huge pages. On eviction, we split the huge page and setup 4k shadow
> entries as usual.
>
> Unlike tmpfs, ext4 makes use of tags in radix-tree. The approach I used
> for tmpfs -- 512 entries in radix-tree per-hugepages -- doesn't work well
> if we want to have coherent view on tags. So the first 8 patches of the
> patchset converts tmpfs to use multi-order entries in radix-tree.
> The same infrastructure used for ext4.
>
> Encryption doesn't handle huge pages yet. To avoid regressions we just
> disable huge pages for the inode if it has EXT4_INODE_ENCRYPT.
>
> With this version I don't see any xfstests regressions with huge pages enabled.
> Patch with new configurations for xfstests-bld is below.
>
> Tested with 4k, 1k, encryption and bigalloc. All with and without
> huge=always. I think it's reasonable coverage.
>
> The patchset is also in git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugeext4/v2
>
> Please review and consider applying.
>
> [1] http://lkml.kernel.org/r/1465222029-45942-1-git-send-email-kirill.shutemov@linux.intel.com
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists