[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200521224906.GU2005@dread.disaster.area>
Date: Fri, 22 May 2020 08:49:06 +1000
From: Dave Chinner <david@...morbit.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/36] Large pages in the page cache
On Fri, May 15, 2020 at 06:16:20AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@...radead.org>
>
> This patch set does not pass xfstests. Test at your own risk. It is
> based on the readahead rewrite which is in Andrew's tree. I've fixed a
> lot of issues in the last two weeks, but generic/013 will still crash it.
>
> The primary idea here is that a large part of the overhead in dealing
> with individual pages is that there's just so darned many of them.
> We would be better off dealing with fewer, larger pages, even if they
> don't get to be the size necessary for the CPU to use a larger TLB entry.
Ok, so the main issue I have with the filesystem/iomap side of
things is that it appears to be adding "transparent huge page"
awareness to the filesysetm code, not "large page support".
For people that aren't aware of the difference between the
transparent huge and and a normal compound page (e.g. I have no idea
what the difference is), this is likely to cause problems,
especially as you haven't explained at all in this description why
transparent huge pages are being used rather than bog standard
compound pages.
And, really, why should iomap or the filesystems care if the large
page is a THP or just a high order compound page? The interface
for operating on these things at the page cache level should be the
same. We already have page_size() and friends for operating on
high order compound pages, yet the iomap stuff has this new
thp_size() function instead of just using page_size(). THis is going
to lead to confusion and future bugs when people who don't know the
difference use the wrong page size function in their filesystem
code.
So, really, the "large page" API presented to the filesystems via
the page cache needs to be unified. Having to use compound_*() in
some places, thp_* in others, then page_* and Page*, not to mention
hpage_* just so that we can correctly support "large pages" is a
total non-starter.
Hence I'd suggest that this patch set needs to start by "hiding" all
the differences between different types of pages behind a unified,
consistent API, then it can introduce large page support into code
outside the mm/ infrastructure via that unified API. I don't care
what that API looks like so long as it is clear, consistenti, well
documented and means filesystem developers don't need to know
anything about how the page (large or not) is managed by the mm
subsystem.
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists