[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160306230336.GE11282@dastard>
Date: Mon, 7 Mar 2016 10:03:36 +1100
From: Dave Chinner <david@...morbit.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: Hugh Dickins <hughd@...gle.com>,
Dave Hansen <dave.hansen@...el.com>,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Christoph Lameter <cl@...two.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Jerome Marchand <jmarchan@...hat.com>,
Yang Shi <yang.shi@...aro.org>,
Sasha Levin <sasha.levin@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: THP-enabled filesystem vs. FALLOC_FL_PUNCH_HOLE
On Sun, Mar 06, 2016 at 03:30:34AM +0300, Kirill A. Shutemov wrote:
> On Sun, Mar 06, 2016 at 09:38:11AM +1100, Dave Chinner wrote:
> > On Sat, Mar 05, 2016 at 02:24:12AM +0300, Kirill A. Shutemov wrote:
> > > Would it be acceptable for fallocate(FALLOC_FL_PUNCH_HOLE) to return
> > > -EBUSY (or other errno on your choice), if we cannot split the page
> > > right away?
> >
> > Which means THP are not transparent any more. What does an
> > application do when it gets an EBUSY, anyway?
>
> I guess it's reasonable to expect from an application to handle EOPNOTSUPP
> as FALLOC_FL_PUNCH_HOLE is not supported by some filesystems.
Yes, but this is usually done as a check at the program
initialisation to determine whether to issue hole punches at all.
It's not suppose to be a dynamic error.
> Although, non-consistent result from the same fd can be confusing.
Exactly.
> > And it's not just hole punching that has this problem. Direct IO is
> > going to have the same issue with invalidation of the mapped ranges
> > over the IO being done. XFS already WARNs when page cache
> > invalidation fails with EBUSY in direct IO, because that is
> > indicative of an application with a potential data corruption vector
> > and there's nothing we can do in the kernel code to prevent it.
>
> My current understanding is that for filesystems with persistent storage,
> in order to make THP any useful, we would need to implement writeback
> without splitting the huge page.
Algorithmically it is no different to filesytem block size < page
size writeback.
> At the moment, I have no idea how hard it would be..
THP support would effectively require us to remove PAGE_CACHE_SIZE
assumptions from all of the filesystem and buffer code. That's a
large chunk of work e.g. fs/buffer.c and any filesystem that uses
bufferheads for tracking filesystem block state through the page
cache.
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists