[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4g2r=L3jfSDoRPt4VG7D_2CxCgv3s+JLu4FQRUSRWg+4Q@mail.gmail.com>
Date: Wed, 6 Feb 2019 15:30:27 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Doug Ledford <dledford@...hat.com>,
Dave Chinner <david@...morbit.com>,
Christopher Lameter <cl@...ux.com>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Ira Weiny <ira.weiny@...el.com>,
lsf-pc@...ts.linux-foundation.org,
linux-rdma <linux-rdma@...r.kernel.org>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
John Hubbard <jhubbard@...dia.com>,
Jerome Glisse <jglisse@...hat.com>,
Michal Hocko <mhocko@...nel.org>
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving
longterm-GUP usage by RDMA
On Wed, Feb 6, 2019 at 3:21 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Wed, Feb 06, 2019 at 02:44:45PM -0800, Dan Williams wrote:
>
> > > Do they need to stick with xfs?
> >
> > Can you clarify the motivation for that question? This problem exists
> > for any filesystem that implements an mmap that where the physical
> > page backing the mapping is identical to the physical storage location
> > for the file data.
>
> .. and needs to dynamicaly change that mapping. Which is not really
> something inherent to the general idea of a filesystem. A file system
> that had *strictly static* block assignments would work fine.
>
> Not all filesystem even implement hole punch.
>
> Not all filesystem implement reflink.
>
> ftruncate doesn't *have* to instantly return the free blocks to
> allocation pool.
>
> ie this is not a DAX & RDMA issue but a XFS & RDMA issue.
>
> Replacing XFS is probably not be reasonable, but I wonder if a XFS--
> operating mode could exist that had enough features removed to be
> safe?
You're describing the current situation, i.e. Linux already implements
this, it's called Device-DAX and some users of RDMA find it
insufficient. The choices are to continue to tell them "no", or say
"yes, but you need to submit to lease coordination".
> Ie turn off REFLINK. Change the semantic of ftruncate to be more like
> ETXTBUSY. Turn off hole punch.
>
> > > Are they really trying to do COW backed mappings for the RDMA
> > > targets? Or do they want a COW backed FS but are perfectly happy
> > > if the specific RDMA targets are *not* COW and are statically
> > > allocated?
> >
> > I would expect the COW to be broken at registration time. Only ODP
> > could possibly support reflink + RDMA. So I think this devolves the
> > problem back to just the "what to do about truncate/punch-hole"
> > problem in the specific case of non-ODP hardware combined with the
> > Filesystem-DAX facility.
>
> Usually the problem with COW is that you make a READ RDMA MR and on a
> COW'd file, and some other thread breaks the COW..
>
> This probably becomes a problem if the same process that has the MR
> triggers a COW break (ie by writing to the CPU mmap). This would cause
> the page to be reassigned but the MR would not be updated, which is
> not what the app expects.
>
> WRITE is simpler, once the COW is broken during GUP, the pages cannot
> be COW'd again until the DMA pin is released. So new reflinks would be
> blocked during the DMA pin period.
>
> To fix READ you'd have to treat it like WRITE and break the COW at GPU.
Right, that's what I'm proposing that any longterm-GUP break COW as if
it were a write.
Powered by blists - more mailing lists