[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190819063412.GA20455@quack2.suse.cz>
Date: Mon, 19 Aug 2019 08:34:12 +0200
From: Jan Kara <jack@...e.cz>
To: Dave Chinner <david@...morbit.com>
Cc: Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Jason Gunthorpe <jgg@...pe.ca>,
Dan Williams <dan.j.williams@...el.com>,
Matthew Wilcox <willy@...radead.org>,
Theodore Ts'o <tytso@....edu>,
John Hubbard <jhubbard@...dia.com>,
Michal Hocko <mhocko@...e.com>, linux-xfs@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-nvdimm@...ts.01.org,
linux-ext4@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)
On Sat 17-08-19 12:26:03, Dave Chinner wrote:
> On Fri, Aug 16, 2019 at 12:05:28PM -0700, Ira Weiny wrote:
> > On Thu, Aug 15, 2019 at 03:05:58PM +0200, Jan Kara wrote:
> > > On Wed 14-08-19 11:08:49, Ira Weiny wrote:
> > > > On Wed, Aug 14, 2019 at 12:17:14PM +0200, Jan Kara wrote:
> > 2) Second reason is that I thought I did not have a good way to tell if the
> > lease was actually in use. What I mean is that letting the lease go should
> > be ok IFF we don't have any pins... I was thinking that without John's code
> > we don't have a way to know if there are any pins... But that is wrong...
> > All we have to do is check
> >
> > !list_empty(file->file_pins)
> >
> > So now with this detail I think you are right, we should be able to hold the
> > lease through the struct file even if the process no longer has any
> > "references" to it (ie closes and munmaps the file).
>
> I really, really dislike the idea of zombie layout leases. It's a
> nasty hack for poor application behaviour. This is a "we allow use
> after layout lease release" API, and I think encoding largely
> untraceable zombie objects into an API is very poor design.
>
> From the fcntl man page:
>
> LEASES
> Leases are associated with an open file description (see
> open(2)). This means that duplicate file descriptors
> (created by, for example, fork(2) or dup(2)) re‐ fer to
> the same lease, and this lease may be modified or
> released using any of these descriptors. Furthermore, the
> lease is released by either an explicit F_UNLCK operation on
> any of these duplicate file descriptors, or when all such
> file descriptors have been closed.
>
> Leases are associated with *open* file descriptors, not the
> lifetime of the struct file in the kernel. If the application closes
> the open fds that refer to the lease, then the kernel does not
> guarantee, and the application has no right to expect, that the
> lease remains active in any way once the application closes all
> direct references to the lease.
>
> IOWs, applications using layout leases need to hold the lease fd
> open for as long as the want access to the physical file layout. It
> is a also a requirement of the layout lease that the holder releases
> the resources it holds on the layout before it releases the layout
> lease, exclusive lease or not. Closing the fd indicates they do not
> need access to the file any more, and so the lease should be
> reclaimed at that point.
>
> I'm of a mind to make the last close() on a file block if there's an
> active layout lease to prevent processes from zombie-ing layout
> leases like this. i.e. you can't close the fd until resources that
> pin the lease have been released.
Yeah, so this was my initial though as well [1]. But as the discussion in
that thread revealed, the problem with blocking last close is that kernel
does not really expect close to block. You could easily deadlock e.g. if
the process gets SIGKILL, file with lease has fd 10, and the RDMA context
holding pages pinned has fd 15. Or you could wait for another process to
release page pins and blocking SIGKILL on that is also bad. So in the end
the least bad solution we've come up with were these "zombie" leases as you
call them and tracking them in /proc so that userspace at least has a way
of seeing them. But if you can come up with a different solution, I'm
certainly not attached to the current one...
Honza
[1] https://lore.kernel.org/lkml/20190606104203.GF7433@quack2.suse.cz
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists