[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJfpegv-ePQE9pNwZe6O+0LjJdq2aVk3bnhxeZ=y7P+iFq72XA@mail.gmail.com>
Date: Mon, 6 Feb 2017 15:18:42 +0100
From: Miklos Szeredi <miklos@...redi.hu>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Linux NFS list <linux-nfs@...r.kernel.org>,
ceph-devel@...r.kernel.org, lustre-devel@...ts.lustre.org,
v9fs-developer@...ts.sourceforge.net,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jan Kara <jack@...e.cz>,
Chris Wilson <chris@...is-wilson.co.uk>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Jeff Layton <jlayton@...hat.com>
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to
allocate more pages per call
On Mon, Feb 6, 2017 at 10:57 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Mon, Feb 06, 2017 at 10:08:06AM +0100, Miklos Szeredi wrote:
>
>> Yes, I think only page lock can be used to deadlock inside
>> fuse_dev_read/write(). So requests that don't have locked pages
>> should be okay with just waiting until copy_to/from_user() finishes
>> and only then proceeding with the abort.
>
> Actually, looking at that some more, this might be not true. Anything
> that takes ->mmap_sem exclusive and *not* killable makes for another
> source of deadlock.
>
> Initial page fault takes ->mmap_sem shared. OK, request sent to
> server and server tries to read() it. In the meanwhile, something
> has closed userfaultfd for the same mm_struct. We have userfaultfd_release()
> block on attempt to take ->mmap_sem exclusive and from now on any attempt
> to grab ->mmap_sem shared will deadlock. And get_user_pages(), as well
> as copy_to_user(), etc. can end up doing just that. It doesn't have to
> be an mmap of the same file, BTW - any page fault would do.
>
> All you really need is to have server sharing address space with the
> process that steps into original page fault, plus an evicted page
> of any nature (anon mmap, whatever) being used as a destination of
> read() in server.
>
> down_read() inside down_read() is fine, unless there had been down_write()
> in between. And there are unkillable down_write() on ->mmap_sem -
> userfaultfd_release() being one example of such. Many of those can and
> probably should become down_write_killable(), but this one can't - there
> might be nothing to deliver the signal to, if the final close() happens
> e.g. from exit(2).
>
> Warning: the above might be completely bogus - I'm on way too large
> uptime at the moment and most of the last day had been spent digging
> through various convoluted code, so take the above with a cartload of
> salt. _If_ it's true, that kind of deadlock won't be possible to
> break with killing anything or doing umount -f, though.
It's not bogus, the deadlock is there.
But I think it's breakable in the same way: if the deadlocked request
is aborted, the fault will release the page lock as well as mmap_sem,
and from there things will resolve themselves.
But you are definitely right about needing to clean up that mess in
fuse/dev.c and doing so by fixing up the arg refcounting for just the
read and write requests is going to be a lot simpler than having to do
that for all of them (which was my original plan).
So, I'll have a go at that sometime.
Thanks,
Miklos
Powered by blists - more mailing lists