linux-kernel - Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 6 Feb 2017 15:18:42 +0100
From:   Miklos Szeredi <miklos@...redi.hu>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Linux NFS list <linux-nfs@...r.kernel.org>,
        ceph-devel@...r.kernel.org, lustre-devel@...ts.lustre.org,
        v9fs-developer@...ts.sourceforge.net,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Jeff Layton <jlayton@...hat.com>
Subject: Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to
 allocate more pages per call

On Mon, Feb 6, 2017 at 10:57 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Mon, Feb 06, 2017 at 10:08:06AM +0100, Miklos Szeredi wrote:
>
>> Yes, I think only page lock can be used to deadlock inside
>> fuse_dev_read/write().  So requests that don't have locked pages
>> should be okay  with just waiting until copy_to/from_user() finishes
>> and only then proceeding with the abort.
>
> Actually, looking at that some more, this might be not true.  Anything
> that takes ->mmap_sem exclusive and *not* killable makes for another
> source of deadlock.
>
> Initial page fault takes ->mmap_sem shared.  OK, request sent to
> server and server tries to read() it.  In the meanwhile, something
> has closed userfaultfd for the same mm_struct.  We have userfaultfd_release()
> block on attempt to take ->mmap_sem exclusive and from now on any attempt
> to grab ->mmap_sem shared will deadlock.  And get_user_pages(), as well
> as copy_to_user(), etc. can end up doing just that.  It doesn't have to
> be an mmap of the same file, BTW - any page fault would do.
>
> All you really need is to have server sharing address space with the
> process that steps into original page fault, plus an evicted page
> of any nature (anon mmap, whatever) being used as a destination of
> read() in server.
>
> down_read() inside down_read() is fine, unless there had been down_write()
> in between.  And there are unkillable down_write() on ->mmap_sem -
> userfaultfd_release() being one example of such.  Many of those can and
> probably should become down_write_killable(), but this one can't - there
> might be nothing to deliver the signal to, if the final close() happens
> e.g. from exit(2).
>
> Warning: the above might be completely bogus - I'm on way too large
> uptime at the moment and most of the last day had been spent digging
> through various convoluted code, so take the above with a cartload of
> salt.  _If_ it's true, that kind of deadlock won't be possible to
> break with killing anything or doing umount -f, though.

It's not bogus, the deadlock is there.

But I think it's breakable in the same way: if the deadlocked request
is aborted, the fault will release the page lock as well as mmap_sem,
and from there things will resolve themselves.

But you are definitely right about needing to clean up that mess in
fuse/dev.c and doing so by fixing up the arg refcounting for just the
read and write requests is going to be a lot simpler than having to do
that for all of them (which was my original plan).

So, I'll have a go at that sometime.

Thanks,
Miklos