linux-kernel - Re: INFO: task hung in fuse_reverse_inval

Message-ID: <CACT4Y+aGjM-7ZmKj2YpfD80a1Bjb4=X1r6efCfNXauec83wfjQ@mail.gmail.com>
Date:   Fri, 2 Nov 2018 20:31:07 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        syzbot <syzbot+bb6d800770577a083f8c@...kaller.appspotmail.com>
Subject: Re: INFO: task hung in fuse_reverse_inval_entry

On Thu, Jul 26, 2018 at 11:12 AM, Miklos Szeredi <miklos@...redi.hu> wrote:
> On Thu, Jul 26, 2018 at 10:44 AM, Miklos Szeredi <miklos@...redi.hu> wrote:
>> On Wed, Jul 25, 2018 at 11:12 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>> On Tue, Jul 24, 2018 at 5:17 PM, Miklos Szeredi <miklos@...redi.hu> wrote:
>
>>> Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?
>>
>> That's an interesting aspect.  Making request_wait_answer always be
>> killable would help with the issue you raise (killing the set of
>> processes taking part in a deadlock should resolve the deadlock), but
>> it breaks another aspect of the interface.
>>
>> Namely that userspace filesystems expect some serialization from
>> kernel when performing operations.  If we allow killing of a process
>> in the middle of an fs operation, then that serialization is no longer
>> there, which can break the server.
>>
>> One solution to that is to duplicate all locking in the server
>> (libfuse normally), but it would not solve the issue for legacy
>> libfuse or legacy non-libfuse servers.  It would also be difficult to
>> test.  Also it doesn't solve the problem of killing the server, as
>> that alone doesn't resolve the deadlock.
>
> Umm, we can actually do better.   Duplicate all vfs locking in the
> fuse kernel implementation: when killing a task that has an
> outstanding request, return immediately (which releases the VFS level
> lock and hence resolves the deadlock) but hold onto our own lock
> until the reply from the userspace server comes back.
>
> Need to think about the details; this might not be easy to do
> properly.   Notably memory management locks (page->lock, mmap_sem,
> etc) are notoriously tricky.

Hi Miklos,

Any updates on this?

syzbot recently found this hang in fuse, which looks real (totally unkillable):
https://syzkaller.appspot.com/bug?id=0d08132d6dac82ae63b7b8d4a9d027d30b46167d

but this one still happens, and it's hard to tell if it's real or not:
https://syzkaller.appspot.com/bug?id=76f8203fef423375d230f14b8f5b45617ab945e2
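
For reference, the idea quoted above (drop the VFS level lock as soon as the waiting task is killed, but keep a fuse-private lock until the server replies) can be modelled in a few lines of userspace Python. This is only a toy sketch under loud assumptions: Node, Request, begin_op, kill_waiter, server_reply and may_send_next_request are hypothetical names invented here, not kernel or libfuse APIs, and it runs single-threaded, so it ignores the kill/reply race and the memory management locks mentioned in the quoted text.

```python
import threading


class Node:
    """Hypothetical stand-in for an inode with two levels of locking."""

    def __init__(self):
        self.vfs_lock = threading.Lock()   # models the VFS-level lock
        self.fuse_lock = threading.Lock()  # models the duplicate lock kept by fuse


class Request:
    """One outstanding request to the userspace server."""

    def __init__(self, node):
        self.node = node
        self.abandoned = False  # True once the waiting task has been killed


def begin_op(node):
    """Take both locks and hand a request to the (imaginary) server."""
    node.vfs_lock.acquire()
    node.fuse_lock.acquire()
    return Request(node)


def kill_waiter(req):
    """The waiting task is killed: return at once, releasing only the VFS lock."""
    req.abandoned = True
    req.node.vfs_lock.release()


def server_reply(req):
    """The server answers: release whatever the request still holds."""
    if not req.abandoned:
        req.node.vfs_lock.release()  # normal completion path (simplified)
    req.node.fuse_lock.release()


def may_send_next_request(node):
    """Would a new operation on this node be allowed to reach the server?"""
    if node.fuse_lock.acquire(blocking=False):
        node.fuse_lock.release()
        return True
    return False


if __name__ == "__main__":
    node = Node()
    req = begin_op(node)
    kill_waiter(req)
    print("VFS lock held after kill:", node.vfs_lock.locked())
    print("next request allowed before reply:", may_send_next_request(node))
    server_reply(req)
    print("next request allowed after reply:", may_send_next_request(node))
```

In this model the killed task no longer pins the VFS lock, so other tasks blocked on it can make progress, while the server still sees operations on the node serialized because no new request is sent until the outstanding one is answered.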