Message-ID: <CACT4Y+aGjM-7ZmKj2YpfD80a1Bjb4=X1r6efCfNXauec83wfjQ@mail.gmail.com>
Date: Fri, 2 Nov 2018 20:31:07 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
syzbot <syzbot+bb6d800770577a083f8c@...kaller.appspotmail.com>
Subject: Re: INFO: task hung in fuse_reverse_inval_entry

On Thu, Jul 26, 2018 at 11:12 AM, Miklos Szeredi <miklos@...redi.hu> wrote:
> On Thu, Jul 26, 2018 at 10:44 AM, Miklos Szeredi <miklos@...redi.hu> wrote:
>> On Wed, Jul 25, 2018 at 11:12 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>> On Tue, Jul 24, 2018 at 5:17 PM, Miklos Szeredi <miklos@...redi.hu> wrote:
>
>>> Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?
>>
>> That's an interesting aspect. Making request_wait_answer always be
>> killable would help with the issue you raise (killing set of processes
>> taking part in deadlock should resolve deadlock), but it breaks
>> another aspect of the interface.
>>
>> Namely that userspace filesystems expect some serialization from
>> kernel when performing operations. If we allow killing of a process
>> in the middle of an fs operation, then that serialization is no longer
>> there, which can break the server.
>>
>> One solution to that is to duplicate all locking in the server
>> (libfuse normally), but it would not solve the issue for legacy
>> libfuse or legacy non-libfuse servers. It would also be difficult to
>> test. Also it doesn't solve the problem of killing the server, as
>> that alone doesn't resolve the deadlock.
>
> Umm, we can actually do better. Duplicate all vfs locking in the
> fuse kernel implementation: when killing a task that has an
> outstanding request, return immediately (which results in releasing
> the VFS level lock and hence the deadlock) but hold onto our own lock
> until the reply from the userspace server comes back.
>
> Need to think about the details; this might not be easy to do
> properly. Notably memory management locks (page->lock, mmap_sem,
> etc) are notoriously tricky.

Hi Miklos,

Any updates on this?

syzbot recently found this hang in fuse, which looks real (the task is totally unkillable):
https://syzkaller.appspot.com/bug?id=0d08132d6dac82ae63b7b8d4a9d027d30b46167d

But the hang from this thread still happens, and it's hard to tell if it's real or not:
https://syzkaller.appspot.com/bug?id=76f8203fef423375d230f14b8f5b45617ab945e2