linux-kernel - Re: INFO: task hung in fuse_reverse_inval


Message-ID: <CACT4Y+ZZ_v6izsiCGym=r+MiRmxaFE=eH54dz7t0BOQ7VxTNiw@mail.gmail.com>
Date:   Wed, 25 Jul 2018 11:12:00 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        syzbot <syzbot+bb6d800770577a083f8c@...kaller.appspotmail.com>
Subject: Re: INFO: task hung in fuse_reverse_inval_entry

On Tue, Jul 24, 2018 at 5:17 PM, Miklos Szeredi <miklos@...redi.hu> wrote:
>>>>> Biggest conceptual problem: your definition of fuse-server is weak.
>>>>> Take the following example: process A is holding the fuse device fd
>>>>> and is forwarding requests and replies to/from process B via a pipe.
>>>>> So basically A is just a proxy that does nothing interesting, the
>>>>> "real" server is B.  But according to your definition B is not a
>>>>> server, only A is.
>>>>
>>>> I proposed to abort fuse conn when all fuse device fd's are "killed"
>>>> (all processes having the fd opened are killed). So if _only_ process
>>>> B is killed, then, yes, it will still hang. However if A is killed or
>>>> both A and B (say, process group, everything inside of pid namespace,
>>>> etc) then the deadlock will be autoresolved without human
>>>> intervention.
>>>
>>> Okay, so you're saying:
>>>
>>> 1) when a process gets SIGKILL and is in uninterruptible sleep, mark
>>> the process as doomed
>>> 2) for a particular fuse instance find set of fuse device fd
>>> references that are in non-doomed tasks; if there are none then abort
>>> fuse instance
>>>
>>> Right?
>>
>>
>> Yes, something like this.
>> Perhaps checking for "uninterruptible sleep" is excessive. If it has
>> SIGKILL pending it's pretty much doomed already. This info should be
>> already available for tasks.
>> Not saying that it's better, but what I described was the other way
>> around: when a task is killed it drops a reference to all opened fuse
>> fds, when the last fd is dropped, the connection can be aborted.
>
> struct task_struct {
> [...]
>     struct files_struct        *files;
> [...]
> };
>
> struct files_struct {
> [...]
>     struct fdtable __rcu *fdt;
> [...]
> };
>
> struct fdtable {
> [...]
>     struct file __rcu **fd;      /* current fd array */
> [...]
> };
>
> So there we have an array of pointers to struct files.  Suppose we'd
> magically be able to find files that point to fuse devices upon
> receiving SIGKILL, what would we do with them?  We can't close them:
> other tasks might still be pointing to the same files_struct.
>
> We could do a global search for non-doomed tasks referencing the same
> fuse device, but I have no clue how we'd go about doing that without
> racing with forks, fd sending, etc...


Good questions for which I don't have answers.

Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?
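
To make the two-step scheme quoted above concrete, here is a toy
user-space model of it. This is only a sketch, not kernel code: the
toy_* structures, the "doomed" flag and conn_should_abort() are
hypothetical names made up for illustration, and the model deliberately
ignores the races with fork and fd passing that Miklos points out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FDS 8

/* Toy stand-ins for the kernel structures quoted above; not kernel code. */
struct toy_file {
    int fuse_conn_id;             /* -1 if this is not a fuse device file */
};

struct toy_files_struct {
    struct toy_file *fd[MAX_FDS]; /* fd table, possibly shared by several tasks */
};

struct toy_task {
    struct toy_files_struct *files;
    bool doomed;                  /* hypothetical flag: SIGKILL is pending */
};

/* Does this task hold a device fd for the given fuse connection? */
static bool task_holds_conn(const struct toy_task *t, int conn_id)
{
    if (t->files == NULL)
        return false;
    for (size_t i = 0; i < MAX_FDS; i++) {
        const struct toy_file *f = t->files->fd[i];
        if (f != NULL && f->fuse_conn_id == conn_id)
            return true;
    }
    return false;
}

/*
 * Step 2 of the proposal: abort only if no non-doomed task still
 * references a device fd of the connection.
 */
static bool conn_should_abort(const struct toy_task *tasks, size_t ntasks,
                              int conn_id)
{
    assert(tasks != NULL || ntasks == 0);
    for (size_t i = 0; i < ntasks; i++) {
        if (!tasks[i].doomed && task_holds_conn(&tasks[i], conn_id))
            return false;
    }
    return true;
}
```

In this model, two tasks sharing one toy_files_struct keep the
connection alive until both are doomed, which is the sharing problem
raised above: killing one task says nothing about the fd table itself.
A real implementation would also have to make the scan safe against
concurrent fork and fd passing, which this sketch does not attempt.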