linux-kernel - Re: INFO: task hung in __sb_start

Message-ID: <dd849e4f-a30f-e508-d4e9-0f3c6d11c89a@i-love.sakura.ne.jp>
Date:   Sat, 16 Jun 2018 04:40:14 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To:     Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        syzbot <syzbot+7b2866454055e43c21e5@...kaller.appspotmail.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: INFO: task hung in __sb_start_write

On 2018/06/15 18:19, Dmitry Vyukov wrote:
> On Thu, Jun 14, 2018 at 12:33 PM, Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
>> On 2018/06/11 16:39, Dmitry Vyukov wrote:
>>> On Mon, Jun 11, 2018 at 9:30 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>>> On Sun, Jun 10, 2018 at 11:47:56PM +0900, Tetsuo Handa wrote:
>>>>
>>>>> This looks quite strange that nobody is holding percpu_rw_semaphore for
>>>>> write but everybody is stuck trying to hold it for read. (Since there
>>>>> is no "X locks held by ..." line without followup "#0:" line, there is
>>>>> no possibility that somebody is in TASK_RUNNING state while holding
>>>>> percpu_rw_semaphore for write.)
>>>>>
>>>>> I feel that either API has a bug or API usage is wrong.
>>>>> Any idea for debugging this?
>>>>
>>>> Look at percpu_rwsem_release() and usage. The whole fs freezer thing is
>>>> magic.
>>>
>>> Do you mean that we froze fs? We tried to never-ever issue
>>> ioctl(FIFREEZE) during fuzzing. Are there other ways to do this?
>>>
>>
>> Dmitry, can you try this patch? If you can get
> 
> I've tried replaying 5 logs with this patch, but I don't see that we
> return to user-space with locks held, nor deadlock reports.

Did you succeed in reproducing the khungtaskd messages with this patch?
If yes, was one of sb_writers#X/sb_pagefaults/sb_internal printed there?
If not, we would want a git tree for testing under syzbot.

> 
> What I've noticed is that all these logs contain lots of error
> messages around block subsystem. Perhaps if we can identify the common
> denominator across error messages in different logs, we can find the
> one responsible for hangs.

While there are lots of error messages around the block subsystem, how can
down_read() fail to make progress unless up_write() somehow failed to wake
up the waiters sleeping in down_read(), assuming that khungtaskd says that
none of sb_writers#X/sb_pagefaults/sb_internal was held?

Hmm, might there be other locations calling percpu_rwsem_release()?
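
To illustrate why such a call would matter, here is a toy user-space model
(Python, not kernel code). It assumes that a percpu_rwsem_release()-style
call drops only the lock-tracking entry and leaves the lock itself held for
write. The class ToyRwsem, its methods and the task names are hypothetical
and exist only for this sketch.

```python
# Toy model, NOT kernel code: a write lock plus a lockdep-like "held locks"
# table that can be cleared independently of the real lock state.

class ToyRwsem:
    def __init__(self, name):
        self.name = name
        self.writer = None        # task that really holds the write side
        self.tracked_by = set()   # what a "locks held by ..." dump would list

    def down_write(self, task):
        assert self.writer is None, "write side already held"
        self.writer = task
        self.tracked_by.add(task)

    def up_write(self, task):
        assert self.writer == task, "only the holder may release"
        self.writer = None
        self.tracked_by.discard(task)

    def tracking_release(self, task):
        # Assumed analogue of percpu_rwsem_release(): forget the tracking
        # entry only; the write side stays held.
        self.tracked_by.discard(task)

    def try_down_read(self, task):
        # False means the reader would have to sleep.
        return self.writer is None


sem = ToyRwsem("sb_writers")
sem.down_write("freezer")
sem.tracking_release("freezer")

report = sorted(sem.tracked_by)                      # nobody listed as holder
reader_blocked = not sem.try_down_read("writer_task")

sem.up_write("freezer")
reader_proceeds = sem.try_down_read("writer_task")   # real release unblocks
```

In this model the report is empty while the reader still cannot proceed,
which matches the symptom: readers stuck although no holder is printed.
Whether the real hang follows this pattern is not established here.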