Message-ID: <dd849e4f-a30f-e508-d4e9-0f3c6d11c89a@i-love.sakura.ne.jp>
Date: Sat, 16 Jun 2018 04:40:14 +0900
From: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>,
syzbot <syzbot+7b2866454055e43c21e5@...kaller.appspotmail.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Al Viro <viro@...iv.linux.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: INFO: task hung in __sb_start_write
On 2018/06/15 18:19, Dmitry Vyukov wrote:
> On Thu, Jun 14, 2018 at 12:33 PM, Tetsuo Handa
> <penguin-kernel@...ove.sakura.ne.jp> wrote:
>> On 2018/06/11 16:39, Dmitry Vyukov wrote:
>>> On Mon, Jun 11, 2018 at 9:30 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>>> On Sun, Jun 10, 2018 at 11:47:56PM +0900, Tetsuo Handa wrote:
>>>>
>>>>> This looks quite strange that nobody is holding percpu_rw_semaphore for
>>>>> write but everybody is stuck trying to hold it for read. (Since there
>>>>> is no "X locks held by ..." line without followup "#0:" line, there is
>>>>> no possibility that somebody is in TASK_RUNNING state while holding
>>>>> percpu_rw_semaphore for write.)
>>>>>
>>>>> I feel that either API has a bug or API usage is wrong.
>>>>> Any idea for debugging this?
>>>>
>>>> Look at percpu_rwsem_release() and usage. The whole fs freezer thing is
>>>> magic.
>>>
>>> Do you mean that we froze fs? We tried to never-ever issue
>>> ioctl(FIFREEZE) during fuzzing. Are there other ways to do this?
>>>
>>
>> Dmitry, can you try this patch? If you can get
>
> I've tried replaying 5 logs with this patch, but I don't see that we
> return to user-space with locks held, nor deadlock reports.
Did you succeed in reproducing the khungtaskd messages with this patch?
If yes, was any of sb_writers#X/sb_pagefaults/sb_internal printed there?
If not, we would want a git tree for testing under syzbot.
>
> What I've noticed is that all these logs contain lots of error
> messages around block subsystem. Perhaps if we can identify the common
> denominator across error messages in different logs, we can find the
> one responsible for hangs.
There are indeed lots of error messages around the block subsystem. But if
khungtaskd says that none of sb_writers#X/sb_pagefaults/sb_internal was held,
how can down_read() fail to make progress unless up_write() somehow failed to
wake up the waiters sleeping in down_read()?
Hmm, might there be other locations calling percpu_rwsem_release()?
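
To make the suspicion above concrete, here is a minimal userspace sketch. It
is not kernel code, and struct model_sem and all model_* names are
hypothetical. It assumes, as far as I understand, that percpu_rwsem_release()
only tells lockdep to stop tracking the semaphore and does not unlock it. If
such a release were never paired with a real up_write(), readers would stay
blocked while a "locks held by" report shows no holder.

```c
/* Userspace model only; struct model_sem and model_* are hypothetical. */
#include <stdbool.h>

struct model_sem {
	bool write_held;      /* real lock state: a writer owns the semaphore */
	bool lockdep_tracked; /* what a "locks held by" report would show */
};

/* Models down_write(): take the lock and let lockdep track it. */
void model_down_write(struct model_sem *sem)
{
	sem->write_held = true;
	sem->lockdep_tracked = true;
}

/* Models a lockdep-only release: tracking is dropped, the lock is not. */
void model_lockdep_release(struct model_sem *sem)
{
	sem->lockdep_tracked = false;
}

/* Models up_write(): really drop the lock, and its tracking with it. */
void model_up_write(struct model_sem *sem)
{
	sem->write_held = false;
	sem->lockdep_tracked = false;
}

/* Models a reader: it can proceed only if no writer owns the lock. */
bool model_read_can_proceed(const struct model_sem *sem)
{
	return !sem->write_held;
}

/*
 * Returns true if readers were blocked while no holder was reported,
 * and only the real unlock allowed them to continue.
 */
bool model_hang_without_holder(void)
{
	struct model_sem sem = { false, false };
	bool stuck;

	model_down_write(&sem);
	model_lockdep_release(&sem);
	stuck = !model_read_can_proceed(&sem) && !sem.lockdep_tracked;

	model_up_write(&sem);	/* only the real unlock unblocks readers */
	return stuck && model_read_can_proceed(&sem);
}
```

In this model, model_hang_without_holder() reproduces the shape of the report:
everybody waits for read, yet nobody appears to hold the lock for write. Whether
the real hang follows this path is exactly what remains to be confirmed.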