lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+ZJTHfFOcbReD_nOYxvxaYcD2Sa=VToZL0RWYw1TemrPA@mail.gmail.com>
Date:   Tue, 19 Jun 2018 13:53:31 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     syzbot <syzbot+10007d66ca02b08f0e60@...kaller.appspotmail.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: INFO: task hung in __get_super

On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
> This bug report is getting no feedback, but I guess that this bug is in
> block or mm or locking layer rather than fs layer.
>
> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
>  from mount_bdev() is stalling is the cause of keep holding s_umount_key for
> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
> rcu_read_unlock() which is expected not to stall.
>
> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
> give us some hints...

If an rcu stall is the true root cause of this, then I guess would see
"rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
hang after 120-240 seconds. So rcu stall has much higher chances to be
detected. Do you see the corresponding "rcu stall" bug?

But, yes, we need to tune all timeouts. There is
https://github.com/google/syzkaller/issues/516 for this.
We also need "kernel/hung_task.c: allow to set checking interval
separately from timeout" to be merged:
https://groups.google.com/forum/#!topic/syzkaller/rOr3WBE-POY
as currently it's very hard to tune task hung timeout. But maybe we
will need similar patches for other watchdogs too if they have the
same problem.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ