lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230131202635.GA3019407@paulmck-ThinkPad-P17-Gen-1>
Date:   Tue, 31 Jan 2023 12:26:35 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     kernel test robot <yujie.liu@...el.com>
Cc:     Liam Howlett <liam.howlett@...cle.com>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org
Subject: Re: [linus:master] [maple_tree] 120b116208:
 INFO:task_blocked_for_more_than#seconds

On Tue, Jan 31, 2023 at 03:18:22PM +0800, kernel test robot wrote:
> Hi Liam,
> 
> We caught a "task blocked" dmesg in maple tree test. Not sure if this
> is expected for maple tree test, so we are sending this report for
> your information. Thanks.
> 
> Greeting,
> 
> FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with clang-14):
> 
> commit: 120b116208a0877227fc82e3f0df81e7a3ed4ab1 ("maple_tree: reorganize testing to restore module testing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> [   17.318428][    T1] calling  maple_tree_seed+0x0/0x15d0 @ 1
> [   17.319219][    T1] 
> [   17.319219][    T1] TEST STARTING
> [   17.319219][    T1] 
> [  999.249871][   T23] INFO: task rcu_scale_shutd:59 blocked for more than 491 seconds.
> [  999.253363][   T23]       Not tainted 6.1.0-rc4-00003-g120b116208a0 #1
> [  999.254249][   T23] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  999.255390][   T23] task:rcu_scale_shutd state:D stack:30968 pid:59    ppid:2      flags:0x00004000
> [  999.256934][   T23] Call Trace:
> [  999.257418][   T23]  <TASK>
> [  999.257900][   T23]  __schedule+0x169b/0x1f90
> [  999.261677][   T23]  schedule+0x151/0x300
> [  999.262281][   T23]  ? compute_real+0xe0/0xe0
> [  999.263364][   T23]  rcu_scale_shutdown+0xdd/0x130
> [  999.264093][   T23]  ? wake_bit_function+0x2c0/0x2c0
> [  999.268985][   T23]  kthread+0x309/0x3a0
> [  999.269958][   T23]  ? compute_real+0xe0/0xe0
> [  999.270552][   T23]  ? kthread_unuse_mm+0x200/0x200
> [  999.271281][   T23]  ret_from_fork+0x1f/0x30
> [  999.272385][   T23]  </TASK>
> [  999.272865][   T23] 
> [  999.272865][   T23] Showing all locks held in the system:
> [  999.273988][   T23] 2 locks held by swapper/0/1:
> [  999.274684][   T23] 1 lock held by khungtaskd/23:
> [  999.275400][   T23]  #0: ffffffff88346e00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x8/0x30
> [  999.277171][   T23] 
> [  999.277525][   T23] =============================================
> [  999.277525][   T23] 
> [ 1049.050884][    T1] maple_tree: 12610686 of 12610686 tests passed
> 
> 
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@...el.com>
> | Link: https://lore.kernel.org/oe-lkp/202301310940.4a37c7af-yujie.liu@intel.com

Liam brought this to my attention on IRC, and it looks like the root
cause is that the rcuscale code does not deal gracefully with grace
periods that are in much excess of a second in duration.

Now, it might well be worth looking into why the grace periods were taking
that long, but if you were running Maple Tree stress tests concurrently
with rcuscale, this might well be expected behavior.

So, does the patch below clear this up for you?

							Thanx, Paul

------------------------------------------------------------------------

commit 8e44d51e3411994091f7c7c136286d82c5757a4a
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Tue Jan 31 12:08:54 2023 -0800

    rcuscale: Move shutdown from wait_event() to wait_event_idle()
    
    The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions
    use wait_event() to wait for the rcuscale test to complete.  However,
    each updater thread in such a test waits for at least 100 grace periods.
    If each grace period takes more than 1.2 seconds, which is long, but
    not insanely so, this can trigger the hung-task timeout.
    
    This commit therefore replaces those wait_event() calls with calls to
    wait_event_idle(), which do not trigger the hung-task timeout.
    
    Reported-by: kernel test robot <yujie.liu@...el.com>
    Reported-by: Liam Howlett <liam.howlett@...cle.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 91fb5905a008f..4120f94030c3c 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -631,8 +631,7 @@ static int compute_real(int n)
 static int
 rcu_scale_shutdown(void *arg)
 {
-	wait_event(shutdown_wq,
-		   atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
+	wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
 	smp_mb(); /* Wake before output. */
 	rcu_scale_cleanup();
 	kernel_power_off();
@@ -771,8 +770,8 @@ kfree_scale_cleanup(void)
 static int
 kfree_scale_shutdown(void *arg)
 {
-	wait_event(shutdown_wq,
-		   atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
+	wait_event_idle(shutdown_wq,
+			atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
 
 	smp_mb(); /* Wake before output. */
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ