linux-kernel - Fwd: rcu stall in __d_lookup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAEtum4kcs-c6RHFg1+nVcc+mc7DK2QuBWg9qrR9EJDnwAKDo1w@mail.gmail.com>
Date:   Sun, 12 Sep 2021 21:28:03 +0800
From:   lixiaofeng li <lixiaofeng427@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: Fwd: rcu stall in __d_lookup_rcu

Hello,
      The rcu stall warming is reported randomly after the system was
uptime more than 100 days. Because it is an embedded system, the
vmcore can not be generated.  From the dumpstack of rcu stall warm,
the rcu stall was detected in __d_lookup_rcu() function.
The similar rcu stall warnings happened 6 times on different machines,
which all pointed to  d_alloc_parallel()-->__d_lookup_rcu().  Some of
them triggered the oops due to khungtaskd detection..
     And the commits 015555f and 8cc07c8 by Will Deacon  have already
been present in our linux  kernel.
Can someone help guide what the issue is and how to process this type
of issue further?

1. The linux version is 4.14.47 and the cpu is fsl,T1040RDB, 32bits.
    4.14.47+gb09b730 #1 SMP Fri Apr 2 01:34:59 UTC 2021 ppc GNU/Linux.
2. The tree rcu is used and preempt is disabled
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_HAVE_RCU_TABLE_FREE=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_TRACE=y

CONFIG_PREEMPT_NONE=y

3. Below is the rcu warming:
[10953024.068601] INFO: rcu_sched self-detected stall on CPU
[10953024.074035]    0-...: (59998 ticks this GP)
idle=cc2/140000000000001/0 softirq=1466139644/1466139644 fqs=14337
[10953024.084226]     (t=60000 jiffies g=872597753 c=872597752 q=83688)
[10953024.090418] Task dump for CPU 0:
[10953024.090421] pidof           R  running task     6096 29584
29580 0x00000006
[10953024.090439] Call Trace:
[10953024.090450] [e871fb00] [c00cc2f4] rcu_dump_cpu_stacks+0xa4/0x100
(unreliable)
[10953024.090465] [e871fb20] [c00cb52c] rcu_check_callbacks+0x7dc/0xa00
[10953024.090476] [e871fb90] [c00d3ec0] update_process_times+0x40/0x80
[10953024.090481] [e871fba0] [c00eccac] tick_sched_handle.isra.5+0x4c/0x70
[10953024.090486] [e871fbb0] [c00ecd44] tick_sched_timer+0x74/0x110
[10953024.090493] [e871fbd0] [c00d5698] __hrtimer_run_queues+0x138/0x380
[10953024.090499] [e871fc20] [c00d5cc8] hrtimer_interrupt+0x108/0x2c0
[10953024.090510] [e871fc60] [c000bee8] __timer_interrupt+0xf8/0x340
[10953024.090516] [e871fc90] [c000c360] timer_interrupt+0xd0/0x110
[10953024.090525] [e871fcb0] [c0013928] ret_from_except+0x0/0x18
[10953024.090537] --- interrupt: 901 at __d_lookup_rcu+0x8c/0x210
[10953024.090537]     LR = d_alloc_parallel+0xe4/0x520
[10953024.090541] [e871fd70] [c02537bc] __d_alloc+0x10c/0x230 (unreliable)
[10953024.090547] [e871fda0] [c02540b4] d_alloc_parallel+0xe4/0x520
[10953024.090556] [e871fe10] [c02ae7b0] proc_fill_cache+0x120/0x1e0
[10953024.090562] [e871fe70] [c02af940] proc_pid_readdir+0x1a0/0x300
[10953024.090567] [e871fed0] [c024c4e4] iterate_dir+0x1d4/0x240
[10953024.090572] [e871ff00] [c024cbd8] SyS_getdents+0x98/0x160
[10953024.090578] [e871ff40] [c0013278] ret_from_syscall+0x0/0x3c
[10953024.090583] --- interrupt: c01 at 0xfea2d14
[10953024.090583]     LR = 0xfea2ccc
[10953125.985432] INFO: task notfmgrd:5188 blocked for more than 120 seconds.
[10953125.992357]       Tainted: P           O    4.14.47+gb09b730 #1
[10953125.998646] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[10953126.006769] notfmgrd        D 3984  5188      1 0x00000002
[10953126.006785] Call Trace:
[10953126.006800] [e78cdad0] [c0088584]
__update_load_avg_se.isra.3+0x234/0x270 (unreliable)
[10953126.006808] [e78cdb90] [c09b8dcc] __schedule+0x2cc/0x9d0
[10953126.006813] [e78cdbf0] [c09b9508] schedule+0x38/0xc0
[10953126.006820] [e78cdc00] [c09bd6d4] schedule_timeout+0x244/0x4b0
[10953126.006826] [e78cdc50] [c09ba428] wait_for_common+0xc8/0x1c0
[10953126.006835] [e78cdc90] [c00c128c] __wait_rcu_gp+0x22c/0x240
[10953126.006844] [e78cdcd0] [c00ca284] synchronize_sched+0x54/0x90
[10953126.006861] [e78cdd00] [f46dc804] pram_notify_change+0x214/0x3b0 [pramfs]
[10953126.006872] [e78cdd40] [c02592f4] notify_change+0x1e4/0x520
[10953126.006879] [e78cdd70] [c022d9ec] do_truncate+0x7c/0x100
[10953126.006890] [e78cddc0] [c0246674] path_openat+0xf74/0x13f0
[10953126.006896] [e78cde50] [c02481e4] do_filp_open+0x74/0x100
[10953126.006902] [e78cdf00] [c022f3c0] do_sys_open+0x220/0x2c0
[10953126.006913] [e78cdf40] [c0013278] ret_from_syscall+0x0/0x3c
[10953126.006918] --- interrupt: c01 at 0xfed5eb8
[10953126.006918]     LR = 0xfed5e7c
[10953126.006921] Kernel panic - not syncing: hung_task: blocked tasks
[10953126.013201] CPU: 3 PID: 404 Comm: khungtaskd Tainted: P
 O    4.14.47+gb09b730 #1
[10953126.021740] Call Trace:
[10953126.024452] [e93b1e30] [c09a05a8] dump_stack+0x88/0xc0 (unreliable)
[10953126.030993] [e93b1e50] [c004a088] panic+0x124/0x28c
[10953126.036145] [e93b1eb0] [c0124fd8] watchdog+0x3c8/0x430
[10953126.041557] [e93b1f10] [c0072594] kthread+0x164/0x170
[10953126.046878] [e93b1f40] [c00133b0] ret_from_kernel_thread+0x5c/0x64

Thanks

regards
sky