lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Sat, 6 Aug 2022 15:19:37 +0800
From:   "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To:     <paulmck@...nel.org>
CC:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        <linux-kernel@...r.kernel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Josh Triplett <josh@...htriplett.org>,
        "Mathieu Desnoyers" <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, <rcu@...r.kernel.org>
Subject: Re: [PATCH v4 0/2] rcu: Display registers of self-detected stall as
 far as possible



On 2022/8/5 16:09, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/8/5 2:09, Paul E. McKenney wrote:
>> On Thu, Aug 04, 2022 at 10:34:18AM +0800, Zhen Lei wrote:
>>> v3 --> v4:
>>> 1. To avoid undo/redo, merge patch 1-2 in v3 into one.
>>>
>>> v2 --> v3:
>>> 1. Patch 1 Add trigger_single_cpu_backtrace(cpu) in synchronize_rcu_expedited_wait()
>>>    Subsequently, we can see that all callers of dump_cpu_task() try
>>>    trigger_single_cpu_backtrace() first. Then I do the cleanup in Patch 2.
>>> 2. Patch 3, as Paul E. McKenney's suggestion, push the code into dump_cpu_task().
>>>
>>> For newcomers:
>>> Currently, dump_cpu_task() is mainly used by RCU, in order to dump the
>>> stack traces of the current task of the specified CPU when a rcu stall
>>> is detected.
>>>
>>> For architectures that do not support NMI interrupts, registers is not
>>> printed when rcu stall is self-detected. This patch series improve it.
>>
>> Thank you!  I have queued both for further testing and review.  I had
>> to rebase them to the -rcu tree's "dev" branch.  There was one trivial
>> conflict, but could you please check the resulting commits, both for
>> my wordsmithing and to make sure that your changes still work in your
>> environment?  (I do not have access to that sort of hardware.)

I tested it on x86, arm64, and arm32 platforms, and all the results are
as expected.

x86:
[   54.750801] rcu: INFO: rcu_preempt self-detected stall on CPU
[   54.754289] rcu:     0-....: (4998 ticks this GP) idle=9e5c/1/0x4000000000000000 softirq=855/855 fqs=1219
[   54.755307]  (t=5005 jiffies g=1 q=36 ncpus=8)
[   54.755311] CPU: 0 PID: 379 Comm: test0 Not tainted 5.19.0-rc3-00108-g0aa4c0b532b6 #1
[   54.755313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[   54.755321] RIP: 0010:pvclock_clocksource_read+0x12/0xc0
[   54.755328] Code: b6 47 1d 8b 17 39 ca 75 f1 22 05 79 9e a3 01 c3 0f 1f 84 00 00 00 00 00 55 53 48 83 ec 08 8b 17 89 d6 83 e6 fe 0f ae e8 0f 31 <48> c1 e2 20 48 8b 5f 10 0f b6 6f 1d 48 09 c2 48 89 d0 0f be 57 1c
[   54.755330] RSP: 0018:ffff936040547eb0 EFLAGS: 00000206
[   54.755332] RAX: 0000000088921016 RBX: 0000000c9957cfbb RCX: 0000000000000000
[   54.755333] RDX: 0000000000000026 RSI: 0000000000000006 RDI: ffffffffa46de000
[   54.755335] RBP: 0000000000000000 R08: 0000001336358637 R09: ffff936040547ea0
[   54.755336] R10: ffffffffa3e55dc0 R11: ffffffffa46ee4a1 R12: 0000000000003e42
[   54.755338] R13: ffffffffa4727d40 R14: ffffffffa28b0270 R15: 0000000000000000
[   54.755339] FS:  0000000000000000(0000) GS:ffff8e0cefc00000(0000) knlGS:0000000000000000
[   54.755343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.755344] CR2: 0000563a3b1bae40 CR3: 0000000104378000 CR4: 00000000000006f0
[   54.755345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   54.755346] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   54.755347] Call Trace:
[   54.755356]  <TASK>
[   54.755358]  kvm_clock_read+0x14/0x30
[   54.755361]  ktime_get+0x39/0x90
[   54.755368]  test_task+0x3f/0x60
[   54.755373]  kthread+0xe3/0x110
[   54.755376]  ? kthread_complete_and_exit+0x20/0x20
[   54.755378]  ret_from_fork+0x22/0x30
[   54.755385]  </TASK>


arm64:
[   27.235111] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.236915] rcu:     0-....: (1249 ticks this GP) idle=cac4/1/0x4000000000000000 softirq=1434/1434 fqs=625
[   27.237475]  (t=1251 jiffies g=3421 q=18 ncpus=4)
[   27.238994] CPU: 0 PID: 356 Comm: test0 Not tainted 5.19.0-rc3-00107-gbf5cb0bc4689 #1
[   27.239467] Hardware name: linux,dummy-virt (DT)
[   27.240038] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.240358] pc : arch_counter_read+0x18/0x24
[   27.241439] lr : arch_counter_read+0x18/0x24
[   27.241645] sp : ffff800008e53df0
[   27.241786] x29: ffff800008e53df0 x28: 0000000000000000 x27: 0000000000000000
[   27.242256] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[   27.242586] x23: 0000000000001f40 x22: ffffd62939e61df0 x21: 00000006556b8c60
[   27.243115] x20: 00000000000028ae x19: ffffd6293aad7300 x18: ffffffffffffffff
[   27.243396] x17: 0000000000000001 x16: 0000000000000000 x15: ffffd6293aabad8e
[   27.243671] x14: ffffffffffffffff x13: ffffd6293aabad85 x12: fffffffffffc4627
[   27.243947] x11: ffffd6293a772c98 x10: 0000000000000a60 x9 : 0000000000000fa0
[   27.244256] x8 : ffffd6293a772c50 x7 : ffff800008e53c40 x6 : 00000000767b62ab
[   27.244541] x5 : 01ffffffffffffff x4 : 0000000000000000 x3 : 0000000000000017
[   27.244813] x2 : 00000000000028ae x1 : ffff800008e53df0 x0 : 0000000076804ac1
[   27.245247] Call trace:
[   27.245457]  arch_counter_read+0x18/0x24
[   27.245845]  ktime_get+0x48/0xa0
[   27.246012]  test_task+0x6c/0xec
[   27.246151]  kthread+0x10c/0x110
[   27.246282]  ret_from_fork+0x10/0x20


arm32
rcu: INFO: rcu_sched self-detected stall on CPU
rcu:    0-....: (499 ticks this GP) idle=c734/1/0x40000002 softirq=161/161 fqs=249
        (t=500 jiffies g=-899 q=16 ncpus=4)
CPU: 0 PID: 70 Comm: test0 Not tainted 5.19.0-rc3+ #1
Hardware name: ARM-Versatile Express
PC is at ktime_get+0x4c/0xe8
LR is at ktime_get+0x4c/0xe8
pc : [<8019b4fc>]    lr : [<8019b4fc>]    psr: 60000013
sp : c8a71f28  ip : 00000001  fp : 00000001
r10: f8ecbe00  r9 : 431bde82  r8 : d7b634db
r7 : 00000a0a  r6 : e2be6498  r5 : 00000001  r4 : 80ca8700
r3 : ffffffff  r2 : ff3a2de2  r1 : 00000000  r0 : 00c5d21d
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 627f806a  DAC: 00000051
 ktime_get from test_task+0x44/0x110
 test_task from kthread+0xd8/0xf4
 kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc8a71fb0 to 0xc8a71ff8)
1fa0:                                     00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000


> 
> Your description is much clearer than mine, thanks. I will test it
> tomorrow, the international network speed is too slow during the day.
> 
>>
>> In the future, could you please send your patches against the -rcu
>> tree's "dev" branch?
> 
> Okay, no problem. I'll do it next time.
> 
>>
>> 							Thanx, Paul
>>
>>> v2:
>>> https://lkml.org/lkml/2022/7/27/1800
>>>
>>> Zhen Lei (2):
>>>   sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
>>>   sched/debug: Show the registers of 'current' in dump_cpu_task()
>>>
>>>  kernel/rcu/tree_stall.h |  8 +++-----
>>>  kernel/sched/core.c     | 14 ++++++++++++++
>>>  kernel/smp.c            |  3 +--
>>>  3 files changed, 18 insertions(+), 7 deletions(-)
>>>
>>> -- 
>>> 2.25.1
>>>
>> .
>>
> 

-- 
Regards,
  Zhen Lei

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ