Date:   Tue, 19 Apr 2022 12:06:30 +0800
From:   patrick wang <patrick.wang.shcn@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     paulmck@...nel.org, frederic@...nel.org, quic_neeraju@...cinc.com,
        josh@...htriplett.org, mathieu.desnoyers@...icios.com,
        jiangshanlai@...il.com, joel@...lfernandes.org,
        rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu: ftrace: avoid tracing a few functions executed in multi_cpu_stop()

On Tue, Apr 19, 2022 at 2:34 AM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Mon, 18 Apr 2022 12:37:35 +0800
> Patrick Wang <patrick.wang.shcn@...il.com> wrote:
>
> > A few functions in the call chain of rcu_momentary_dyntick_idle(),
> > which is executed in multi_cpu_stop() and marked notrace, are still
> > traced, so they run while ftrace is modifying code. This may cause the
> > CPUs not running ftrace_modify_code to stall:
>
> I'm confused by this. How exactly are traced functions causing this? Is
> this on RISC-V?

While ftrace is modifying code, these functions are running, and their
instructions are the ones being modified (I can see the compiler-emitted
nop instructions in these functions). Instructions must not be executed
while they are being modified; otherwise the executing CPU may behave
unpredictably.

Yes, it is on RISC-V.

>
> >
> > [   72.686113] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [   72.687344] rcu:   1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
> > [   72.687800] rcu:   3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
> > [   72.688280]        (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
> > [   72.688739] Task dump for CPU 1:
> > [   72.688991] task:migration/1     state:R  running task     stack:    0 pid:   19 ppid:     2 flags:0x00000000
> > [   72.689594] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > [   72.690242] Call Trace:
> > [   72.690603] Task dump for CPU 3:
> > [   72.690761] task:migration/3     state:R  running task     stack:    0 pid:   29 ppid:     2 flags:0x00000000
> > [   72.691135] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > [   72.691474] Call Trace:
> > [   72.691733] rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> > [   72.692180] rcu:   Possible timer handling issue on cpu=2 timer-softirq=594
> > [   72.692485] rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> > [   72.692876] rcu:   Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > [   72.693232] rcu: RCU grace-period kthread stack dump:
> > [   72.693433] task:rcu_preempt     state:I stack:    0 pid:   14 ppid:     2 flags:0x00000000
> > [   72.693788] Call Trace:
> > [   72.694018] [<ffffffff807f3740>] schedule+0x56/0xc2
> > [   72.694306] [<ffffffff807f9cd8>] schedule_timeout+0x82/0x184
> > [   72.694539] [<ffffffff8007c456>] rcu_gp_fqs_loop+0x19a/0x318
> > [   72.694809] [<ffffffff8007e408>] rcu_gp_kthread+0x11a/0x140
> > [   72.695325] [<ffffffff800324d6>] kthread+0xee/0x118
> > [   72.695657] [<ffffffff8000398a>] ret_from_exception+0x0/0x14
> > [   72.696089] rcu: Stack dump where RCU GP kthread last ran:
> > [   72.696383] Task dump for CPU 2:
> > [   72.696562] task:migration/2     state:R  running task     stack:    0 pid:   24 ppid:     2 flags:0x00000000
> > [   72.697059] Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
> > [   72.697471] Call Trace:
> >
> > Mark rcu_preempt_deferred_qs(), rcu_preempt_need_deferred_qs() and
> > rcu_preempt_deferred_qs_irqrestore() notrace to avoid this.
> >
>
> The rcu_momentary_dyntick_idle() was marked notrace because of RISC-V not
> being able to call functions from within stop machine. If that's what is
> being prevented,

Yes, that is.

> then I'm fine with this (although I'm thinking we need
> different kinds of "notrace" for different architectures as one arch's
> limitation should not be cause for another's).
>

Totally agree with this. Currently "notrace" is heavy-handed: it affects
all architectures.

Thanks
Patrick


> But before I ack this patch, I want to understand the real issues here.
>
> -- Steve
