Message-Id: <20190718233133.146065f668da6297e57e52ef@kernel.org>
Date:   Thu, 18 Jul 2019 23:31:33 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Dan Rue <dan.rue@...aro.org>,
        Matt Hart <matthew.hart@...aro.org>,
        Anders Roxell <anders.roxell@...aro.org>,
        Daniel Diaz <daniel.diaz@...aro.org>
Subject: Re: [PATCH 3/3] arm64: debug: Remove rcu_read_lock from debug
 exception

On Thu, 18 Jul 2019 10:20:23 +0100
Mark Rutland <mark.rutland@....com> wrote:

> On Wed, Jul 17, 2019 at 11:22:15PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 18, 2019 at 02:43:58PM +0900, Masami Hiramatsu wrote:
> > > Remove rcu_read_lock()/rcu_read_unlock() from debug exception
> > > handlers since the software breakpoint can be hit on idle task.
> 
> Why precisely do we need to elide these? Are we seeing warnings today?

Yes, unfortunately (or fortunately). Naresh reported that it warns when
ftracetest runs. I confirmed that it also happens if I put a probe on
default_idle_call.

/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events 
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable 
/sys/kernel/debug/tracing # [  135.122237] 
[  135.125035] =============================
[  135.125310] WARNING: suspicious RCU usage
[  135.125581] 5.2.0-08445-g9187c508bdc7 #20 Not tainted
[  135.125904] -----------------------------
[  135.126205] include/linux/rcupdate.h:594 rcu_read_lock() used illegally while idle!
[  135.126839] 
[  135.126839] other info that might help us debug this:
[  135.126839] 
[  135.127410] 
[  135.127410] RCU used illegally from idle CPU!
[  135.127410] rcu_scheduler_active = 2, debug_locks = 1
[  135.128114] RCU used illegally from extended quiescent state!
[  135.128555] 1 lock held by swapper/0/0:
[  135.128944]  #0: (____ptrval____) (rcu_read_lock){....}, at: call_break_hook+0x0/0x178
[  135.130499] 
[  135.130499] stack backtrace:
[  135.131192] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-08445-g9187c508bdc7 #20
[  135.131841] Hardware name: linux,dummy-virt (DT)
[  135.132224] Call trace:
[  135.132491]  dump_backtrace+0x0/0x140
[  135.132806]  show_stack+0x24/0x30
[  135.133133]  dump_stack+0xc4/0x10c
[  135.133726]  lockdep_rcu_suspicious+0xf8/0x108
[  135.134171]  call_break_hook+0x170/0x178
[  135.134486]  brk_handler+0x28/0x68
[  135.134792]  do_debug_exception+0x90/0x150
[  135.135051]  el1_dbg+0x18/0x8c
[  135.135260]  default_idle_call+0x0/0x44
[  135.135516]  cpu_startup_entry+0x2c/0x30
[  135.135815]  rest_init+0x1b0/0x280
[  135.136044]  arch_call_rest_init+0x14/0x1c
[  135.136305]  start_kernel+0x4d4/0x500
[  135.136597] 


> 
> > The exception entry and exit use irq_enter() and irq_exit(), in this
> > case, correct?  Otherwise RCU will be ignoring this CPU.
> 
> This is missing today, which sounds like the underlying bug.

Agreed. I'm not so familiar with how debug exceptions are handled on arm64.
Would they be treated as a kind of NMI or IRQ?

Anyway, it seems that normal IRQs are also not calling irq_enter()/irq_exit(),
except in arch/arm64/kernel/smp.c. We need to fix that too, to avoid
unexpected RCU issues.
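
To make the irq_enter()/irq_exit() point concrete, here is a toy user-space
model in C. It is not kernel code and not a proposed fix: every toy_* name,
the cpu_idle flag and the warnings counter are made up for illustration. The
only thing it assumes is what was discussed above, namely that RCU ignores an
idle CPU unless an entry annotation tells it the CPU is active again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy state, hypothetical names: none of this exists in the kernel. */
static bool cpu_idle;     /* true while the modelled CPU sits in the idle loop */
static int  irq_nesting;  /* depth of toy_irq_enter() calls */
static int  warnings;     /* number of "used illegally while idle" reports */

/* RCU pays attention to this CPU if it is not idle, or if an
 * exception entry has been announced with toy_irq_enter(). */
static bool toy_rcu_is_watching(void)
{
	return !cpu_idle || irq_nesting > 0;
}

static void toy_irq_enter(void)
{
	irq_nesting++;
}

static void toy_irq_exit(void)
{
	assert(irq_nesting > 0);
	irq_nesting--;
}

/* Stand-in for the lockdep check behind "suspicious RCU usage". */
static void toy_rcu_read_lock(void)
{
	if (!toy_rcu_is_watching()) {
		warnings++;
		fprintf(stderr, "toy: rcu_read_lock() used illegally while idle\n");
	}
}

static void toy_rcu_read_unlock(void)
{
}

/* Breakpoint handler entered without any entry/exit annotation. */
static void toy_brk_handler_bare(void)
{
	toy_rcu_read_lock();
	toy_rcu_read_unlock();
}

/* Same handler, but the exception path announces itself first. */
static void toy_brk_handler_annotated(void)
{
	toy_irq_enter();
	toy_rcu_read_lock();
	toy_rcu_read_unlock();
	toy_irq_exit();
}
```

With cpu_idle set to true, calling toy_brk_handler_bare() bumps the warnings
counter, which corresponds to the splat above, while
toy_brk_handler_annotated() leaves it unchanged.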

Thank you,

-- 
Masami Hiramatsu <mhiramat@...nel.org>