Message-ID: <87edx7l5px.fsf@mpe.ellerman.id.au>
Date: Tue, 23 Aug 2022 18:33:14 +1000
From: Michael Ellerman <mpe@...erman.id.au>
To: Zhouyi Zhou <zhouzhouyi@...il.com>, npiggin@...il.com,
christophe.leroy@...roup.eu, atrajeev@...ux.vnet.ibm.com,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
lance@...osl.org, paulmck@...nel.org, rcu@...r.kernel.org
Cc: Zhouyi Zhou <zhouzhouyi@...il.com>
Subject: Re: [PATCH linux-next] powerpc: disable sanitizer in irq_soft_mask_set
Zhouyi Zhou <zhouzhouyi@...il.com> writes:
> On ppc, the compiler-based sanitizers generate instrumentation
> around the statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>
> 0xc000000000295cb0 <+0>: addis r2,r12,774
> 0xc000000000295cb4 <+4>: addi r2,r2,16464
> 0xc000000000295cb8 <+8>: mflr r0
> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
> 0xc000000000295cc0 <+16>: mflr r0
> 0xc000000000295cc4 <+20>: std r31,-8(r1)
> 0xc000000000295cc8 <+24>: addi r3,r13,2354
> 0xc000000000295ccc <+28>: mr r31,r13
> 0xc000000000295cd0 <+32>: std r0,16(r1)
> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
> 0xc000000000295cdc <+44>: nop
> 0xc000000000295ce0 <+48>: li r9,1
> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
> 0xc000000000295ce8 <+56>: addi r1,r1,48
> 0xc000000000295cec <+60>: ld r0,16(r1)
> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
> 0xc000000000295cf4 <+68>: mtlr r0
>
> If there is a context switch before "stb r9,2354(r31)", r31 may
> no longer equal r13, in which case the irq soft mask will not work.
>
> This patch disables the sanitizers in irq_soft_mask_set.
>
> Signed-off-by: Zhouyi Zhou <zhouzhouyi@...il.com>
> ---
> Dear PPC developers
>
> I found this bug while running rcutorture tests in a ppc VM at the
> Open Source Lab of Oregon State University, following Paul E. McKenney's guidance.
>
> console.log reports the following bug:
>
> [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100
> [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
> [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253
> [ 346.533620][ T100] Call Trace:
> [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)
> [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160
> [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
> [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0
> [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0
> [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810
> [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90
> [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360
> [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220
> [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64
>
> After 12 days of debugging, I finally narrowed the problem down to irq_soft_mask_set.
Thanks for spending 12 days debugging it! O_o
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 26ede09c521d..a5ae8d82cc9d 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> * for the critical section and as a clobber because
> * we changed paca->irq_soft_mask
> */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
> {
> /*
> * The irq mask must always include the STD bit if any are set.
My worry is that this will force irq_soft_mask_set() out of line, which
we would rather avoid. It's meant to be a fast path.
In fact with this applied I see nearly 300 out-of-line copies of the
function when building a defconfig, and ~1700 calls to it.
Normally it is inlined at every call site.
So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
code irq_soft_mask helpers").
It was a nice-looking cleanup, but those accesses must not be instrumented
by KASAN, and we also want them inlined. AFAICS the only way to achieve
both is to go back to inline asm.
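The stale-register hazard from the quoted commit message can be sketched
in portable C. This is only a user-space simulation under stated
assumptions: fake_paca, cur_paca, migrate_to() and the demo_*() helpers
are hypothetical names invented for illustration, not kernel code.
cur_paca plays the role of r13, and the local copy plays the role of r31.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU paca; only one field is modelled. */
struct fake_paca {
	unsigned char irq_soft_mask;
};

#define NR_FAKE_CPUS 2

static struct fake_paca pacas[NR_FAKE_CPUS];   /* one per simulated CPU */
static struct fake_paca *cur_paca = &pacas[0]; /* plays the role of r13 */

/* Simulates the task being preempted and resumed on another CPU. */
static void migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
	cur_paca = &pacas[cpu];
}

/* Clear both masks and put the task back on simulated CPU 0. */
static void reset(void)
{
	for (int i = 0; i < NR_FAKE_CPUS; i++)
		pacas[i].irq_soft_mask = 0;
	cur_paca = &pacas[0];
}

/*
 * Instrumented shape: the base is copied before the sanitizer call and
 * the store goes through the copy, so a migration in between makes the
 * store land in the previous CPU's structure.
 */
static void set_mask_instrumented(unsigned char mask)
{
	struct fake_paca *saved = cur_paca; /* like "mr r31,r13" */

	migrate_to(1);                      /* preempted around "bl __asan_store1" */
	saved->irq_soft_mask = mask;        /* like "stb r9,2354(r31)" */
}

/*
 * Direct shape: the store reads the base at the moment of the store, as
 * a single r13-relative stb would, so an earlier migration is harmless.
 */
static void set_mask_direct(unsigned char mask)
{
	migrate_to(1);
	cur_paca->irq_soft_mask = mask;
}

/* Mask seen by the CPU the task ends up on, instrumented shape. */
int demo_instrumented(void)
{
	reset();
	set_mask_instrumented(1);
	return cur_paca->irq_soft_mask;
}

/* Mask seen by the CPU the task ends up on, direct shape. */
int demo_direct(void)
{
	reset();
	set_mask_direct(1);
	return cur_paca->irq_soft_mask;
}
```

In this model demo_instrumented() leaves the current CPU unmasked because
the store hit the old structure, while demo_direct() masks the CPU the
task is actually running on.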
cheers