lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAABZP2wYcNXkTo=tgX-ARziwgD2rng+-wCZ-qfQ6M30+vmLEug@mail.gmail.com>
Date:   Wed, 28 Sep 2022 09:48:52 +0800
From:   Zhouyi Zhou <zhouzhouyi@...il.com>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     mpe@...erman.id.au, christophe.leroy@...roup.eu,
        atrajeev@...ux.vnet.ibm.com, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, lance@...osl.org, paulmck@...nel.org,
        rcu@...r.kernel.org
Subject: Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

Thank Nick for reviewing my patch

On Tue, Sep 27, 2022 at 12:25 PM Nicholas Piggin <npiggin@...il.com> wrote:
>
> On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > This is second version of my fix to PPC's  "WARNING: suspicious RCU usage",
> > I improved my fix under Paul E. McKenney's guidance:
> > Link: https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzhouyi@gmail.com/T/
> >
> > During the cpu offlining, the sub functions of xive_teardown_cpu will
> > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will
> > travel RCU protected list, so "WARNING: suspicious RCU usage" will be
> > triggered.
> >
> > Avoid lockdep when we are offline.
>
> I don't see how this is safe. If RCU is no longer watching the CPU then
> the memory it is accessing here could be concurrently freed. I think the
> warning is valid.
Agree
>
> powerpc's problem is that cpuhp_report_idle_dead() is called before
> arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
> I would say xive cleanup just needs to be done earlier. I wonder why it
> is not done in __cpu_disable or thereabouts, that's where the interrupt
> controller is supposed to be stopped.
Yes, I learn flowing events sequence from kgdb debugging
__cpu_disable -> pseries_cpu_disable -> set_cpu_online(cpu, false)  =
leads to =>  do_idle: if (cpu_is_offline(cpu) -> arch_cpu_idle_dead
so xive cleanup should be done in pseries_cpu_disable.
But as a beginner, I afraid that I am incompetent to do above
sophisticated work without error although I am very like to,
Could any expert do this for us?

Thanks a lot
Cheers
Zhouyi
>
> Thanks,
> Nick
>
> >
> > Signed-off-by: Zhouyi Zhou <zhouzhouyi@...il.com>
> > ---
> > Dear PPC and RCU developers
> >
> > I found this bug when trying to do rcutorture tests in ppc VM of
> > Open Source Lab of Oregon State University
> >
> > console.log report following bug:
> > [   37.635545][    T0] WARNING: suspicious RCU usage^M
> > [   37.636409][    T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted^M
> > [   37.637575][    T0] -----------------------------^M
> > [   37.638306][    T0] kernel/locking/lockdep.c:3723 RCU-list traversed in non-reader section!!^M
> > [   37.639651][    T0] ^M
> > [   37.639651][    T0] other info that might help us debug this:^M
> > [   37.639651][    T0] ^M
> > [   37.641381][    T0] ^M
> > [   37.641381][    T0] RCU used illegally from offline CPU!^M
> > [   37.641381][    T0] rcu_scheduler_active = 2, debug_locks = 1^M
> > [   37.667170][    T0] no locks held by swapper/6/0.^M
> > [   37.668328][    T0] ^M
> > [   37.668328][    T0] stack backtrace:^M
> > [   37.669995][    T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.0.0-rc4-next-20220907-dirty #8^M
> > [   37.672777][    T0] Call Trace:^M
> > [   37.673729][    T0] [c000000004653920] [c00000000097f9b4] dump_stack_lvl+0x98/0xe0 (unreliable)^M
> > [   37.678579][    T0] [c000000004653960] [c0000000001f2eb8] lockdep_rcu_suspicious+0x148/0x16c^M
> > [   37.680425][    T0] [c0000000046539f0] [c0000000001ed9b4] __lock_acquire+0x10f4/0x26e0^M
> > [   37.682450][    T0] [c000000004653b30] [c0000000001efc2c] lock_acquire+0x12c/0x420^M
> > [   37.684113][    T0] [c000000004653c20] [c0000000010d704c] _raw_spin_lock_irqsave+0x6c/0xc0^M
> > [   37.686154][    T0] [c000000004653c60] [c0000000000c7b4c] xive_spapr_put_ipi+0xcc/0x150^M
> > [   37.687879][    T0] [c000000004653ca0] [c0000000010c72a8] xive_cleanup_cpu_ipi+0xc8/0xf0^M
> > [   37.689856][    T0] [c000000004653cf0] [c0000000010c7370] xive_teardown_cpu+0xa0/0xf0^M
> > [   37.691877][    T0] [c000000004653d30] [c0000000000fba5c] pseries_cpu_offline_self+0x5c/0x100^M
> > [   37.693882][    T0] [c000000004653da0] [c00000000005d2c4] arch_cpu_idle_dead+0x44/0x60^M
> > [   37.695739][    T0] [c000000004653dc0] [c0000000001c740c] do_idle+0x16c/0x3d0^M
> > [   37.697536][    T0] [c000000004653e70] [c0000000001c7a1c] cpu_startup_entry+0x3c/0x40^M
> > [   37.699694][    T0] [c000000004653ea0] [c00000000005ca20] start_secondary+0x6c0/0xb50^M
> > [   37.701742][    T0] [c000000004653f90] [c00000000000d054] start_secondary_prolog+0x10/0x14^M
> >
> >
> > Tested on PPC VM of Open Source Lab of Oregon State University.
> > Test results show that although "WARNING: suspicious RCU usage" has gone,
> > and there are less "BUG: soft lockup" reports than the original kernel
> > (9 vs 13), which sounds good ;-)
> >
> > But after my modification, results-rcutorture-kasan/SRCU-P/console.log.diags
> > shows a new warning:
> > [  222.289242][  T110] WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0
> >
> > I guess above new warning also exits in original kernel, so I write a tiny test script as follows:
> >
> > #!/bin/sh
> >
> > COUNTER=0
> > while [ $COUNTER -lt 1000 ] ; do
> >     qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G -kernel /tmp/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutorture.torture_type=srcud rcupdate.rcu_self_test=1 rcutorture.fwd_progress=3 srcutree.big_cpu_lim=5 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"&
> >     qemu_pid=$!
> >     cd ~/next1/linux-next
> >     make clean
> > #I use "make vmlinux -j 8" to create heavy background jitter
> >     make vmlinux -j 8  > /dev/null 2>&1
> >     make_pid=$!
> >     wait $qemu_pid
> >     kill $qemu_pid
> >     kill $make_id
> >     if grep -q WARN /tmp/console.log;
> >     then
> >         echo $COUNTER > /tmp/counter
> >         exit
> >     fi
> >     COUNTER=$(($COUNTER+1))
> > done
> >
> > Above test shows that original kernel also warn about
> > "WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0"
> >
> > But I am not very sure about my results, so I still add a [RFC] to my subject line.
> >
> > Thank all of you for your guidance and encouragement ;-)
> >
> > Cheers
> > Zhouyi
> > --
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index e0a7ac5db15d..e47098f00da1 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -64,10 +64,15 @@ static void pseries_cpu_offline_self(void)
> >
> >       local_irq_disable();
> >       idle_task_exit();
> > +     /* prevent lockdep code from traveling RCU protected list
> > +      * when we are offline.
> > +      */
> > +     lockdep_off();
> >       if (xive_enabled())
> >               xive_teardown_cpu();
> >       else
> >               xics_teardown_cpu();
> > +     lockdep_on();
> >
> >       unregister_slb_shadow(hwcpu);
> >       rtas_stop_self();
> > --
> > 2.25.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ