lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <PH0PR11MB58808C49B2BFA8F4D6D9E0FEDAFF9@PH0PR11MB5880.namprd11.prod.outlook.com>
Date:   Tue, 10 Jan 2023 08:12:49 +0000
From:   "Zhang, Qiang1" <qiang1.zhang@...el.com>
To:     "paulmck@...nel.org" <paulmck@...nel.org>
CC:     "Liu, Yujie" <yujie.liu@...el.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "frederic@...nel.org" <frederic@...nel.org>,
        "quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
        "joel@...lfernandes.org" <joel@...lfernandes.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU
 hotplug


On Mon, Jan 09, 2023 at 06:10:12PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 06, 2023 at 02:48:56AM +0000, Zhang, Qiang1 wrote:
> > 
> > On Thu, Dec 22, 2022 at 09:35:06AM +0000, Zhang, Qiang1 wrote:
> > > >On Mon, Dec 19, 2022 at 02:21:01AM +0000, Zhang, Qiang1 wrote:
> > > > >Greeting,
> > > > >FYI, we noticed WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_cr[rcutorture] due to commit (built with gcc-11):
> > > > >
> > > > >commit: 572a17843591d3c03ad891492939a06833fdd17d ("[PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug")
> > > > >url: https://github.com/intel-lab-lkp/linux/commits/Zqiang/rcu-tasks-Make-rude-RCU-Tasks-work-well-with-CPU-hotplug/20221201-074127
> > > > >base: https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev
> > > > >patch link: https://lore.kernel.org/all/20221130234533.1983769-1-qiang1.zhang@intel.com/
> > > > >patch subject: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
> > > > >
> > > > >in testcase: rcutorture
> > > > >version: 
> > > > >with following parameters:
> > > > >
> > > > >	runtime: 300s
> > > > >	test: cpuhotplug
> > > > >	torture_type: tasks-rude
> > > > >
> > > > >test-description: rcutorture is rcutorture kernel module load/unload test.
> > > > >test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > > > >
> > > > >on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> > > > >
> > > > >caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > > > >
> > > > >
> > > > >[  106.051532][  T583] rcu_torture_fwd_prog: Starting forward-progress test 0
> > > > >[  106.052085][  T583] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
> > > > >[  133.611262][  T583] rcu_torture_fwd_prog_cr: Waiting for CBs: rcu_barrier_tasks_rude+0x0/0x10() 0
> > > > >[  146.800051][  T583] ------------[ cut here ]------------
> > > > >[  146.800411][  T583] WARNING: CPU: 1 PID: 583 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog_cr+0x22c/0x2a7 [rcutorture]
> > > > >[  146.801075][  T583] Modules linked in: rcutorture torture ipmi_msghandler crc32c_intel serio_raw processor fuse
> > > > >[  146.801894][  T583] CPU: 1 PID: 583 Comm: rcu_torture_fwd Not tainted 6.1.0-rc1-00180-g572a17843591 #1 0cc09f902db70bae111a0c12c137296733dde4a9
> > > > >[  146.802916][  T583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> > > > >[  146.803693][  T583] EIP: rcu_torture_fwd_prog_cr+0x22c/0x2a7 [rcutorture]
> > > > >[  146.804177][  T583] Code: 89 d8 e8 fc c5 ff ff e8 67 49 03 00 83 c4 10 84 c0 75 79 a0 96 c6 10 ef 84 c0 75 70 e8 c8 ee ff ff 84 c0 75 67 83 fe 63 7f 02 <0f> 0b 8b 45 f0 8b 15 40 25 8a c2 ff 75 e8 ff 75 e0 01 f8 2b 45 dc
> > > > >[  146.805599][  T583] EAX: 00000000 EBX: ecee3800 ECX: 00000000 EDX: 00000000
> > > > >[  146.805992][  T583] ESI: 00000000 EDI: 0000c350 EBP: ed9d5f64 ESP: ed9d5f40
> > > > >[  146.806491][  T583] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010293
> > > > >[  146.807010][  T583] CR0: 80050033 CR2: 08082ff0 CR3: 2daaf000 CR4: 000406d0
> > > > >[  146.807484][  T583] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > >[  146.808031][  T583] DR6: fffe0ff0 DR7: 00000400
> > > > >[  146.808384][  T583] Call Trace:
> > > > >[  146.808634][  T583]  rcu_torture_fwd_prog.cold+0x3b/0xee [rcutorture 6754ed9afe4685f50ef7fade6309181c73794538]
> > > > >[  146.809348][  T583]  kthread+0xc8/0xf0
> > > > >[  146.809635][  T583]  ? rcu_torture_fwd_prog_cbfree+0x80/0x80 [rcutorture 6754ed9afe4685f50ef7fade6309181c73794538]
> > > > >[  146.810347][  T583]  ? kthread_complete_and_exit+0x20/0x20
> > > > >[  146.810734][  T583]  ret_from_fork+0x1c/0x28
> > > > >[  146.811075][  T583] irq event stamp: 205883
> > > > >[  146.811400][  T583] hardirqs last  enabled at (205891): [<c114bb06>] __up_console_sem+0x66/0x80
> > > > >[  146.811960][  T583] hardirqs last disabled at (205898): [<c114baed>] __up_console_sem+0x4d/0x80
> > > > >[  146.812583][  T583] softirqs last  enabled at (205880): [<c1ecb40b>] __do_softirq+0x2bb/0x440
> > > > >[  146.813079][  T583] softirqs last disabled at (205871): [<c10845f0>] call_on_stack+0x40/0x50
> > > > >[  146.813567][  T583] ---[ end trace 0000000000000000 ]---
> > > > >[  146.813926][  T583] rcu_torture_fwd_prog_cr Duration 2411 barrier: 3960 pending 50000 n_launders: 0 n_launders_sa: 0 n_max_gps: 0 n_max_cbs: 50000 cver 1 gps 0
> > > > >[  147.914266][  T583] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 6702 jiffies): 1s/10: 0:0 2s/10: 
> > > > >[  149.453780][  T557] ------------[ cut here ]------------
> > > > >[  149.454322][  T557] rcu_torture_writer: rtort_pipe_count: 4
> > > > >[  149.454817][  T557] WARNING: CPU: 1 PID: 557 at kernel/rcu/rcutorture.c:1583 rcu_torture_writer+0x71d/0xc80 [rcutorture]
> > > > 
> > > > 
> > > > This is not a bug.  this is caused by grace period taking too long time, the previous callback
> > > > has not been completed.  from the dmesg, can be found that the cpuhotplug test is being
> > > > performed periodically, this may cause the rude RCU-Tasks  grace period to take more time,
> > > > due to we need to acquire the cpus_read_lock, and the CPU0 always bootup failed, that is to
> > > > say, only one CPU of your system is online at this time.
> > > >
> > > >Onlining of a CPU failing with EIO is a new one on me.  Especially
> > > >persistent failure.
> > > 
> > > I use the kernel configuration file in the attachment and  base on:
> > > https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev
> > > 
> > > use "echo 1 > /sys/devices/system/cpu/cpu0/online" can reproduce this problem,
> > > the CPU0 always fails to go online.  
> > > 
> > > Debug found CPU0 is always not set in cpu_initialized_mask.
> > > causes the do_boot_cpu() to return -1.
> > > 
> > > do_boot_cpu()
> > >      wakeup_cpu_via_init_nmi();
> > >      if (!boot_error) {
> > >                 /*
> > >                  * Wait 10s total for first sign of life from AP
> > >                  */
> > >                 boot_error = -1;
> > >                 timeout = jiffies + 10*HZ;
> > >                 while (time_before(jiffies, timeout)) {
> > >                         if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> > >                                 /*
> > >                                  * Tell AP to proceed with initialization
> > >                                  */
> > >                                 cpumask_set_cpu(cpu, cpu_callout_mask);
> > >                                 boot_error = 0;
> > >                                 break;
> > >                         }
> > >                         schedule();
> > >                 }
> > >         }
> > > 
> > > This looks related to this modification e1c467e69040c("x86, hotplug: 
> > > Wake up CPU0 via NMI instead of INIT, SIPI, SIPI ").
> > > 
> > > 
> > > The following modification can make CPU0 go online successfully(This
> > > is just a test, not sure if there are other effects).
> > >
> > >
> > >Thank you for tracking this down!!!
> > >
> > >Huh.  CPU 0 is normally the boot CPU.  Back in the day, it could not
> > >be offlined.  Given that your testing indicates that CPU 0 can now be
> > >taken offline, maybe this "if" statement is a holdover that someone
> > >forgot to remove?
> > >
> > >But I must defer to those who know a lot more about this level of
> > >x86 code than I do.
> > 
> > I found relevant modification information, maybe it will be of some help
> > 
> > commit e1c467e69040c3be68959332959c07fb3d818e87
> > Author: Fenghua Yu <fenghua.yu@...el.com>
> > Date:   Wed Nov 14 04:36:53 2012 -0800
> > 
> >     x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
> > 
> >     Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
> >     code which is not a desired behavior for waking up BSP. To avoid the boot-strap
> >     code, wake up CPU0 by NMI instead.
> > 
> >     This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
> >     physically hot removed and then hot added), NMI won't wake it up. We'll change
> >     this code in the future to wake up hard offlined CPU0 if real platform and
> >     request are available.
> > 
> >     AP is still waken up as before by INIT, SIPI, SIPI sequence.
> > 
> >     Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> >     Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com
> >     Signed-off-by: H. Peter Anvin <hpa@...ux.intel.com>
> 
> Interesting!
> 
> When I run rcutorture on x86 (under qemu/KVM), it refuses to attempt to
> offline CPU 0.  The reason is that cpu_is_hotpluggable(0) returns false.
> 
> If I comment out that check, I get this:
> 
> 	rcu-torture:torture_onoff task: offline 0 failed: errno -1
> 
> A bit of digging turned up the CONFIG_BOOTPARAM_HOTPLUG_CPU0 Kconfig
> option.  Setting that causes CPU 0 to be offlined.
> 
> I clearly need to add this to one of the scenarios.  I arbitrarily
> chose TREE01, but please let me know if some other scenario or
> group of scenarios would be better.
>
>For example, like this.

This looks good, whether all TREE* can be added ? 
(after all, this just makes CPU0 support offline, but the actual CPU going 
offline/online also depends on "onoff_interval").

Thanks
Zqiang

>
>						Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit 6c11be38a0363b61db36352555e6746920711a1f
>Author: Paul E. McKenney <paulmck@...nel.org>
>Date:   Mon Jan 9 21:01:12 2023 -0800
>
>    rcutorture: Set CONFIG_BOOTPARAM_HOTPLUG_CPU0 to offline CPU 0
>    
>    There is now a BOOTPARAM_HOTPLUG_CPU0 Kconfig option that allows CPU 0
>    to be offlined on x86 systems.  This commit therefore sets this option in
>    the TREE01 rcutorture scenario in order to regularly test this capability.
>    
>    Reported-by: "Zhang, Qiang1" <qiang1.zhang@...el.com>
>    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>
>diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
>index 8ae41d5f81a3e..04831ef1f9b55 100644
>--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
>+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
>@@ -15,3 +15,4 @@ CONFIG_DEBUG_LOCK_ALLOC=n
>CONFIG_RCU_BOOST=n
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> CONFIG_RCU_EXPERT=y
>+CONFIG_BOOTPARAM_HOTPLUG_CPU0=y

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ