Message-ID: <IA1PR11MB6171FFF325BABC76DCF2A8B989DF9@IA1PR11MB6171.namprd11.prod.outlook.com>
Date:   Sat, 11 Feb 2023 08:28:41 +0000
From:   "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
To:     "Zhang, Qiang1" <qiang1.zhang@...el.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "paulmck@...nel.org" <paulmck@...nel.org>,
        "frederic@...nel.org" <frederic@...nel.org>,
        "joel@...lfernandes.org" <joel@...lfernandes.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3] sched/isolation: Fix illegal CPU value by
 housekeeping_any_cpu() return
> From: Zqiang <qiang1.zhang@...el.com>
> Sent: Friday, February 10, 2023 8:40 AM
> To: mingo@...hat.com; peterz@...radead.org; juri.lelli@...hat.com;
> paulmck@...nel.org; frederic@...nel.org; joel@...lfernandes.org;
> rcu@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: [PATCH v3] sched/isolation: Fix illegal CPU value by
> housekeeping_any_cpu() return
> 
> For kernels built with CONFIG_NO_HZ_FULL=y, running the following tests:
> 
> runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" bootparams=
> "console=ttyS0 nohz_full=0,1 rcu_nocbs=0,1 sched_verbose" -d
> 
> root@...ux86-64:~# echo 0 > /sys/devices/system/cpu/cpu2/online
> root@...ux86-64:~# echo 0 > /sys/devices/system/cpu/cpu3/online
Hi Qiang,
I did some quick testing with the same kernel parameters and reproduction steps as yours:
1) Without this v3 applied, the kernel panicked, as you reported.
2) With this v3 applied, the kernel did NOT panic and worked well.
     However, a WARNING call trace [1] was thrown.
     I'm not sure whether [1] is a separate issue.
[1]
[ 2445.396928] smpboot: CPU 2 is now offline
[ 2445.399084] CPU2 attaching NULL sched-domain.
[ 2445.399091] CPU3 attaching NULL sched-domain.
[ 2445.399202] CPU3 attaching NULL sched-domain.
[ 2445.399208] root domain span: 3 (max cpu_capacity = 1024)
[ 2449.731424] process 672 (tuned) no longer affine to cpu3
[ 2449.733332] process 509 (systemd-journal) no longer affine to cpu3
[ 2449.742278] process 541 (systemd-udevd) no longer affine to cpu3
[ 2449.745409] process 760 (bash) no longer affine to cpu3
[ 2449.748550] smpboot: CPU 3 is now offline
[ 2449.755129] CPU3 attaching NULL sched-domain.
[ 2449.755194] ------------[ cut here ]------------
[ 2449.756296] WARNING: CPU: 0 PID: 483 at kernel/sched/topology.c:2257 build_sched_domains+0x104c/0x1430
[ 2449.758227] Modules linked in: rfkill sunrpc psmouse i2c_piix4 atkbd libps2 vivaldi_fmap serio_raw virtio_net net_failover failover sr_mod cdrom i8042 qemu_fw_cfg pata_acpi ipmi_devintf ipmi_msghandler
[ 2449.760804] CPU: 0 PID: 483 Comm: kworker/3:6 Not tainted 6.2.0-rc7-rcu+ #21
[ 2449.761820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2449.762931] Workqueue: events cpuset_hotplug_workfn
[ 2449.763676] RIP: 0010:build_sched_domains+0x104c/0x1430
[ 2449.764465] Code: 45 98 f4 ff ff ff 0f 84 1a f8 ff ff 48 8b 7d 90 31 f6 e8 17 48 ff ff e9 0a f8 ff ff 0f 0b e9 01 fe ff ff 0f 0b e9 b6 fb ff ff <0f> 0b c7 45 98 f4 ff ff ff e9 3f f7 ff ff 48 c7 45 90 00 00 00 00
[ 2449.766934] RSP: 0000:ffffab51c08f7c00 EFLAGS: 00010246
[ 2449.767568] process 737 (tuned) no longer affine to cpu3
[ 2449.768378] RAX: 0000000000000004 RBX: 0000000000000004 RCX: 0000000000000000
[ 2449.769079] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff9486442d7f08
[ 2449.769785] RBP: ffffab51c08f7ca0 R08: 0000000000000000 R09: 0000000000000000
[ 2449.770501] R10: 0000000000000190 R11: ffffab51c08f7ab8 R12: 0000000000000001
[ 2449.771227] R13: 0000000000000000 R14: ffff9486424379c0 R15: 0000000000000001
[ 2449.771920] FS:  0000000000000000(0000) GS:ffff948777c00000(0000) knlGS:0000000000000000
[ 2449.772714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2449.773303] CR2: 000055ed0e0ed158 CR3: 000000010091a002 CR4: 0000000000370ef0
[ 2449.774011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2449.774725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2449.775437] Call Trace:
[ 2449.775752]  <TASK>
[ 2449.776053]  ? cpu_attach_domain+0x3d7/0x810
[ 2449.776532]  ? wait_for_completion+0xff/0x110
[ 2449.777015]  partition_sched_domains_locked+0x1e7/0x3a0
[ 2449.777554]  rebuild_sched_domains_locked+0x545/0x800
[ 2449.778032]  ? rcu_sync_enter+0x6b/0xc0
[ 2449.778377]  rebuild_sched_domains+0x1a/0x40
[ 2449.778728]  cpuset_hotplug_workfn+0x18a/0xe10
[ 2449.779105]  ? balance_push+0x51/0x110
[ 2449.779444]  ? finish_task_switch+0x85/0x2c0
[ 2449.779810]  ? __schedule+0x2f7/0x9f0
[ 2449.780134]  process_one_work+0x1cd/0x3e0
[ 2449.780495]  worker_thread+0x32/0x380
[ 2449.781436]  ? process_one_work+0x3e0/0x3e0
[ 2449.782006]  kthread+0xe8/0x110
[ 2449.782478]  ? kthread_complete_and_exit+0x20/0x20
[ 2449.783067]  ret_from_fork+0x1f/0x30
[ 2449.783566]  </TASK>
[ 2449.783953] ---[ end trace 0000000000000000 ]---
[ 2449.789269] process 741 (tuned) no longer affine to cpu3
[ 2449.794191] process 759 (sshd) no longer affine to cpu3
[ 2450.188215] process 732 (in:imjournal) no longer affine to cpu3
[ 2450.188457] process 733 (rs:main Q:Reg) no longer affine to cpu3
[ 2453.011183] process 659 (gmain) no longer affine to cpu3
[ 2465.517178] select_fallback_rq: 1 callbacks suppressed
[ 2465.517185] process 605 (rpcbind) no longer affine to cpu3
[ 2479.794154] process 652 (chronyd) no longer affine to cpu2
...