Message-Id: <20241127055610.7076-1-adamli@os.amperecomputing.com>
Date: Wed, 27 Nov 2024 05:56:07 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: peterz@...radead.org,
	mingo@...hat.com,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	linux-kernel@...r.kernel.org,
	patches@...erecomputing.com,
	cl@...ux.com,
	christian.loehle@....com,
	vineethr@...ux.ibm.com,
	Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning

When running the SPECjbb workload with NEXT_BUDDY enabled, a kernel
warning, an RCU stall and a panic may be triggered.

The kernel panic is triggered in pick_next_entity() if pick_eevdf()
returns NULL.
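
As a rough illustration of this failure mode, here is a minimal user-space
sketch. It is not the kernel code and not the patch itself: struct
toy_entity, toy_pick_eevdf() and toy_pick_next_entity() are hypothetical,
heavily reduced stand-ins. It only shows why the caller has to test the
picked entity for NULL before reading any of its fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's struct sched_entity. */
struct toy_entity {
    bool sched_delayed; /* dequeue of this entity was deferred */
    bool on_rq;         /* entity is still queued */
};

/*
 * Hypothetical stand-in for pick_eevdf(). It simply hands back the
 * candidate, which may be NULL when nothing can be picked.
 */
static struct toy_entity *toy_pick_eevdf(struct toy_entity *candidate)
{
    return candidate;
}

/*
 * Sketch of a picker that tolerates a NULL result. Without the NULL
 * test, the se->sched_delayed read below would dereference a NULL
 * pointer.
 */
static struct toy_entity *toy_pick_next_entity(struct toy_entity *candidate)
{
    struct toy_entity *se = toy_pick_eevdf(candidate);

    if (!se)
        return NULL;            /* nothing to run; the caller must cope */

    if (se->sched_delayed) {
        /* Finish the deferred dequeue and ask the caller to pick again. */
        se->sched_delayed = false;
        se->on_rq = false;
        return NULL;
    }

    return se;
}
```

In this model, toy_pick_next_entity(NULL) returns NULL instead of
faulting, and a delayed entity is dequeued rather than returned.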

Patch 1 ("Fix warning if NEXT_BUDDY enabled") does not set the next buddy
if sched_delayed is set. With this patch applied, the kernel warning, RCU
stall and panic disappear. However, to guard against the panic, patch 2
still checks the return value of pick_eevdf().
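
The idea of skipping the next-buddy hint for a delayed entity can be
sketched in user space as follows. This is an assumption-laden model, not
the actual change: struct toy_entity, struct toy_cfs_rq and
toy_set_next_buddy() are hypothetical names, and where the real check is
placed in kernel/sched/fair.c is defined by patch 1, not by this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins; not the kernel's definitions. */
struct toy_entity {
    bool on_rq;         /* entity is queued */
    bool sched_delayed; /* dequeue was deferred; entity is on its way out */
};

struct toy_cfs_rq {
    struct toy_entity *next; /* the "next buddy" hint */
};

/*
 * Record se as the next buddy unless it is not queued or is marked
 * sched_delayed. Returns true if the hint was set.
 */
static bool toy_set_next_buddy(struct toy_cfs_rq *cfs_rq,
                               struct toy_entity *se)
{
    if (!se->on_rq || se->sched_delayed)
        return false;   /* leave the existing hint untouched */

    cfs_rq->next = se;
    return true;
}
```

With this guard, a later pick never finds a delayed entity in ->next,
which is the condition the "cfs_rq->next->sched_delayed" warning reports.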

The 'last' and 'skip' buddies are obsoleted by EEVDF. Patch 3 updates the
comments in pick_next_entity() accordingly.

Detailed log below:
[  124.972623] ------------[ cut here ]------------
[  124.977300] cfs_rq->next->sched_delayed
[  124.977310] WARNING: CPU: 51 PID: 2150 at kernel/sched/fair.c:5621 pick_task_fair+0x130/0x150
[  125.049547] CPU: 51 UID: 0 PID: 2150 Comm: kworker/51:1 Tainted: G            E      6.12.0.adam+ #1
[  125.058678] Tainted: [E]=UNSIGNED_MODULE
[  125.062591] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  125.069629] Workqueue:  0x0 (mm_percpu_wq)
[  125.073721] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  125.080671] pc : pick_task_fair+0x130/0x150
[  125.084841] lr : pick_task_fair+0x130/0x150
[  125.089013] sp : ffff8000ab41bc10
[  125.092315] x29: ffff8000ab41bc10 x28: 0000000000000000 x27: 0000000000000000
[  125.099440] x26: ffff000123bd8788 x25: 0000000000000402 x24: 0000000000000001
[  125.106565] x23: ffff000123bd8000 x22: ffff007dfef5cd00 x21: ffff007dfef5cd80
[  125.113689] x20: ffff007dfef5cd80 x19: ffff2001ab20a780 x18: 0000000000000006
[  125.120815] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000ab41b5e0
[  125.127938] x14: 0000000000000000 x13: 646579616c65645f x12: 64656863733e2d74
[  125.135062] x11: fffffffffc000000 x10: ffff207dfac9b420 x9 : ffff80008014ed60
[  125.142185] x8 : 00000000ffdfffff x7 : ffff207dfac80000 x6 : 000000000000122c
[  125.149309] x5 : ffff007dfef49408 x4 : 40000000ffe0122c x3 : ffff807d7d673000
[  125.156433] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000123bd8000
[  125.163561] Call trace:
[  125.165996]  pick_task_fair+0x130/0x150 (P)
[  125.170167]  pick_task_fair+0x130/0x150 (L)
[  125.174338]  pick_next_task_fair+0x48/0x3c0
[  125.178512]  __pick_next_task+0x4c/0x220
[  125.182426]  pick_next_task+0x44/0x980
[  125.186163]  __schedule+0x3d0/0x628
[  125.189645]  schedule+0x3c/0xe0
[  125.192776]  worker_thread+0x1a4/0x368
[  125.196516]  kthread+0xfc/0x110
[  125.199647]  ret_from_fork+0x10/0x20
[  125.203213] ---[ end trace 0000000000000000 ]---
[  125.207818] ------------[ cut here ]------------

[  211.151849] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  211.159759] rcu:     (detected by 141, t=15003 jiffies, g=5629, q=26516 ncpus=384)
[  211.169780] rcu: All QSes seen, last rcu_preempt kthread activity 15002 (4294943634-4294928632), jiffies_till_next_fqs=2, root ->qsmask 0x0
[  211.185062] rcu: rcu_preempt kthread timer wakeup didn't happen for 15004 jiffies! g5629 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0
[  211.199043] rcu:     Possible timer handling issue on cpu=352 timer-softirq=1091
[  211.208943] rcu: rcu_preempt kthread starved for 15012 jiffies! g5629 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=352
[  211.222141] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  211.234037] rcu: RCU grace-period kthread stack dump:
[  211.241854] task:rcu_preempt     state:R  running task     stack:0     pid:17    tgid:17    ppid:2      flags:0x00000008
[  211.255487] Call trace:
[  211.260698]  __switch_to+0xf0/0x150 (T)
[  211.267299]  __schedule+0x238/0x628
[  211.273553]  schedule+0x3c/0xe0
[  211.279459]  schedule_timeout+0x88/0x108
[  211.286147]  rcu_gp_fqs_loop+0x158/0x4d0
[  211.292835]  rcu_gp_kthread+0x164/0x198
[  211.299436]  kthread+0xfc/0x110
[  211.305342]  ret_from_fork+0x10/0x20
[  211.311684] rcu: Stack dump where RCU GP kthread last ran:
[  211.319940] Sending NMI from CPU 141 to CPUs 352:
[  211.327411] NMI backtrace for cpu 352
[  211.333835] CPU: 352 UID: 0 PID: 0 Comm: swapper/352 Tainted: G        W   E      6.12.0.adam+ #1
[  211.345466] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  211.353021] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  211.362834] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  211.372557] pc : cpuidle_enter_state+0xcc/0x4f0
[  211.379851] lr : cpuidle_enter_state+0xc0/0x4f0
[  211.387147] sp : ffff8000878c3d70
[  211.393226] x29: ffff8000878c3d70 x28: 0000000000000000 x27: 0000000000000000
[  211.403125] x26: 0000000000000000 x25: 0000000000000000 x24: 0000003133e26b84
[  211.413023] x23: 0000000000000000 x22: ffff800082092d98 x21: 00000031341844e8
[  211.422922] x20: 0000000000000000 x19: ffff20011d459800 x18: 0000000000000000
[  211.432820] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  211.442719] x14: 0000000000000000 x13: ffff800081dcf030 x12: 0000000000000001
[  211.452619] x11: 0000003131433c14 x10: 071c71c71c71c71c x9 : ffff8000810b5900
[  211.462517] x8 : 00000000003bf790 x7 : ffff207e031d57e4 x6 : 0000002023a22338
[  211.472416] x5 : 1fffffffffffffff x4 : 0000000000000015 x3 : 000000000030e5a8
[  211.482314] x2 : ffffa07d818ed000 x1 : ffff207e031d6d00 x0 : 0000000000000000
[  211.492213] Call trace:
[  211.497426]  cpuidle_enter_state+0xcc/0x4f0 (P)
[  211.504719]  cpuidle_enter_state+0xc0/0x4f0 (L)
[  211.512013]  cpuidle_enter+0x40/0x60
[  211.518351]  cpuidle_idle_call+0x130/0x1c8
[  211.525212]  do_idle+0xec/0xf8
[  211.531033]  cpu_startup_entry+0x40/0x50
[  211.537719]  secondary_start_kernel+0xe0/0x120
[  211.544926]  __secondary_switched+0xc0/0xc8

[  297.371198] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000051
[  297.406112] CPU: 116 UID: 0 PID: 10328 Comm: Grizzly-worker( Tainted: G        W   E      6.12.0.adam+ #1
[  297.414884] Mem abort info:
[  297.424437] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[  297.427219]   ESR = 0x0000000096000005
[  297.431997] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[  297.435734]   EC = 0x25: DABT (current EL), IL = 32 bits
[  297.442770] pstate: a34000c9 (NzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  297.448069]   SET = 0, FnV = 0
[  297.455018] pc : pick_task_fair+0x50/0x150
[  297.458060]   EA = 0, S1PTW = 0
[  297.462144] lr : pick_task_fair+0x50/0x150
[  297.465274]   FSC = 0x05: level 1 translation fault
[  297.469358] sp : ffff800101d93ae0
[  297.474223] Data abort info:
[  297.477526] x29: ffff800101d93ae0
[  297.480395]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  297.480395]  x28: 0000000000000009
[  297.483703]  x27: 0000000000000000
[  297.489177]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  297.492567]
[  297.495956]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  297.500996] x26: ffff006da4381b08
[  297.502477] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000198b3b000
[  297.507777]  x25: 0000000000000080
[  297.511080] [0000000000000051] pgd=08000001c0636403
[  297.517509]  x24: 0000000000000001
[  297.520899] , p4d=08000001c0636403
[  297.525765]
[  297.529155] , pud=0000000000000000
[  297.532545] x23: ffff006da4381380
[  297.534025]
[  297.537415]  x22: ffff007dff7fed00 x21: ffff007dff7fed80
[  297.547496] x20: ffff000167f60c00 x19: 0000000000000000 x18: 0000000000000006
[  297.554621] x17: ffff8000820b3be8 x16: 0000000087c17f9e x15: ffff800083d53690
[  297.561745] x14: 0000000000000004 x13: ffff800081df4ac8 x12: 0000000000000000
[  297.568868] x11: ffff200111a3f0b0 x10: ffff200111a3efc8 x9 : ffff800080109e48
[  297.575992] x8 : 00000000000000b8 x7 : 0000000000000074 x6 : 0000000000000002
[  297.583115] x5 : 0000000000000002 x4 : 0000000000000002 x3 : 0000000000000000
[  297.590239] x2 : fffffffffffffff0 x1 : 0000000000000000 x0 : 0000000000000000
[  297.597362] Call trace:
[  297.599795]  pick_task_fair+0x50/0x150 (P)
[  297.603879]  pick_task_fair+0x50/0x150 (L)
[  297.607963]  pick_next_task_fair+0x30/0x3c0
[  297.612134]  __pick_next_task+0x4c/0x220
[  297.616045]  pick_next_task+0x44/0x980
[  297.619782]  __schedule+0x3d0/0x628
[  297.623259]  do_task_dead+0x50/0x60
[  297.626736]  do_exit+0x28c/0x410
[  297.629955]  do_group_exit+0x3c/0xa0
[  297.633518]  get_signal+0x8c4/0x8d0
[  297.636996]  do_signal+0x9c/0x270
[  297.640299]  do_notify_resume+0xe0/0x198
[  297.644212]  el0_svc+0xf4/0x170
[  297.647342]  el0t_64_sync_handler+0x10c/0x138
[  297.651687]  el0t_64_sync+0x1ac/0x1b0
[  297.655339] Code: d503201f 1400002a aa1403e0 97ffda0b (39414401)
[  297.661439] ---[ end trace 0000000000000000 ]---
[  297.726593] Kernel panic - not syncing: Oops: Fatal exception
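
For readers who want to cross-check the "Mem abort info" lines in the
panic log, the following small sketch splits the reported ESR value into
its fields. It assumes the standard arm64 ESR_ELx layout (EC in bits
[31:26], IL in bit 25, ISS in bits [24:0], and the fault status code in
ISS bits [5:0] for data aborts). The names esr_fields and decode_esr()
are made up for this example.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct esr_fields {
    unsigned ec;  /* exception class, bits [31:26] */
    unsigned il;  /* instruction length bit, bit 25 */
    unsigned iss; /* instruction specific syndrome, bits [24:0] */
    unsigned fsc; /* fault status code, ISS bits [5:0] (data aborts) */
};

static struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (unsigned)(esr & 0x1ffffff);
    f.fsc = f.iss & 0x3f;
    return f;
}
```

Applied to the logged value 0x96000005, this gives EC = 0x25, IL = 1,
ISS = 0x5 and FSC = 0x05, matching the values printed in the log.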

v2:
  Revise the commit message, following Christian Loehle's suggestion.
  Add a patch to check the return value of pick_eevdf() in
  pick_next_entity().

v1:
   https://lore.kernel.org/all/20241125021222.356881-1-adamli@os.amperecomputing.com/

Adam Li (3):
  sched/fair: Fix warning if NEXT_BUDDY enabled
  sched/fair: Fix panic if pick_eevdf() returns NULL
  sched/fair: Update comments regarding last and skip buddy

 kernel/sched/fair.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

-- 
2.25.1