Message-Id: <20241125021222.356881-1-adamli@os.amperecomputing.com>
Date: Mon, 25 Nov 2024 02:12:19 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: peterz@...radead.org,
mingo@...hat.com,
juri.lelli@...hat.com,
vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com,
linux-kernel@...r.kernel.org,
patches@...erecomputing.com,
cl@...ux.com,
Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH 0/2] sched/fair: Fix NEXT_BUDDY panic and clean up comments
When running the SPECjbb workload with NEXT_BUDDY enabled, a kernel warning
and a panic may be triggered. The next buddy must not be set for an entity
that has sched_delayed set.
The 'last' and 'skip' buddies were made obsolete by EEVDF, so update the
comments in pick_next_entity() accordingly.
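
The rule above can be sketched as a small userspace model. This is not the
patch itself: struct entity, struct runqueue, set_next_buddy_checked() and
next_buddy_invariant_holds() are hypothetical names made up for illustration,
and only the sched_delayed flag and the 'next' pointer are taken from the
description and the warning text below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a scheduling entity. */
struct entity {
	bool sched_delayed;	/* entity is in the delayed-dequeue state */
};

/* Hypothetical stand-in for a runqueue holding the "next buddy" hint. */
struct runqueue {
	struct entity *next;
};

/*
 * Record 'se' as the next buddy only if it is not delayed.
 * Returns true if the hint was set, false if it was refused.
 */
static bool set_next_buddy_checked(struct runqueue *rq, struct entity *se)
{
	if (se == NULL || se->sched_delayed)
		return false;

	rq->next = se;
	return true;
}

/* The invariant the pick path relies on: ->next is never delayed. */
static bool next_buddy_invariant_holds(const struct runqueue *rq)
{
	return rq->next == NULL || !rq->next->sched_delayed;
}
```

With the guard in place, a delayed entity is never recorded as the buddy, so
the condition reported by the warning below cannot arise in this model.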
[ 124.972623] ------------[ cut here ]------------
[ 124.977300] cfs_rq->next->sched_delayed
[ 124.977310] WARNING: CPU: 51 PID: 2150 at kernel/sched/fair.c:5621 pick_task_fair+0x130/0x150
[ 125.049547] CPU: 51 UID: 0 PID: 2150 Comm: kworker/51:1 Tainted: G E 6.12.0.adam+ #1
[ 125.058678] Tainted: [E]=UNSIGNED_MODULE
[ 125.062591] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[ 125.069629] Workqueue: 0x0 (mm_percpu_wq)
[ 125.073721] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 125.080671] pc : pick_task_fair+0x130/0x150
[ 125.084841] lr : pick_task_fair+0x130/0x150
[ 125.089013] sp : ffff8000ab41bc10
[ 125.092315] x29: ffff8000ab41bc10 x28: 0000000000000000 x27: 0000000000000000
[ 125.099440] x26: ffff000123bd8788 x25: 0000000000000402 x24: 0000000000000001
[ 125.106565] x23: ffff000123bd8000 x22: ffff007dfef5cd00 x21: ffff007dfef5cd80
[ 125.113689] x20: ffff007dfef5cd80 x19: ffff2001ab20a780 x18: 0000000000000006
[ 125.120815] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000ab41b5e0
[ 125.127938] x14: 0000000000000000 x13: 646579616c65645f x12: 64656863733e2d74
[ 125.135062] x11: fffffffffc000000 x10: ffff207dfac9b420 x9 : ffff80008014ed60
[ 125.142185] x8 : 00000000ffdfffff x7 : ffff207dfac80000 x6 : 000000000000122c
[ 125.149309] x5 : ffff007dfef49408 x4 : 40000000ffe0122c x3 : ffff807d7d673000
[ 125.156433] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000123bd8000
[ 125.163561] Call trace:
[ 125.165996] pick_task_fair+0x130/0x150 (P)
[ 125.170167] pick_task_fair+0x130/0x150 (L)
[ 125.174338] pick_next_task_fair+0x48/0x3c0
[ 125.178512] __pick_next_task+0x4c/0x220
[ 125.182426] pick_next_task+0x44/0x980
[ 125.186163] __schedule+0x3d0/0x628
[ 125.189645] schedule+0x3c/0xe0
[ 125.192776] worker_thread+0x1a4/0x368
[ 125.196516] kthread+0xfc/0x110
[ 125.199647] ret_from_fork+0x10/0x20
[ 125.203213] ---[ end trace 0000000000000000 ]---
[ 125.207818] ------------[ cut here ]------------
[ 297.371198] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000051
[ 297.406112] CPU: 116 UID: 0 PID: 10328 Comm: Grizzly-worker( Tainted: G W E 6.12.0.adam+ #1
[ 297.414884] Mem abort info:
[ 297.424437] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[ 297.427219] ESR = 0x0000000096000005
[ 297.431997] Hardware name: IEI NF5280R7/Mitchell MB, BIOS 4.4.3.1 10/16/2024
[ 297.435734] EC = 0x25: DABT (current EL), IL = 32 bits
[ 297.442770] pstate: a34000c9 (NzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 297.448069] SET = 0, FnV = 0
[ 297.455018] pc : pick_task_fair+0x50/0x150
[ 297.458060] EA = 0, S1PTW = 0
[ 297.462144] lr : pick_task_fair+0x50/0x150
[ 297.465274] FSC = 0x05: level 1 translation fault
[ 297.469358] sp : ffff800101d93ae0
[ 297.474223] Data abort info:
[ 297.477526] x29: ffff800101d93ae0
[ 297.480395] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 297.480395] x28: 0000000000000009
[ 297.483703] x27: 0000000000000000
[ 297.489177] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 297.492567]
[ 297.495956] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 297.500996] x26: ffff006da4381b08
[ 297.502477] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000198b3b000
[ 297.507777] x25: 0000000000000080
[ 297.511080] [0000000000000051] pgd=08000001c0636403
[ 297.517509] x24: 0000000000000001
[ 297.520899] , p4d=08000001c0636403
[ 297.525765]
[ 297.529155] , pud=0000000000000000
[ 297.532545] x23: ffff006da4381380
[ 297.534025]
[ 297.537415] x22: ffff007dff7fed00 x21: ffff007dff7fed80
[ 297.547496] x20: ffff000167f60c00 x19: 0000000000000000 x18: 0000000000000006
[ 297.554621] x17: ffff8000820b3be8 x16: 0000000087c17f9e x15: ffff800083d53690
[ 297.561745] x14: 0000000000000004 x13: ffff800081df4ac8 x12: 0000000000000000
[ 297.568868] x11: ffff200111a3f0b0 x10: ffff200111a3efc8 x9 : ffff800080109e48
[ 297.575992] x8 : 00000000000000b8 x7 : 0000000000000074 x6 : 0000000000000002
[ 297.583115] x5 : 0000000000000002 x4 : 0000000000000002 x3 : 0000000000000000
[ 297.590239] x2 : fffffffffffffff0 x1 : 0000000000000000 x0 : 0000000000000000
[ 297.597362] Call trace:
[ 297.599795] pick_task_fair+0x50/0x150 (P)
[ 297.603879] pick_task_fair+0x50/0x150 (L)
[ 297.607963] pick_next_task_fair+0x30/0x3c0
[ 297.612134] __pick_next_task+0x4c/0x220
[ 297.616045] pick_next_task+0x44/0x980
[ 297.619782] __schedule+0x3d0/0x628
[ 297.623259] do_task_dead+0x50/0x60
[ 297.626736] do_exit+0x28c/0x410
[ 297.629955] do_group_exit+0x3c/0xa0
[ 297.633518] get_signal+0x8c4/0x8d0
[ 297.636996] do_signal+0x9c/0x270
[ 297.640299] do_notify_resume+0xe0/0x198
[ 297.644212] el0_svc+0xf4/0x170
[ 297.647342] el0t_64_sync_handler+0x10c/0x138
[ 297.651687] el0t_64_sync+0x1ac/0x1b0
[ 297.655339] Code: d503201f 1400002a aa1403e0 97ffda0b (39414401)
[ 297.661439] ---[ end trace 0000000000000000 ]---
[ 297.726593] Kernel panic - not syncing: Oops: Fatal exception
Adam Li (2):
sched/fair: Fix panic if NEXT_BUDDY enabled
sched/fair: Update comments regarding last and skip buddy
kernel/sched/fair.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--
2.25.1