Message-ID: <752118a5-96a4-4e60-bf3c-4e51e29be4b1@oracle.com>
Date: Tue, 29 Oct 2024 15:51:23 +0000
From: John Garry <john.g.garry@...cle.com>
To: Dave Chinner <david@...morbit.com>
Cc: tj@...nel.org, jiangshanlai@...il.com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, jack@...e.cz,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
chandan.babu@...cle.com
Subject: Re: workqueue lockup debug
On 29/10/2024 08:07, John Garry wrote:
+ Chandan
Note that Chandan also reported another issue,
https://lore.kernel.org/all/87r08zvsdd.fsf@debian-BULLSEYE-live-builder-AMD64/,
which still exists in rc4.
>>> Maybe this is a scheduler issue, as Dave mentioned in that same thread.
>> I just got a new hit on the scheduler issue on 6.12-rc5:
>>
>> [ 172.477662] ------------[ cut here ]------------
>> [ 172.480660] se->on_rq
>> [ 172.480682] WARNING: CPU: 3 PID: 728318 at kernel/sched/fair.c:5629
That WARN is new code in 6.12; see the note below the trace for what it checks.
>> pick_task_fair+0xb6/0x1b0
>> [ 172.487172] Modules linked in:
>> [ 172.488911] CPU: 3 UID: 0 PID: 728318 Comm: 291 Not tainted 6.12.0-
>> rc5-dgc+ #273
>> [ 172.492022] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>> [ 172.496029] RIP: 0010:pick_task_fair+0xb6/0x1b0
>> [ 172.497978] Code: 40 f3 f3 03 01 74 2a 41 80 7f 50 00 74 96 f6 05
>> 31 f3 f3 03 01 75 8d c6 05 28 f3 f3 03 01 48 c7 c7 41 26 e4 82 e8 7a
>> d7 fa ff <0f> 0b e9 73 ff ff ff c6 05 0d f3 f3 03 01 48 c7 c7 2f 26 e4
>> 82 e8
>> [ 172.505868] RSP: 0018:ffffc9002ec83d60 EFLAGS: 00010046
>> [ 172.508111] RAX: bdcdd05ee831a400 RBX: 0000000000000000 RCX:
>> 0000000000000027
>> [ 172.511139] RDX: 0000000000000000 RSI: 00000000ffdfffff RDI:
>> ffff88881fadc9c8
>> [ 172.514326] RBP: ffffc9002ec83d88 R08: 00000000001fffff R09:
>> ffff88a018a00000
>> [ 172.517804] R10: 00000000005ffffd R11: 0000000000000004 R12:
>> ffff888122b58c00
>> [ 172.520788] R13: ffff88881faf0140 R14: ffff88881faf00c0 R15:
>> ffff88819e8cd180
>> [ 172.523748] FS: 0000000000000000(0000) GS:ffff88881fac0000(0000)
>> knlGS:0000000000000000
>> [ 172.527167] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 172.529571] CR2: 00007f378e8866fc CR3: 000000104daae000 CR4:
>> 0000000000350ef0
>> [ 172.532601] Call Trace:
>> [ 172.533666] <TASK>
>> [ 172.534557] ? show_regs+0x5e/0x70
>> [ 172.535993] ? __warn+0xd0/0x1d0
>> [ 172.537430] ? pick_task_fair+0xb6/0x1b0
>> [ 172.539062] ? report_bug+0x145/0x1f0
>> [ 172.540619] ? handle_bug+0x67/0x90
>> [ 172.542060] ? exc_invalid_op+0x1b/0x50
>> [ 172.543854] ? asm_exc_invalid_op+0x1b/0x20
>> [ 172.545688] ? pick_task_fair+0xb6/0x1b0
>> [ 172.547353] pick_next_task_fair+0x27/0x330
>> [ 172.549090] __schedule+0x2ad/0xb10
>> [ 172.550460] do_task_dead+0x43/0x50
>> [ 172.551743] do_exit+0x836/0xab0
>> [ 172.552950] do_group_exit+0x8f/0x90
>> [ 172.554274] __x64_sys_exit_group+0x17/0x20
>> [ 172.555808] x64_sys_call+0x2ed3/0x2ee0
>> [ 172.557210] do_syscall_64+0x68/0x130
>> [ 172.558531] ? exc_page_fault+0x62/0xc0
>> [ 172.559907] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 172.561739] RIP: 0033:0x7f378e74b3c5
>> [ 172.563036] Code: Unable to access opcode bytes at 0x7f378e74b39b.
>> [ 172.565278] RSP: 002b:00007ffd21861488 EFLAGS: 00000206 ORIG_RAX:
>> 00000000000000e7
>> [ 172.568001] RAX: ffffffffffffffda RBX: 00007f378e857fa8 RCX:
>> 00007f378e74b3c5
>> [ 172.570589] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
>> 0000000000000000
>> [ 172.573182] RBP: 0000000000000000 R08: 00007ffd21861428 R09:
>> 00007ffd218613af
>> [ 172.575758] R10: 00007ffd21861320 R11: 0000000000000206 R12:
>> 00007f378e856680
>> [ 172.578356] R13: 00007f378e8b9f40 R14: 0000000000000001 R15:
>> 00007f378e857fc0
>> [ 172.580941] </TASK>
>>
>> There was nothing particularly obvious that caused it. It's a 64p
>> VM, running a heavy concurrent fstests workload that takes about 20
>> minutes to complete. There are a bit over 8000 tasks reported, most
>> of them kernel threads. The load is consuming about 50 CPUs, 40GB of
>> RAM, sustaining ~3GB/s of IO to/from disk across about 130 block
>> devices and the scheduler is context switching around 800,000 times
>> a second.
>>
>> I have no idea how to reproduce this on demand - it randomly shows
>> up in about 1 in every 10 test runs and all the tests running at
>> that point in time report failure because they all detect this
>> warning in dmesg.
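
For reference, fair.c:5629 in 6.12 sits in the new delayed-dequeue handling
that came in with this cycle's scheduler changes (if I'm reading the code
right), and the "se->on_rq" message means an entity that should already have
been taken off the runqueue was still flagged as queued when the fair class
went to pick the next task. A minimal stand-alone model of that invariant
(toy types and helper names, not the actual kernel code) would be something
like:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for the two sched_entity fields involved in the warning. */
struct toy_entity {
	bool on_rq;         /* entity is still queued on the runqueue */
	bool sched_delayed; /* its dequeue was deferred (delayed dequeue) */
};

/* Completing a delayed dequeue must leave the entity off the runqueue;
 * the kernel's SCHED_WARN_ON(se->on_rq) checks exactly that. */
static void toy_finish_delayed_dequeue(struct toy_entity *se)
{
	se->sched_delayed = false;
	se->on_rq = false; /* if this were missed, the WARN would fire */
}

/* Toy pick path: an entity that only looked runnable because its dequeue
 * was delayed gets dequeued for real and nothing is picked from it. */
static struct toy_entity *toy_pick(struct toy_entity *se)
{
	if (se->sched_delayed) {
		toy_finish_delayed_dequeue(se);
		if (se->on_rq) /* the invariant the 6.12 WARN enforces */
			fprintf(stderr, "WARNING: se->on_rq\n");
		return NULL;
	}
	return se;
}

int main(void)
{
	struct toy_entity se = { .on_rq = true, .sched_delayed = true };

	assert(toy_pick(&se) == NULL); /* delayed entity is not picked */
	assert(!se.on_rq);             /* and must be off the runqueue */
	printf("delayed-dequeue invariant holds in the toy model\n");
	return 0;
}

The real code obviously does far more than this; the sketch is only meant to
show which piece of state the warning is complaining about.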
>
> I have not been able to test v6.12-rc5 yet.
Just to confirm, v6.12-rc5 also hangs for me.
>
> Can you share your config? And/or compare to mine at
> https://pastebin.com/Y7DXnMG2
>
> Maybe we can see what was merged for the scheduler in 6.12, and I can try
> before and after. I don't like the prospect of a full bisect.