Message-ID: <62304351-7fc0-48b6-883b-d346886dac8e@amd.com>
Date: Tue, 15 Apr 2025 16:04:11 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Jan Kiszka <jan.kiszka@...mens.com>, Aaron Lu <ziqianlu@...edance.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>
CC: Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>,
Peter Zijlstra <peterz@...radead.org>, Josh Don <joshdon@...gle.com>, Ingo
Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi
Wang <xii@...gle.com>, <linux-kernel@...r.kernel.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, Steven
Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Chengming Zhou
<chengming.zhou@...ux.dev>, Chuyi Zhou <zhouchuyi@...edance.com>, "Sebastian
Andrzej Siewior" <bigeasy@...utronix.de>
Subject: Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user
On 4/15/2025 2:15 PM, K Prateek Nayak wrote:
> (+ Sebastian)
>
> Hello Jan,
>
> On 4/15/2025 11:39 AM, Jan Kiszka wrote:
>>>> Attached the bits with which we succeeded, sometimes. Setup: Debian 12,
>>>> RT kernel, 2-4 core VM, 1-5 instances of the test, 2 min - 2 h
To improve reproducibility, I pinned the two tasks to the same CPU as
the bandwidth timer, and I could hit this consistently within a few
minutes at most.
>>>> patience. As we have to succeed with at least 3 race conditions in a
>>>> row, that is still not bad... But maybe someone has an idea how to
>>>> increase probabilities further.
>>>
>>> Looking at run.sh, there are only fair tasks with one of them being run
>>> with cfs bandwidth constraints. Are you saying something goes wrong on
>>> PREEMPT_RT as a result of using bandwidth control on fair tasks?
>>
>> Yes, exactly. Also, our in-field workload that (most likely) triggers
>> this issue is not using RT tasks itself. Only kernel threads are RT here.
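For anyone trying this at home: the bandwidth constraint on the fair
task boils down to something like the below (a sketch assuming cgroup
v2 mounted at /sys/fs/cgroup; the group name and the 20ms/100ms quota
are placeholders, not the values from the attached run.sh):

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	fclose(f);
}

int main(void)
{
	char buf[32];

	/* 20ms of runtime per 100ms period for the group */
	mkdir("/sys/fs/cgroup/stall", 0755);
	write_file("/sys/fs/cgroup/stall/cpu.max", "20000 100000");

	/* move ourselves into the group, then exec the throttled task */
	snprintf(buf, sizeof(buf), "%d", getpid());
	write_file("/sys/fs/cgroup/stall/cgroup.procs", buf);
	execlp("./epoll-stall", "epoll-stall", (char *)NULL);
	perror("execlp");
	return EXIT_FAILURE;
}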
>>
>>>
>>> What exactly is the symptom you are observing? Does one of the assert()
>>> trip during the run? Do you see a stall logged on dmesg? Can you provide
>>> more information on what to expect in this 2min - 2hr window?
>>
>> I've just lost my traces from yesterday ("you have 0 minutes to find a
>> power adapter"), but I got nice RCU stall warnings in the VM, including
>> backtraces from the involved tasks (minus the read-lock holder IIRC).
>> Maybe Florian can drop one of his dumps.
>
> So I ran your reproducer on a 2vCPU VM running v6.15-rc1 PREEMPT_RT
> and I saw:
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 0-...!: (15000 ticks this GP) idle=8a74/0/0x1 softirq=0/0 fqs=0
> rcu: (t=15001 jiffies g=12713 q=24 ncpus=2)
> rcu: rcu_preempt kthread timer wakeup didn't happen for 15000 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> rcu: Possible timer handling issue on cpu=0 timer-softirq=17688
> rcu: rcu_preempt kthread starved for 15001 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 task_flags:0x208040 flags:0x00004000
> Call Trace:
> <TASK>
> __schedule+0x401/0x15a0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? lock_timer_base+0x77/0xb0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? __pfx_rcu_gp_kthread+0x10/0x10
> schedule+0x27/0xd0
> schedule_timeout+0x76/0x100
> ? __pfx_process_timeout+0x10/0x10
> rcu_gp_fqs_loop+0x10a/0x4b0
> rcu_gp_kthread+0xd3/0x160
> kthread+0xff/0x210
> ? rt_spin_lock+0x3c/0xc0
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x34/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc1-test-dirty #746 PREEMPT_{RT,(full)}
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> RIP: 0010:pv_native_safe_halt+0xf/0x20
> Code: 22 df e9 1f 08 e5 fe 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 85 96 15 00 fb f4 <e9> f7 07 e5 fe 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> RSP: 0018:ffffffff95803e50 EFLAGS: 00000216
> RAX: ffff8e2d61534000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000081f8a6c
> RBP: ffffffff9581d280 R08: 0000000000000000 R09: ffff8e2cf7d32301
> R10: ffff8e2be11ae5c8 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000147b0
> FS: 0000000000000000(0000) GS:ffff8e2d61534000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055e77c3a5128 CR3: 000000010ff78003 CR4: 0000000000770ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> default_idle+0x9/0x20
> default_idle_call+0x30/0x100
> do_idle+0x20f/0x250
> ? do_idle+0xb/0x250
> cpu_startup_entry+0x29/0x30
> rest_init+0xde/0x100
> start_kernel+0x733/0xb20
> ? copy_bootdata+0x9/0xb0
> x86_64_start_reservations+0x18/0x30
> x86_64_start_kernel+0xba/0x110
> common_startup_64+0x13e/0x141
> </TASK>
>
> Is this in line with what you are seeing?
These are the backtraces for the timer and the individual
epoll-stall threads:
[ 539.155042] task:ktimers/1 state:D stack:0 pid:31 tgid:31 ppid:2 task_flags:0x4208040 flags:0x00004000
[ 539.155047] Call Trace:
[ 539.155049] <TASK>
[ 539.155051] __schedule+0x401/0x15a0
[ 539.155055] ? srso_alias_return_thunk+0x5/0xfbef5
[ 539.155059] ? propagate_entity_cfs_rq+0x115/0x290
[ 539.155063] ? srso_alias_return_thunk+0x5/0xfbef5
[ 539.155067] ? srso_alias_return_thunk+0x5/0xfbef5
[ 539.155070] ? rt_mutex_setprio+0x1c2/0x480
[ 539.155075] schedule_rtlock+0x1e/0x40
[ 539.155078] rtlock_slowlock_locked+0x20e/0xc60
[ 539.155088] ? srso_alias_return_thunk+0x5/0xfbef5
[ 539.155093] rt_read_lock+0x8f/0x190
[ 539.155099] ep_poll_callback+0x37/0x2b0
[ 539.155105] __wake_up_common+0x78/0xa0
[ 539.155110] timerfd_tmrproc+0x43/0x60
[ 539.155114] ? __pfx_timerfd_tmrproc+0x10/0x10
[ 539.155116] __hrtimer_run_queues+0xfd/0x2e0
[ 539.155124] hrtimer_run_softirq+0x9d/0xf0
[ 539.155128] handle_softirqs.constprop.0+0xc1/0x2a0
[ 539.155134] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 539.155139] run_ktimerd+0x3e/0x80
[ 539.155142] smpboot_thread_fn+0xf3/0x220
[ 539.155147] kthread+0xff/0x210
[ 539.155151] ? rt_spin_lock+0x3c/0xc0
[ 539.155155] ? __pfx_kthread+0x10/0x10
[ 539.155159] ret_from_fork+0x34/0x50
[ 539.155165] ? __pfx_kthread+0x10/0x10
[ 539.155168] ret_from_fork_asm+0x1a/0x30
[ 539.155176] </TASK>
[ 557.323846] task:epoll-stall state:D stack:0 pid:885 tgid:885 ppid:1 task_flags:0x400000 flags:0x00004002
[ 557.323848] Call Trace:
[ 557.323849] <TASK>
[ 557.323851] __schedule+0x401/0x15a0
[ 557.323853] ? rt_write_lock+0x108/0x260
[ 557.323858] schedule_rtlock+0x1e/0x40
[ 557.323860] rt_write_lock+0xaa/0x260
[ 557.323864] do_epoll_wait+0x21f/0x4a0
[ 557.323869] ? __pfx_ep_autoremove_wake_function+0x10/0x10
[ 557.323872] __x64_sys_epoll_wait+0x63/0x100
[ 557.323876] do_syscall_64+0x6f/0x120
[ 557.323879] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323881] ? ksys_read+0x6b/0xe0
[ 557.323883] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323885] ? syscall_exit_to_user_mode+0x51/0x1a0
[ 557.323887] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323889] ? do_syscall_64+0x7b/0x120
[ 557.323890] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323892] ? ep_send_events+0x26d/0x2b0
[ 557.323896] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323898] ? do_epoll_wait+0x17e/0x4a0
[ 557.323900] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323902] ? __rseq_handle_notify_resume+0xa7/0x500
[ 557.323905] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323907] ? aa_file_perm+0x123/0x4e0
[ 557.323911] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323912] ? get_nohz_timer_target+0x2a/0x180
[ 557.323914] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323916] ? _copy_to_iter+0xa3/0x630
[ 557.323920] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323922] ? timerqueue_add+0x6a/0xc0
[ 557.323924] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323926] ? hrtimer_start_range_ns+0x2e7/0x4a0
[ 557.323931] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323932] ? timerfd_read_iter+0x141/0x2b0
[ 557.323934] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323936] ? security_file_permission+0x123/0x140
[ 557.323940] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323942] ? vfs_read+0x264/0x340
[ 557.323946] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323948] ? ksys_read+0x6b/0xe0
[ 557.323950] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323951] ? syscall_exit_to_user_mode+0x51/0x1a0
[ 557.323953] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323955] ? do_syscall_64+0x7b/0x120
[ 557.323957] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323958] ? syscall_exit_to_user_mode+0x168/0x1a0
[ 557.323960] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323962] ? do_syscall_64+0x7b/0x120
[ 557.323963] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.323965] ? do_syscall_64+0x7b/0x120
[ 557.323967] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 557.323968] RIP: 0033:0x7f0371fd3dea
[ 557.323970] RSP: 002b:00007ffd1062cd68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
[ 557.323971] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0371fd3dea
[ 557.323972] RDX: 0000000000000001 RSI: 00007ffd1062cda4 RDI: 0000000000000005
[ 557.323973] RBP: 00007ffd1062ce00 R08: 0000000000000000 R09: 000055ea1d8f12a0
[ 557.323974] R10: 00000000ffffffff R11: 0000000000000246 R12: 00007ffd1062cf18
[ 557.323975] R13: 000055ea03253249 R14: 000055ea03255d80 R15: 00007f0372121040
[ 557.323979] </TASK>
[ 557.431402] task:epoll-stall-wri state:R running task stack:0 pid:887 tgid:887 ppid:1 task_flags:0x400100 flags:0x00000002
[ 557.431405] Call Trace:
[ 557.431406] <TASK>
[ 557.431408] __schedule+0x401/0x15a0
[ 557.431410] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431412] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431414] ? psi_group_change+0x212/0x460
[ 557.431417] ? pick_eevdf+0x71/0x180
[ 557.431419] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431421] ? update_curr+0x8d/0x240
[ 557.431425] preempt_schedule+0x41/0x60
[ 557.431427] preempt_schedule_thunk+0x16/0x30
[ 557.431431] try_to_wake_up+0x2f6/0x6e0
[ 557.431433] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431437] ep_autoremove_wake_function+0x12/0x40
[ 557.431439] __wake_up_common+0x78/0xa0
[ 557.431443] __wake_up_sync+0x34/0x50
[ 557.431445] ep_poll_callback+0x13e/0x2b0
[ 557.431448] ? aa_file_perm+0x123/0x4e0
[ 557.431451] __wake_up_common+0x78/0xa0
[ 557.431454] __wake_up_sync_key+0x38/0x50
[ 557.431456] anon_pipe_write+0x43b/0x6d0
[ 557.431461] fifo_pipe_write+0x13/0xe0
[ 557.431463] vfs_write+0x374/0x420
[ 557.431468] ksys_write+0xc9/0xe0
[ 557.431471] do_syscall_64+0x6f/0x120
[ 557.431473] ? current_time+0x30/0x130
[ 557.431476] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431479] ? vfs_write+0x1bd/0x420
[ 557.431481] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431483] ? vfs_write+0x1bd/0x420
[ 557.431487] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431489] ? ksys_write+0xc9/0xe0
[ 557.431491] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431492] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431494] ? syscall_exit_to_user_mode+0x51/0x1a0
[ 557.431496] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431498] ? do_syscall_64+0x7b/0x120
[ 557.431501] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431502] ? ksys_write+0xc9/0xe0
[ 557.431504] ? do_syscall_64+0x7b/0x120
[ 557.431506] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431508] ? syscall_exit_to_user_mode+0x51/0x1a0
[ 557.431510] ? srso_alias_return_thunk+0x5/0xfbef5
[ 557.431511] ? do_syscall_64+0x7b/0x120
[ 557.431513] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 557.431515] RIP: 0033:0x7f0ef191c887
[ 557.431516] RSP: 002b:00007ffc15f50948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 557.431517] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ef191c887
[ 557.431518] RDX: 0000000000000001 RSI: 0000561a16a0203d RDI: 0000000000000003
[ 557.431519] RBP: 00007ffc15f50970 R08: 0000000000000065 R09: 0000561a45f0e2a0
[ 557.431520] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffc15f50a88
[ 557.431521] R13: 0000561a16a011a9 R14: 0000561a16a03da8 R15: 00007f0ef1a7b040
[ 557.431525] </TASK>
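For completeness, the timerfd half of the test that produces the
ep_poll_callback() vs. epoll_wait() contention above is roughly of the
following shape (my reconstruction from the traces - the sources
attached earlier in the thread are authoritative, and the interval is
an assumption). epoll_wait() ends up taking ep->lock for write (the
rt_write_lock() in the trace) while the expiry path in ktimers takes
it for read:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <unistd.h>

int main(void)
{
	struct itimerspec its = {
		.it_interval	= { .tv_nsec = 1000000 },	/* 1ms, assumed */
		.it_value	= { .tv_nsec = 1000000 },
	};
	struct epoll_event ev = { .events = EPOLLIN };
	int tfd, epfd;

	tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	epfd = epoll_create1(0);
	if (tfd < 0 || epfd < 0 ||
	    timerfd_settime(tfd, 0, &its, NULL) ||
	    epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev)) {
		perror("setup");
		return EXIT_FAILURE;
	}

	for (;;) {
		uint64_t expirations;

		/* wait for an expiry, then drain the timerfd */
		if (epoll_wait(epfd, &ev, 1, -1) == 1)
			read(tfd, &expirations, sizeof(expirations));
	}
}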
>
>>
>>>
>>> Additionally, do you have RT throttling enabled in your setup? Can
>>> long-running RT tasks starve fair tasks on your setup?
>>
>> RT throttling is enabled (default settings) but was not kicking in - why
>> should it in that scenario? The only RT thread, ktimers, ran into the
>> held lock and stopped.
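Right - RT throttling only clamps an RT task that is actually burning
CPU, and ktimers here is blocked, not running. For reference, the knobs
in question (the defaults being 950000us of runtime per 1000000us
period) can be dumped with something like:

#include <stdio.h>

static void dump(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	if (f)
		fclose(f);
}

int main(void)
{
	dump("/proc/sys/kernel/sched_rt_runtime_us");
	dump("/proc/sys/kernel/sched_rt_period_us");
	return 0;
}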
>>
>> Jan
>>
>
--
Thanks and Regards,
Prateek